1 Introduction

With the development of Internet technology, traditional low dynamic range (LDR) video can no longer satisfy viewers' demands for visual quality. Compared with traditional LDR video, high dynamic range (HDR) video provides a wider luminance range and can therefore describe a real scene more accurately [1, 2]. However, HDR video cannot be displayed directly on current LDR monitors because of its wide luminance range [3]. HDR video is usually converted to LDR video through tone mapping (TM), which preserves most of the detail information for LDR monitors [4,5,6]; therefore, how to protect the copyright of HDR video after TM has become an urgent problem. Watermarking technology provides a solution to this copyright protection problem [7,8,9].

Watermarking technologies can be divided into spatial domain and transform domain-based methods. Spatial domain-based watermarking methods directly modify pixel values to embed the watermark, e.g., least significant bit (LSB) substitution [10]. Since this type of method is sensitive to any modification, it is widely used in image content authentication [11]. However, the pixels of an HDR video are floating-point values, and it is difficult to build rules for modifying pixels directly because different HDR videos have different luminance ranges. Recently, some studies have converted the floating-point format of the HDR image to other encoding formats [12,13,14,15,16]. Cheng et al. [12] and Li et al. [13] embedded the watermark into HDR images in the RGBE format and the LogLuv format, respectively, by using LSB substitution, but their methods are not robust.

Compared with spatial domain-based watermarking methods, transform domain-based methods are more robust and can resist various image attacks [17,18,19]. Different decompositions have been used to design robust watermarking methods, such as the discrete cosine transform (DCT) [20], discrete wavelet transform (DWT) [21], singular value decomposition (SVD) [22], Schur decomposition [23] and QR decomposition [24]. Bhardwaj et al. utilized the mathematical relationship between the number of video frames and the embedding capacity to select key frames for embedding the watermark [25]. Esfahani et al. embedded the watermark into the low-entropy part of all three RGB color channels of each frame by combining QR decomposition, SVD, the Chirp-Z transform, DWT and entropy [26]. Rasti et al. proposed a video watermarking method based on QR decomposition and entropy analysis, in which the watermark is embedded into blocks with small entropy [27]. The above watermarking methods can resist a variety of image attacks, and increasing the embedding strength improves the robustness. However, increased embedding strength also leads to visual distortion of the watermarked image. To obtain a trade-off between watermarking imperceptibility and robustness, watermarking methods based on human visual perception have been studied [28,29,30].

Hu et al. used contrast sensitivity, luminance adaptation and contrast masking to build a just noticeable difference (JND) model for the DCT domain, and the JND was utilized to regulate the variation margin of each coefficient for embedding the watermark; this method reduces the visual distortion of the watermarked image at high embedding strength [28]. Since visual entropy and edge entropy denote texture and edge information, respectively, Lai et al. utilized them to select the optimal embedding regions, which decreases visual distortion [29]. Hernandez et al. combined JND estimation, saliency mapping and a modulation stage to compute a spatiotemporal saliency-modulated JND profile, which was used to adjust the embedding strength and achieve robustness with good imperceptibility [30]. However, the above visual perception models are designed for LDR content and cannot capture the human visual system (HVS) characteristics over the wide luminance range of HDR content. Thus, a dedicated visual perception model should be designed for embedding the watermark in HDR content.

Guerrini et al. used luminance, activity and edge perception to compute a perceptual mask of the HDR image in the DWT domain, and the perceptual mask was utilized to control the watermarking imperceptibility [31]. Since the human eye is sensitive to high-luminance areas, Yu et al. embedded the watermark into low-luminance areas based on a luminance mask designed from the modified specular-free image [32]. Solachidis et al. computed the wavelet transform in the JND-scale space of the HDR image as the embedding domain and employed the contrast sensitivity function (CSF) to modulate the embedding strength, achieving both good robustness and imperceptibility [33]. Bai et al. designed a hierarchical embedding intensity and a hybrid perceptual mask according to TM, spatial activity and the HVS, so that the watermark is embedded into different regions with different intensities [34]. Daniel et al. used the Luma Variation Tolerance (LVT) curve to design HVS-imperceptibility, which guided watermark embedding in the spatial domain, but the robustness was not high [35]. The above HDR watermarking methods only consider the visual characteristics of a single frame and ignore the temporal characteristics of the HDR video.

Compared with LDR image watermarking, an HDR video watermarking method should, besides modeling visual perception, resist HDR-specific image processing such as TM. TM reduces contrast [36, 37] and changes pixel values to a greater extent than traditional image processing; therefore, it is essential to design robust HDR image/video watermarking methods. Bakhsh et al. applied the RGB-to-LogLUV transform to the HDR image to obtain the luminance component, and the watermark was embedded into the DWT domain of the luminance component [38]. Gholamreza et al. decomposed the HDR image by using DWT, the chirp-z transform and QR decomposition to embed the watermark [39]. Xue et al. proposed two watermarking methods based on the µ-Law and bilateral filtering, respectively, but the robustness remained limited [40]. Vassilios et al. decomposed the original HDR image into multiple LDR images through bracketing decomposition to embed the watermark [41]. However, the above watermarking methods consider each channel independently and use two-dimensional transformations to decompose each channel or the luminance component. Thus, they cannot make full use of the strong correlations among the RGB channels to obtain robust features. To improve robustness, a multi-dimensional transformation is required that treats the three channels as a whole and retains their strong correlations. Since tensor-singular value decomposition (T-SVD) [42] can combine the three channels into a third-order tensor, the main characteristics of the HDR video can be preserved for watermarking robustness.

In this paper, a robust HDR video watermarking method using T-SVD and the saliency map is proposed. Each frame is divided into non-overlapping blocks, and T-SVD is applied to each block to obtain the orthogonal tensor \({\mathcal{U}}\), which consists of the first, second and third orthogonal matrices. Compared with the other two matrices, the second matrix retains more correlations of the video frame and is chosen to embed the watermark. Moreover, to obtain a trade-off between watermarking imperceptibility and robustness, the saliency map of the HDR video is computed by fusing the spatial saliency and the temporal saliency, and it is used to determine the embedding strength. The main contributions of this paper are as follows.

  1. (1)

    Different from traditional transformations that operate on a single channel, T-SVD transforms each frame of the HDR video as a whole, so that the strong correlations within each frame are preserved for robust watermark embedding.

  2. (2)

    The saliency map of the HDR video is extracted based on fusing the spatial saliency and the temporal saliency to balance the imperceptibility and robustness of the HDR video watermarking.

  3. (3)

    Experimental results show that the proposed method is more robust than the existing watermarking methods.

This paper is organized as follows. Section 2 introduces the related background. Section 3 describes the watermark embedding and extraction processes. Section 4 discusses and analyzes the experimental results. Finally, Sect. 5 concludes the paper.

2 Background

In this section, the related background technologies are introduced, along with the notation used in the rest of the paper. Scalars are shown in italics, such as a; matrices are shown in bold letters, such as A; and higher-order tensors are shown in calligraphic letters, such as \({\mathcal{A}}\).

2.1 Tensor-singular value decomposition

With the development of the Internet, multimedia data is becoming increasingly multi-dimensional. A tensor is a form of multi-dimensional data that can store a large amount of information. A p-order tensor can be written as

$$ {\mathcal{A}} = \left( {a_{{i_{1} i_{2} \ldots i_{p} }} } \right) \in {\text{R}}^{{l_{1} \times l_{2} \times \ldots \times l_{p} }} , $$
(1)

where \(l_{1} ,l_{2} , \ldots ,l_{p} \in {\text{Z}}\) indicate the number of elements in each dimension. Therefore, a vector can be considered a first-order tensor, and a matrix can be considered a second-order tensor.

Higher-order tensors can be represented by sets of matrices. For example, a third-order tensor can be divided into horizontal, lateral and frontal slices [43], represented as \({\mathcal{A}}_{{k::}}\), \({\mathcal{A}}_{{:k:}}\) and \({\mathcal{A}}_{{::k}}\), respectively, where k indexes the slice along the corresponding dimension (for the frontal slices of an \(n \times m \times 3\) tensor, \(k \in \left\{ {1,2,3} \right\}\)).

SVD is an important decomposition method in linear algebra, which is suitable for any matrix. It can decompose a matrix into two orthogonal matrices and a diagonal matrix.

Let \({\mathbf{B}} \in {\text{R}}^{{m \times n}}\) be an image matrix. By SVD, B can be decomposed as

$$ {\mathbf{B}} = {\mathbf{USV}}^{T} $$
(2)

where \({\mathbf{U}} \in {\text{R}}^{{m \times m}}\) and \({\mathbf{V}} \in {\text{R}}^{{n \times n}}\) are orthogonal matrices whose columns are the left and right singular vectors, respectively, and \({\mathbf{S}} \in {\text{R}}^{{m \times n}}\) is a diagonal matrix whose diagonal values, in descending order, are called the singular values.

SVD can process two-dimensional data, but applying it directly to high-dimensional data may destroy the internal structure of that data. High-dimensional data can instead be processed as a tensor, which effectively retains its internal structure. Thus, for higher-order singular value decomposition, T-SVD can be used. Let \({\mathcal{B}} \in {R}^{{l_{1} \times l_{2} \times l_{3} }}\) be a third-order tensor; \({\mathcal{B}}\) can be decomposed as

$$ {\mathcal{B} = \mathcal{U}} \times {\mathcal{S}} \times {\mathcal{V}}^{{T}} $$
(3)

where \({\mathcal{U}} \in {R}^{{l_{1} \times l_{1} \times l_{3} }}\) and \({\mathcal{V}} \in {R}^{{l_{2} \times l_{2} \times l_{3} }}\) are orthogonal tensors, and \({\mathcal{S}} \in {R}^{{l_{1} \times l_{2} \times l_{3} }}\) is a diagonal tensor. Equation (3) is called T-SVD.

The third-order tensor \({\mathcal{B}}\) can also be written as

$$ \sum\limits_{{k = 1}}^{{l_{3} }} {{\mathcal{B}}_{{::k}} = \left( {\sum\limits_{{k = 1}}^{{l_{3} }} {{\mathcal{U}}_{{::k}} } } \right)} \left( {\sum\limits_{{k = 1}}^{{l_{3} }} {{\mathcal{S}}_{{::k}} } } \right)\left( {\sum\limits_{{k = 1}}^{{l_{3} }} {{\mathcal{V}}_{{::k}} ^{{T}} } } \right) $$
(4)

where \(\sum\nolimits_{{k = 1}}^{{l_{3} }} {{\mathcal{U}}_{{::k}} }\) and \(\sum\nolimits_{{k = 1}}^{{l_{3} }} {{\mathcal{V}}_{{::k}} }\) are orthogonal matrices, and \(\sum\nolimits_{{k = 1}}^{{l_{3} }} {{\mathcal{S}}_{{::k}} }\) is a diagonal matrix. Each frame of the HDR video can be considered a third-order tensor of size \(n \times m \times 3\), with \(l_{1} = n\), \(l_{2} = m\), \(l_{3} = 3\) and \(k \in \left\{ {1,2,3} \right\}\). Therefore, after applying T-SVD, two orthogonal tensors and one diagonal tensor are obtained according to Eq. (3). The orthogonal tensor consists of three frontal slices \({\mathcal{U}}_{{::k}}\), named the first, second and third matrices for k equal to 1, 2 and 3, respectively; let U1, U2 and U3 denote them. With regard to watermarking robustness, U2 is chosen to embed the watermark, as discussed in Sect. 4.1. By treating each frame of the HDR video as a third-order tensor, T-SVD preserves the strong correlations among the three RGB channels when decomposing the RGB image.
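As a concrete illustration, the decomposition of Eq. (3) can be sketched in NumPy using the common FFT-based construction of T-SVD (the paper does not specify an implementation, so this construction and the helper names `tsvd` and `reconstruct` are our assumptions):

```python
import numpy as np

def tsvd(A):
    """FFT-based T-SVD of a third-order tensor A (l1 x l2 x l3):
    transform along the third mode, then SVD each frontal slice
    in the Fourier domain. A sketch of Eq. (3)."""
    l1, l2, l3 = A.shape
    Af = np.fft.fft(A, axis=2)
    Uf = np.zeros((l1, l1, l3), dtype=complex)
    Sf = np.zeros((l1, l2, l3), dtype=complex)
    Vf = np.zeros((l2, l2, l3), dtype=complex)
    for k in range(l3):
        u, s, vh = np.linalg.svd(Af[:, :, k])
        Uf[:, :, k] = u
        Sf[:len(s), :len(s), k] = np.diag(s)
        Vf[:, :, k] = vh.conj().T
    # back to the spatial domain: orthogonal tensors U, V, diagonal tensor S
    return (np.fft.ifft(Uf, axis=2),
            np.fft.ifft(Sf, axis=2),
            np.fft.ifft(Vf, axis=2))

def reconstruct(U, S, V):
    """Recombine the three factors (the product in Eq. (3)),
    again through the Fourier domain."""
    Uf = np.fft.fft(U, axis=2)
    Sf = np.fft.fft(S, axis=2)
    Vf = np.fft.fft(V, axis=2)
    Rf = np.stack([Uf[:, :, k] @ Sf[:, :, k] @ Vf[:, :, k].conj().T
                   for k in range(U.shape[2])], axis=2)
    return np.fft.ifft(Rf, axis=2).real
```

For a 4 × 4 × 3 block, `U` has three frontal slices; the second matrix U2 used for embedding would correspond to `U[:, :, 1]` in this sketch.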

2.2 Saliency map

The concept of visual saliency was introduced by Itti [44]; the saliency map predicts the most relevant and important areas of images and videos and can be used to guide watermark embedding. In Itti's visual model, first, the color, intensity and orientation features of the image are extracted, and Gaussian pyramids of these features are formed by subsampling. Then, a center-surround operation is used to compute the color, intensity and orientation saliency maps. Finally, these saliency maps are merged to obtain the saliency map of the image. Brémond et al. [45] proposed an HDR saliency map method based on the framework of Itti's model, namely the contrast features (CF) model; unlike Itti's model, they considered that the HVS is sensitive to contrasts rather than to absolute differences.

In Itti's visual model, the intensity feature map is computed as

$$ I(c,x) = \left| {I(c)\Theta I(x)} \right|, $$
(5)

where \(c \in \{ 2,3,4\}\), \(x \in \{ c + 3,c + 4\}\), \(I(c)\) and \(I(x)\) are the intensities at the cth and xth levels of the pyramid, respectively, and \(\Theta\) stands for the across-scale subtraction between two maps.

However, in the CF model, the intensity feature map is modified as

$$ I^{'} (c,x) = \frac{{\left| {I(c)\Theta I(x)} \right|}}{{I(x)}} $$
(6)

In Itti's visual model, the orientation feature map is computed from the differences between Gabor-filtered maps at scales c and x:

$$ O(c,x,\theta ) = \left| {O(c,\theta )\Theta O(x,\theta )} \right|, $$
(7)

where \(O(c,\theta )\) and \(O(x,\theta )\) are the orientation pyramid levels obtained by convolving the intensity at the cth and xth levels with Gabor filters, respectively, and \(\theta \in \left\{ {0^{ \circ } ,45^{ \circ } ,90^{ \circ } ,135^{ \circ } } \right\}\).

In the CF model, the orientation feature map is obtained as

$$ O^{'} (c,x,\theta ) = \frac{{\left| {O(c,\theta )\Theta O(x,\theta )} \right|}}{{I(x)}}, $$
(8)

In the CF model, the color feature is obtained by linear combinations of R, G and B, the same as in Itti's visual model. However, HDR content has a rich color gamut, and the CF model does not consider color perception under different luminance ranges. Therefore, the CF color feature is not suitable for HDR content. To overcome this issue, a color appearance model (CAM) [46], which describes how the HVS perceives color under different luminance ranges, is employed to extract the color feature.
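To make the contrast normalization of Eq. (6) concrete, the following sketch computes a center-surround intensity feature on a simplified pyramid. Here 2 × 2 block averaging stands in for the Gaussian pyramid, and nearest-neighbour repetition stands in for across-scale interpolation; the function names are ours, not from the paper.

```python
import numpy as np

def pyramid(img, levels):
    """Simplified multiscale pyramid by 2x2 block averaging
    (a stand-in for the Gaussian pyramid of Itti's model)."""
    pyr = [img]
    for _ in range(levels - 1):
        h, w = pyr[-1].shape
        p = pyr[-1][:h - h % 2, :w - w % 2]
        pyr.append(p.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyr

def cf_intensity_feature(img, c=2, x=5, eps=1e-6):
    """CF-model intensity feature of Eq. (6): the center-surround
    difference |I(c) - I(x)| divided by the surround I(x), with the
    surround upsampled to the center's resolution."""
    pyr = pyramid(img, x + 1)
    center, surround = pyr[c], pyr[x]
    up = surround
    while up.shape[0] < center.shape[0]:
        up = np.repeat(np.repeat(up, 2, axis=0), 2, axis=1)
    up = up[:center.shape[0], :center.shape[1]]
    return np.abs(center - up) / (up + eps)
```

On a uniform image the feature is zero everywhere, which reflects the CF model's premise that the HVS responds to contrast rather than absolute level.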

3 Proposed HDR video watermarking method

In this section, the saliency map of the HDR video is first extracted to guide watermark embedding. Then, the watermark embedding process based on T-SVD and the saliency map is described. Finally, the watermark extraction process is introduced.

3.1 Saliency map extraction of the HDR video

The saliency map A of the HDR video is obtained by fusing the spatial saliency map map1 and the temporal saliency map map2. Since the human eye is sensitive to luminance differences in the HDR image, the intensity feature can accurately reflect these differences and locate the salient areas. The color feature is a global feature that describes the distribution of image information and also assists in computing the intensity feature. Since the HVS has different perceptual sensitivity along different orientations of the HDR image, the orientation feature captures the different orientations of the image. For map2, the optical flow [47] is used to compute the motion information under different luminance ranges; it uses the change of pixels in the temporal domain and the correlations of adjacent frames to relate the previous frame to the current frame. In summary, the color, intensity and orientation features of the HDR video are extracted to compute map1, and the optical flow is used to compute map2, as illustrated in Fig. 1.

Fig. 1
figure 1

Proposed saliency map extraction model

Since the HVS is sensitive to contrasts rather than to absolute differences, the intensity and orientation saliency maps are extracted by using the CF model. Compared with the LDR video, the HDR video has richer color levels, and the HVS perceives color differently under different luminance ranges [48]. Since the CAM describes how the HVS perceives color information under different luminance ranges, the color feature of the HDR video is extracted by using the CAM. In the human eye, cone cells are responsible for color perception; there are three types, L-cones, M-cones and S-cones, which are sensitive to long, medium and short wavelengths, respectively. The color feature is computed in detail as follows.

  • Step. 1 The XYZ tristimulus values are transformed into LMS cone space by using Hunt–Pointer–Estevez (HPE) [46].

  • Step. 2 The cones’ absolute responses are obtained by

    $$ L^{'} = \frac{{L^{{n_{c} }} }}{{L^{{n_{c} }} + L_{a}^{{n_{c} }} }},M^{'} = \frac{{M^{{n_{c} }} }}{{M^{{n_{c} }} + L_{a}^{{n_{c} }} }},S^{'} = \frac{{S^{{n_{c} }} }}{{S^{{n_{c} }} + L_{a}^{{n_{c} }} }} $$
    (9)

    where La is the absolute level of adaptation, measured in cd/m2, L, M and S are the L-cone, M-cone and S-cone responses, respectively, and nc is set to 0.57 [49].

  • Step. 3 Red–Green channel CR-G is obtained as

    $$ C_{{R - G}} = \frac{1}{{11}}(11 \cdot L^{'} - 12M^{'} + S^{'} ) $$
    (10)
  • Step. 4 Yellow–Blue channel CY-B is obtained as

    $$ C_{{Y - B}} = \frac{1}{9}(L^{'} + M^{'} - 2 \cdot S^{'} ) $$
    (11)

map2 of the HDR video is computed by using the optical flow. The optical flow is the distribution of apparent surface velocities of moving objects in an image, generated by the relative motion between object and observer. The speed of objects in the HDR video can be measured by estimating the optical flow between frames. Optical flow relies on two assumptions. The first is brightness constancy: the brightness of a small region remains constant despite changes in its position. The second is spatial smoothness: adjacent points on an object have similar velocities, so the velocity field of the object is smooth. The saliency map of the HDR video is computed as follows.

  • Step. 1: Construct the Gaussian pyramid of the color, intensity and orientation visual features.

  • Step. 2: The color saliency map C, the intensity saliency map I and the orientation saliency map O are calculated as

    $$ {\mathbf{C}} = {NOR}(C_{{R - G}} ) + {NOR}(C_{{Y - B}} ) $$
    (12)
    $$ {\mathbf{I}} = {NOR}(I^{\prime}(c,x)) $$
    (13)
    $$ {\mathbf{O}} = {NOR}(O^{\prime}(c,x,\theta )) $$
    (14)

    where NOR(·) is the normalization operation.

  • Step. 3: The spatial saliency map map1 is computed as

    $$ {\mathbf{map}}_{{\mathbf{1}}} = ({\mathbf{C}} + {\mathbf{I}} + {\mathbf{O}})/3 $$
    (15)
  • Step. 4: The temporal saliency map map2 is computed by using the optical flow.

  • Step. 5: The saliency map A of the HDR video is obtained by fusing map1 and map2.

    $$ {\mathbf{A}} = {\mathbf{map}}_{{\mathbf{1}}} + {\mathbf{map}}_{{\mathbf{2}}} $$
    (16)
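The fusion steps above (Eqs. (12)–(16)) can be sketched as follows. Here `nor` is one plausible reading of NOR(·) as min–max rescaling, which the paper does not define precisely; Itti's original normalization additionally promotes maps with few strong peaks.

```python
import numpy as np

def nor(x):
    """NOR(.): rescale a feature map to [0, 1] (min-max reading;
    an assumption about the paper's normalization operator)."""
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

def fuse_saliency(c_rg, c_yb, i_feat, o_feat, map2):
    """Eqs. (12)-(16): build the color, intensity and orientation
    conspicuity maps, average them into map1, then add the
    temporal saliency map2."""
    C = nor(c_rg) + nor(c_yb)          # Eq. (12)
    I = nor(i_feat)                    # Eq. (13)
    O = nor(o_feat)                    # Eq. (14)
    map1 = (C + I + O) / 3             # Eq. (15)
    return map1 + map2                 # Eq. (16)
```

With the min–max reading of NOR(·), map1 is bounded by 4/3 (C contributes up to 2, I and O up to 1 each), so the dynamic range of A is dominated by the spatial term when map2 is small.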

3.2 Watermark embedding

In this section, the process of watermark embedding is presented as illustrated in Fig. 2.

  • Step. 1 Each frame is regarded as the third-order tensor \({\mathcal{A}}\), with the size of M × N × 3. \({\mathcal{A}}\) is divided into non-overlapping blocks with the size of nb × nb × 3, and each block is denoted as \({\mathcal{K}}^{s}\), where \(s\) is the index of each block.

  • Step. 2 Perform T-SVD on each block.

    $$ {\mathcal{K}}^{s} = {\mathcal{U}}^{s} \times {\mathcal{S}}^{s} \times ({\mathcal{V}}^{s} )^{{T}} . $$
    (17)

    where \({\mathbf{U}}_{{\text{2}}}^{s}\) is the second matrix of \({\mathcal{U}}^{s}\).

  • Step. 3 The saliency map A is obtained by Eq. (16). The embedding strength matrix T is computed according to Eq. (18) and is used to guide watermark embedding.

    $$ \left\{ {\begin{array}{*{20}c} {{\mathbf{T}}(i,j) = \partial _{{\min }} } & {{\mathbf{A}}(i,j) \ge Med} \\ {{\mathbf{T}}(i,j) = \partial _{{\max }} } & {{\mathbf{A}}(i,j) < Med} \\ \end{array} } \right., $$
    (18)

    where \(Med = (\max ({\mathbf{A}}) + \min ({\mathbf{A}}))/2\), and ∂min and ∂max are the minimum and maximum embedding strengths, respectively, which will be discussed in Sect. 4.2.

  • Step. 4 To provide the appropriate embedding strength for each block, the embedding strength matrix T is also divided into non-overlapping blocks of size nb × nb, each denoted \({\mathbf{Q}}^{s}\). Each bit of W is embedded into \({\mathbf{U}}_{{\text{2}}}^{s}\) as follows.

    $$ \left\{ {\begin{array}{*{20}c} {{\mathbf{U}}_{{\text{2}}}^{s} (2,1) = sign({\mathbf{U}}_{{\text{2}}}^{s} (2,1)) \times (u + m/2)} \\ {{\mathbf{U}}_{{\text{2}}}^{s} (3,1) = sign({\mathbf{U}}_{{\text{2}}}^{s} (3,1)) \times (u - m/2)} \\ \end{array} } \right.\quad {\text{if }}{\mathbf{W}}(i,j) = 1 $$
    (19)
    $$ \left\{ {\begin{array}{*{20}c} {{\mathbf{U}}_{{\text{2}}}^{s} (2,1) = sign({\mathbf{U}}_{{\text{2}}}^{s} (2,1)) \times (u - m/2)} \\ {{\mathbf{U}}_{{\text{2}}}^{s} (3,1) = sign({\mathbf{U}}_{{\text{2}}}^{s} (3,1)) \times (u + m/2)} \\ \end{array} } \right.\quad {\text{if }}{\mathbf{W}}(i,j) = 0 $$
    (20)

    where \(u = (abs({\mathbf{U}}_{{\text{2}}}^{s} (2,1)) + abs({\mathbf{U}}_{{\text{2}}}^{s} (3,1)))/2\), \(m = {sum}({\mathbf{Q}}^{s} )/16\), and \({\text{sum}}( \cdot )\) returns the sum of all elements of the matrix. The modified orthogonal tensor \({\mathcal{U}}_{w}^{s}\) is then obtained.

  • Step. 5 Perform inverse T-SVD on each block.

    $$ {\mathcal{K}}_{w}^{s} = {\mathcal{U}}_{w}^{s} \times {\mathcal{S}}^{s} \times ({\mathcal{V}}^{s} )^{{T}} $$
    (21)
  • Step. 6 Repeat steps 1 to 5 until watermark is embedded into all frames of the HDR video.
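Steps 3 and 4 above can be sketched as follows, combining the strength map of Eq. (18) with the coefficient modification of Eqs. (19)–(20). The 1-based matrix entries U2(2,1) and U2(3,1) become rows 1 and 2 of column 0 in Python; the default strengths are the FireEater values from Sect. 4.2, and the function names are ours.

```python
import numpy as np

def strength_matrix(A, d_min=0.01, d_max=0.02):
    """Eq. (18): salient pixels (A >= Med) receive the minimum
    embedding strength, non-salient pixels the maximum."""
    med = (A.max() + A.min()) / 2
    return np.where(A >= med, d_min, d_max)

def embed_bit(U2, bit, m):
    """Eqs. (19)-(20): re-balance the magnitudes of two entries of
    U2 around their mean u, keeping the original signs, so that
    their magnitude ordering encodes one watermark bit."""
    U2 = U2.copy()
    u = (abs(U2[1, 0]) + abs(U2[2, 0])) / 2
    a, b = (u + m / 2, u - m / 2) if bit == 1 else (u - m / 2, u + m / 2)
    U2[1, 0] = np.sign(U2[1, 0]) * a
    U2[2, 0] = np.sign(U2[2, 0]) * b
    return U2
```

Because only the relative magnitudes of the two entries are changed, the bit survives any attack that preserves their ordering, which is the basis of the extraction rule in Sect. 3.3.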

Fig. 2
figure 2

Watermark embedding

If the proposed method is applied to a gray-level video, a group of frames can be represented as a three-dimensional tensor so that the temporal correlation can be exploited. The group of frames, as a tensor, is divided into non-overlapping blocks, and each block is decomposed by T-SVD to extract \({\mathbf{U}}_{{\text{2}}}^{s}\) for embedding the watermark.

3.3 Watermark extraction

Watermark extraction is the reverse process of watermark embedding as illustrated in Fig. 3.

  • Step. 1 Each watermarked frame is regarded as the third-order tensor \({\mathcal{A}}^{*}\), which is divided into non-overlapping blocks of size nb × nb × 3, and each block is denoted as \({\mathcal{K}}^{{*s}}\).

  • Step. 2 Perform T-SVD on each block

    $$ {\mathcal{K}}^{{*s}} = {\mathcal{U}}^{{*s}} \times {\mathcal{S}}^{{*s}} \times ({\mathcal{V}}^{{*s}} )^{{T}} $$
    (22)

    where \({\mathbf{U}}_{{\text{2}}}^{{*s}}\) is the second matrix of \({\mathcal{U}}^{{*s}}\).

  • Step. 3 Each bit of the watermark W* is extracted as

    $$ \left\{ {\begin{array}{*{20}c} {{\mathbf{W}}^{*} (i,j) = 1} & {{\text{if }}{abs(}{\mathbf{U}}_{2}^{{*s}} {(2,1))} \ge {abs(}{\mathbf{U}}_{2}^{{*s}} {(3,1))}} \\ {{\mathbf{W}}^{*} (i,j) = 0} & {{if abs(}{\mathbf{U}}_{2}^{{*s}} {(2,1)) < abs(}{\mathbf{U}}_{2}^{{*s}} {(3,1))}} \\ \end{array} } \right. $$
    (23)
  • Step. 4 Repeat steps 1 to 3 until the watermark has been extracted from all frames of the watermarked HDR video.
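The extraction rule of Eq. (23) reduces to a magnitude comparison on the two modified entries; a minimal sketch (function name ours):

```python
import numpy as np

def extract_bit(U2):
    """Eq. (23): the extracted bit is 1 iff |U2(2,1)| >= |U2(3,1)|,
    i.e. rows 1 and 2 of column 0 in 0-based indexing."""
    return 1 if abs(U2[1, 0]) >= abs(U2[2, 0]) else 0
```

Since the embedding of Eqs. (19)–(20) pushes the two magnitudes apart by m, extraction remains correct under any distortion smaller than m/2 on each entry.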

Fig. 3
figure 3

Watermark extraction

4 Experimental results and discussion

In this section, the FireEater, EBU, Tibul, BallooFestival, Market and Sunrise HDR videos illustrated in Fig. 4 are used for testing; each contains 150 frames with a resolution of 1920 × 1080 × 3. The watermark is illustrated in Fig. 5, and 18 types of tone mapping (TM) attacks are selected from the HDR Toolbox, as shown in Table 1. Figure 6 shows the saliency maps of the HDR videos, in which the most relevant and important areas of each video are extracted.

Fig. 4
figure 4

HDR videos

Fig. 5
figure 5

Watermark

Table 1 18 types of TM attacks
Fig. 6
figure 6

Saliency map of HDR videos

The HDR-VDP-2 metric is used to evaluate the quality of the watermarked HDR video [50]; its imperceptibility index q ranges from 0 to 100. HDR-VDP75% and HDR-VDP95% are the probabilities that distortion is detected in at least 75% and 95% of the images, respectively. A high q denotes high visual quality, whereas high HDR-VDP75% and HDR-VDP95% values denote low visual quality of the watermarked HDR video. The bit error rate (BER) is used to evaluate the correctness of watermark extraction and expresses the watermarking robustness:

$$ BER = \frac{{N_{w} }}{{N_{t} }} $$
(24)

where \(N_{w}\) and \(N_{t}\) are the number of erroneous watermark bits and the total number of watermark bits, respectively. Normalized correlation (NC) is used to evaluate the similarity between the extracted and original watermarks:

$$ NC = \frac{{\sum\nolimits_{{i = 1}}^{K} {\sum\nolimits_{{j = 1}}^{J} {({\mathbf{W}} \times {\mathbf{W}}^{{\mathbf{*}}} )} } }}{{\sqrt {\sum\nolimits_{{i = 1}}^{K} {\sum\nolimits_{{j = 1}}^{J} {({\mathbf{W}} \times {\mathbf{W}})} } } \sqrt {\sum\nolimits_{{i = 1}}^{K} {\sum\nolimits_{{j = 1}}^{J} {({\mathbf{W}}^{{\mathbf{*}}} \times {\mathbf{W}}^{{\mathbf{*}}} )} } } }} $$
(25)

where W and W* are the original watermark and extracted watermark, respectively, and K × J is the size of watermark.

4.1 Discussion of U1, U2 and U3

To select the most suitable matrix for embedding the watermark, the watermark is embedded into the three orthogonal matrices U1, U2 and U3 in the same way as in Sect. 3.2; these three variants are named Proposed-U1, Proposed-U2 and Proposed-U3, respectively. The watermark is embedded into all six HDR videos, and 3 types of TM attacks are applied to the watermarked videos. The average BERs of these videos are shown in Table 2; the BERs of Proposed-U2 are clearly lower than those of Proposed-U1 and Proposed-U3, which indicates that Proposed-U2 is more robust and that U2 retains more correlations of the video frame. Thus, Proposed-U2 is the most suitable variant for protecting the copyright of the HDR video.

Table 2 Average BER of all HDR videos

4.2 Discussion of the embedding strength

∂min and ∂max govern the trade-off between the invisibility and robustness of the proposed method. First, watermarked HDR videos with different embedding strengths are obtained by the proposed method. Then, the imperceptibility index and the robustness against several typical TM attacks are computed. The initial values of ∂min and ∂max are both chosen from 0.01 to 0.1, with ∂max greater than ∂min. The values of ∂min and ∂max are gradually increased until the watermark in the HDR video becomes visually noticeable. The objective f is computed under different embedding strengths, and a large f indicates that ∂min and ∂max are well suited for embedding the watermark. Taking FireEater as an example, from a subjective perspective, when ∂min and ∂max exceed 0.03, the watermark in FireEater becomes visually noticeable. When ∂min and ∂max are assigned different values, different q and BER values are computed.

In order to obtain optimal ∂min and ∂max for different HDR videos, Eq. (26) is applied [32].

$$ f = q_{y} + \frac{1}{5} \times (\sum\limits_{{y = 1}}^{5} {(1 - BER_{y} )} \times 100), $$
(26)

where y indexes the TM attack type (TM1, TM2, TM3, TM4 and TM5), qy is the imperceptibility index of the watermarked HDR video for a given ∂min and ∂max, and BERy is the bit error rate under the yth TM attack. When ∂min = 0.01 and ∂max = 0.02, f reaches its maximum value for FireEater. Similarly, ∂min and ∂max of the other HDR videos are obtained as shown in Table 3.
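The selection criterion of Eq. (26) can be sketched as follows (hypothetical function name):

```python
def objective_f(q, bers):
    """Eq. (26): the imperceptibility index q plus the mean
    extraction accuracy (in percent) over the five TM attacks
    TM1-TM5, for one (d_min, d_max) setting."""
    assert len(bers) == 5
    return q + sum(1 - b for b in bers) / 5 * 100
```

The strength pair maximizing f over the searched grid is kept, so a setting is rewarded both for high visual quality and for low BER under the five attacks.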

Table 3 The embedding strength of HDR videos

4.3 Discussion of n b

In this section, the influence of the block size nb on the performance of the proposed method is examined. As shown in Table 4, when nb is 2, the embedding capacity is the largest and the imperceptibility is the lowest. As nb increases, the embedding capacity decreases and the imperceptibility increases. Moreover, the robustness is similar for different values of nb. Thus, nb mainly affects the embedding capacity and imperceptibility and can be set according to the application requirements.

Table 4 The results of different nb

4.4 Invisibility and robustness

Table 5 shows the values of HDR-VDP75%, HDR-VDP95% and q; the averages are 7.888%, 4.680% and 73.861, respectively, which indicates that the watermark in the HDR videos cannot be observed by human vision, as illustrated in Fig. 7, since the saliency map guides watermark embedding effectively. Figure 8 shows that nearly 100% of the watermark image can be extracted from the different watermarked HDR videos.

Table 5 The invisibility of the HDR videos
Fig. 7
figure 7

Watermarked HDR videos

Fig. 8
figure 8

Extracted watermark

To demonstrate the robustness of the proposed HDR video watermarking method against TM attacks, the 18 TM attacks are applied to FireEater, EBU, Tibul, BallooFestival, Market and Sunrise, respectively. Figure 9 shows the attacked HDR Sunrise under a subset of the TM attacks. From Table 6, the BERs of all six videos are lower than 0.15, which indicates that the proposed method can resist different TM attacks.

Fig. 9
figure 9

Attacked HDR sunrise by using TM attacks

Table 6 BER of 18 TM attacks

To show the robustness against video attacks, a variety of video attacks are applied, as shown in Table 7, such as frame averaging and H.265 compression (encoder_intra_main10, GOP = 1, QP = 22). For frame averaging, the current frame is replaced by the average of itself and its two nearest neighbors. Frame averaging changes the video considerably and can destroy the watermark; however, the proposed method still extracts the watermark with an accuracy higher than 90%, which shows its robustness. Since the watermark is embedded into every frame of the video, frame swapping and frame dropping have little influence on watermark extraction, and nearly 100% of the watermark can be extracted. The average BERs of the six HDR videos are 0.0044, 0.0348, 0.0056, 0.0323, 0.0098 and 0.0056, respectively, which indicates that the proposed method can resist these video attacks.

Table 7 BER of video attacks
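The frame-averaging attack described above can be sketched as a minimal NumPy routine; boundary frames simply average over the neighbors that exist, which is an assumption, since the paper does not specify boundary handling:

```python
import numpy as np

def frame_averaging(frames):
    """Frame-averaging attack: replace each frame with the mean of
    itself and its two nearest temporal neighbours.

    frames: float array (T, H, W, C); boundary frames use fewer terms.
    """
    out = np.empty_like(frames)
    T = frames.shape[0]
    for t in range(T):
        lo, hi = max(t - 1, 0), min(t + 1, T - 1)
        out[t] = frames[lo:hi + 1].mean(axis=0)  # temporal mean over window
    return out

# synthetic 5-frame video for illustration
video = np.random.rand(5, 8, 8, 3)
attacked = frame_averaging(video)
```

Because every frame carries the full watermark, the temporal blending attenuates but does not remove the embedded signal, which is consistent with the low BERs reported in Table 7.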

In order to show that the proposed method can also resist hybrid attacks, the six HDR videos are subjected to a variety of hybrid attacks, such as Gaussian filter (3 × 3) + TM11 and Salt & Pepper (0.001) + TM8. As shown in Table 8, the BERs of the proposed method are lower than 0.1, which shows that the proposed method performs well against hybrid attacks.

Table 8 BER of hybrid attacks

4.5 Comparison

Table 9 shows the comparison of the proposed method with Kang's [51] and Joshi's [52] in terms of embedding capacity, imperceptibility and time complexity. From Table 9, when the block size nb is set to 4, the embedding capacity and watermark invisibility are much better, but the running time is higher than those of Kang's [51] and Joshi's [52]. The running time of the proposed method is 1289.72 s, of which the saliency map computation costs 298.12 s. If nb is increased to 12, however, the running time decreases to 697.85 s, which is similar to those of Kang's [51] and Joshi's [52].

Table 9 Comparison of time complexity, embedding capacity and imperceptibility

Bakhsh's [38], Kang's [51] and Joshi's [52] methods are compared with the proposed method to demonstrate its robustness when all HDR videos are subjected to different attacks, as shown in Table 10. From Table 10, we can see that the proposed method outperforms Bakhsh's [38], Kang's [51] and Joshi's [52] methods. For example, for TM1, the average BER of the proposed method is nearly 0.04, 0.02 and 0.06 lower than those of Bakhsh's [38], Kang's [51] and Joshi's [52], respectively. For TM6, the average BER of the proposed method is nearly 0.05, 0.2 and 0.06 lower than those of Bakhsh's [38], Kang's [51] and Joshi's [52], respectively. Compared with Bakhsh's [38], the average BER of the proposed method is higher for TM12 but lower for the other attacks, such as Sharpen (0.5) + TM14, Poisson noise + TM15 and Salt & Pepper (0.001) + TM8. Compared with Kang's [51] and Joshi's [52], the proposed method is clearly better. Considering all kinds of attacks, most BERs of the proposed method are lower than those of Bakhsh's, Kang's and Joshi's methods, which indicates that the proposed method is superior to the above three methods. In summary, the proposed method is capable of protecting the HDR video and its tone-mapped representations, mainly because T-SVD preserves the robust features of the HDR video.

Table 10 Comparison of watermark extraction from all HDR video under different attacks

In order to evaluate the robustness of the proposed method objectively, NC is also used. The watermarked Tibul is subjected to a variety of attacks, as illustrated in Table 11. From Table 11, we can see that the NCs of the proposed method are similar to those of Joshi's [52] for TM8 and TM9, but higher than those of Joshi's [52] for the other attacks, especially for Poisson noise + TM15 and Scaling (1/4). Compared with Bakhsh's [38] and Kang's [51], the proposed method is clearly better. Overall, the average NC of the proposed method is higher than those of Bakhsh's [38], Kang's [51] and Joshi's [52]. Thus, the proposed method can resist a variety of attacks to protect the HDR videos, which shows its effectiveness.

Table 11 Comparison of watermark extraction from Tibul under different attacks
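For reference, the two objective robustness measures used throughout, BER and NC, can be computed as sketched below. The exact normalization of NC varies across papers, so the cosine-style form here is one common convention rather than necessarily the one used in these tables:

```python
import numpy as np

def ber(w_orig, w_ext):
    """Bit error rate between binary watermarks (lower is better)."""
    return float(np.mean(w_orig != w_ext))

def nc(w_orig, w_ext):
    """Normalized correlation between watermarks (1.0 = identical);
    cosine-similarity form, one common convention in the literature."""
    a = w_orig.astype(np.float64).ravel()
    b = w_ext.astype(np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

w = np.random.randint(0, 2, (32, 32))   # a synthetic 32x32 binary watermark
noisy = w.copy()
noisy[0, :8] ^= 1                       # flip 8 of the 1024 bits
print(ber(w, noisy))                    # 8 / 1024 = 0.0078125
```

A BER of 0 and an NC of 1 both indicate perfect extraction; the thresholds quoted in the text (e.g., BER below 0.15) are judged against these measures.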

4.6 Robustness on other HDR videos

In order to further verify the effectiveness of the proposed method, an HDR database [53] consisting of 10 HDR videos is also used for testing. The watermarked HDR videos are attacked by different TM attacks, such as TM3 and TM8. From Table 12, we can see that the BERs of the proposed method are clearly lower than those of Bakhsh's [38], Kang's [51] and Joshi's [52] methods, which indicates that the proposed method has strong robustness and can efficiently protect the copyright of HDR videos.

Table 12 Comparisons of other HDR videos

5 Conclusion

In this paper, a robust HDR video watermarking method based on T-SVD and the saliency map is proposed. Each frame is regarded as a third-order tensor, and T-SVD is used to obtain the most robust embedding domain. After T-SVD, the orthogonal tensor is calculated, which consists of the first, second and third matrices. Compared with the other two matrices, the second matrix is more robust and therefore more suitable for embedding the watermark. The saliency map, which represents the most visually important areas of each frame, is computed to determine the embedding strength. Experimental results show that the proposed method can effectively protect the copyright of HDR videos and is robust against various attacks. However, the proposed method does not resist lossy compression well. In future work, we will further explore visual perception factors of the HDR video to guide watermark embedding and improve watermarking efficiency.
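As a rough illustration of the decomposition underlying the method, a generic t-SVD of a third-order tensor (FFT along the third mode, slice-wise matrix SVDs in the Fourier domain, inverse FFT back) can be sketched as follows. This is a minimal textbook t-SVD, not the paper's full embedding pipeline; the mirrored slices simply enforce the conjugate symmetry of a real input so the returned factors are real:

```python
import numpy as np

def tsvd(A):
    """Minimal t-SVD of a real third-order tensor A of shape (n1, n2, n3).

    Returns real tensors U, S, V whose t-product U * S * V^T
    reconstructs A (verified slice-wise in the Fourier domain).
    """
    n1, n2, n3 = A.shape
    k = min(n1, n2)
    Af = np.fft.fft(A, axis=2)                       # third-mode FFT
    U = np.zeros((n1, k, n3), dtype=complex)
    S = np.zeros((k, k, n3), dtype=complex)
    V = np.zeros((n2, k, n3), dtype=complex)
    for i in range(n3 // 2 + 1):                     # independent slices
        u, s, vh = np.linalg.svd(Af[:, :, i], full_matrices=False)
        U[:, :, i], S[:, :, i], V[:, :, i] = u, np.diag(s), vh.conj().T
    for i in range(n3 // 2 + 1, n3):                 # conjugate mirrors
        U[:, :, i] = U[:, :, n3 - i].conj()
        S[:, :, i] = S[:, :, n3 - i].conj()
        V[:, :, i] = V[:, :, n3 - i].conj()
    return (np.fft.ifft(U, axis=2).real,
            np.fft.ifft(S, axis=2).real,
            np.fft.ifft(V, axis=2).real)

# a small synthetic "frame tensor" (height x width x channels)
A = np.random.rand(6, 5, 3)
U, S, V = tsvd(A)
```

In this framing, the frontal slices of the orthogonal tensor U play the role of the first, second and third matrices mentioned above, of which the paper selects the second for embedding.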