1 Introduction

With the rapid development of video sharing technology, it is essential to study algorithms for hiding data in videos for copyright protection, covert communication, integrity authentication, and so on [1, 2]. H.264 is a state-of-the-art video compression standard and has become the most widely deployed video codec. At the H.264 encoder, a residual pixel block is obtained by subtracting a prediction block from its original pixel block in the YUV video. After lossy compression [discrete cosine transformation (DCT) and quantization], the residual block becomes a quantized DCT (QDCT) block, which has one direct current (DC) coefficient and some alternating current (AC) coefficients. After entropy encoding (lossless compression) of each QDCT macro block (MB), the YUV video is encoded into an H.264 video. Therefore, information hidden by changing QDCT coefficients can be fully extracted after entropy decoding. It is most common to embed data into QDCT coefficients, but the distortion caused by hiding data will spread and accumulate [3, 4]. Using general (irreversible) data hiding methods to embed information causes permanent distortion of the host multimedia. However, if data is hidden into QDCT coefficients with a reversible data hiding (RDH) algorithm, the values of the QDCT coefficients can be completely restored after the information is extracted, so important videos can be saved and enjoyed without the embedded information and the distortion caused by hiding data. Consequently, there will not be too many marked videos on the network, which makes it difficult for others to find the stego cover. In addition, RDH methods can also be applied to video error concealment [5–8] and sensitive application fields [9–13] such as multimedia archive management, medical multimedia sharing, military affairs, remote sensing, and law enforcement.

In recent years, many RDH methods such as lossless compression [14–16], difference expansion [5, 6, 8, 11, 13, 17–22], histogram shifting (HS) [7, 23–34] and integer transform [10, 12] have been presented. In an RDH framework based on lossless compression [14], compressible parts, which are extracted nondestructively from the original cover, are compressed by a lossless compression algorithm, and the to-be-hidden information is appended to the compressed parts. Correspondingly, the receiver extracts the information from the end of the sequence and recovers the original cover by decompressing the compressed parts. When this approach is used to hide information, low embedding capacity and high computational complexity are two issues that must be addressed. Furthermore, in the lossless compression scheme [16], a recursive code construction and a lossless compression algorithm are used to hide data. However, it is not easy to use the recursive construction to hide information in H.264 video, since this video is encoded by treating each MB sequentially [35]. In the difference expansion algorithm [17], the difference between the two adjacent pixels of a pixel pair is expanded to hide one data bit. Moreover, prediction-error expansion improves the hiding performance of difference expansion by expanding the difference between a pixel and its prediction [19].

The peak of the image histogram is used to hide information in the HS method proposed by Ni et al. [23], in which each pixel value is changed by at most 1 to hide one data bit. Li et al. [31] proposed a general framework for HS-based RDH, in which an RDH algorithm can be constructed by simply designing the shifting and embedding functions. To achieve a better capacity-distortion trade-off, various prediction methods are used to obtain sharp difference histograms [36].

However, in general HS-based RDH methods, each pixel, difference, or prediction error is changed individually to hide a data bit, which constrains the capacity-distortion performance. To overcome this limitation on capacity-distortion performance and embedding efficiency, an efficient RDH algorithm based on three-dimensional (3D) HS is proposed in this work. Stereo H.264 videos, encoded and decoded with multi-view coding (MVC), are taken as covers in this paper. Any block that is not used to predict other blocks is treated as an embeddable block, from which three QDCT AC coefficients are randomly chosen as an embeddable unit. The coefficient units are divided into disjoint regions, and on the basis of these regions the 3D histogram is expanded or shifted to hide data reversibly. To embed two data bits, two coefficients may have to be modified in the conventional HS, whereas only one coefficient may need to be changed in the proposed scheme. Compared with some state-of-the-art methods, the presented algorithm has superior payload-distortion performance, which is verified by the experimental results.

The rest of the paper is organized as follows: Section 2 presents the main idea of 3D HS. In Sect. 3, the proposed RDH algorithm for MVC video and its implementation details are described. The hiding performance of the presented algorithm is evaluated via experimental results in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 RDH method using HS

2.1 Conventional HS

Ni et al.’s HS method [23] can be used to hide data in the QDCT coefficients of MVC video. Denote a QDCT coefficient as F x1 and the marked QDCT coefficient as \( F^{\prime}_{x1} \). Information is hidden by expanding and shifting the one-dimensional (1D) histogram as shown in Fig. 1 and (1), where m i ∈ {0, 1} is a to-be-embedded data bit.

$$ F^{\prime}_{x1} = \left\{ {\begin{array}{*{20}l} {0,} & {{\text{if }}(F_{x1} = 0) \wedge (m_{i} = 0)} \\ {1,} & {{\text{if }}(F_{x1} = 0) \wedge (m_{i} = 1)} \\ {F_{x1} + 1,} & {{\text{if }}F_{x1} > 0} \\ \end{array} } \right. $$
(1)
Fig. 1
figure 1

Conventional HS

The positive coefficients are shifted to vacate space. When the value of F x1 is 0, one data bit can be hidden: the coefficient becomes 1 if m i is 1 and remains 0 if m i is 0. Accordingly, the hidden bit m i can be extracted from the marked QDCT coefficient \( F^{\prime}_{x1} \), and the original coefficient value can be completely restored, as follows:

  1. 1.

    If \( F^{\prime}_{x1} \) = 0, the extracted information bit m i  = 0 and the original coefficient F x1 = 0.

  2. 2.

    If \( F^{\prime}_{x1} \) = 1, the extracted information bit m i  = 1 and the original coefficient F x1 = 0.

  3. 3.

    If \( F^{\prime}_{x1} \) > 1, there is no hidden data in the coefficient and the original coefficient F x1 = \( F^{\prime}_{x1} \) − 1.
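The mapping in (1) and its inverse can be summarized in a short Python sketch (illustrative only; the function names are ours, and only nonnegative coefficients are considered, as in Fig. 1):

```python
def hs1d_embed(F, bit):
    """Embed one bit into a nonnegative QDCT coefficient by 1D HS, per (1)."""
    if F == 0:
        return bit        # 0 stays 0 for bit 0 and becomes 1 for bit 1
    return F + 1          # positive coefficients are shifted to vacate the bin at 1

def hs1d_extract(F_marked):
    """Return (bit, original coefficient); bit is None if the coefficient carries no data."""
    if F_marked == 0:
        return 0, 0
    if F_marked == 1:
        return 1, 0
    return None, F_marked - 1   # shifted coefficient, no payload
```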

In this method, the 1D coefficient histogram is defined by (2), where # denotes the cardinality of a set and s 1 is a nonnegative integer.

$$ h\left( {s_{1} } \right) = \# \left\{ {F_{x1} |F_{x1} = s_{1} } \right\} $$
(2)

If the scheme shown in Fig. 1 is used to embed data into each of three QDCT coefficients denoted by F x1, F x2, and F x3, the mapping becomes the conventional 3D HS shown in Fig. 2, where the 3D histogram is defined as

$$ w\left( {s_{ 1} ,s_{ 2} ,s_{ 3} } \right) = \# \left\{ {\left( {F_{x 1} ,F_{x 2} ,F_{x 3} } \right)|F_{x 1} = s_{ 1} ,F_{x 2} = s_{ 2} ,F_{x 3} = s_{ 3} } \right\} $$
(3)

where s 2 and s 3 are nonnegative integers.
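Both histograms can be tallied directly from the chosen coefficients; a minimal sketch (the container type is our choice):

```python
from collections import Counter

def hist_1d(coeffs):
    """h(s1): count of coefficients equal to s1, per (2)."""
    return Counter(coeffs)

def hist_3d(triples):
    """w(s1, s2, s3): count of coefficient triples equal to (s1, s2, s3), per (3)."""
    return Counter(triples)
```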

Fig. 2
figure 2

Conventional 3D HS

2.2 Proposed 3D HS

In HS schemes, different positions are used to represent different messages. As shown in Fig. 2, eight adjacent positions are needed to represent the eight possible three-bit messages (000, 001, 010, 011, 100, 101, 110, and 111), four adjacent positions are needed to represent the four possible two-bit messages (00, 01, 10, and 11), and two neighboring positions are needed to represent the two possible one-bit messages (0 and 1). When only one position is available, no message can be hidden, and the original position is shifted to its neighboring place. It can be observed that the maximum cost (number of unit modifications) of each QDCT coefficient group in Fig. 2 is 3, which may cause obvious distortion.

To reduce this cost, we first search for positions that can represent different messages with at most one change. If only nonnegative coefficient groups are used to store information, four positions [(0,0,0), (1,0,0), (0,1,0), and (0,0,1)] can represent the four possible two-bit messages. When the value of the coefficient group (F x1, F x2, F x3) is (0,0,0), it represents the two data bits 00 with no modification, and it can be expanded to its neighboring positions (1,0,0), (0,1,0), and (0,0,1) to signify the two data bits 01, 10, and 11 with one modification, respectively. When the value of the coefficient group (F x1, F x2, F x3) is (F x1,0,0) (F x1 > 0), it can be expanded to its neighboring positions (F x1, 0, 1), (F x1 + 1,0,0), and (F x1, 1,0) to signify the two data bits 00, 01, and 10 with one modification, respectively, and it can be expanded to (F x1, 1, 1) to signify the two data bits 11, where the cost is 2. When F x1, F x2, F x3 are all positive integers, the group is shifted to (F x1, F x2 + 1, F x3 + 1), where the cost is 2; in Fig. 2, by contrast, the group is shifted to (F x1 + 1, F x2 + 1, F x3 + 1), where the cost is 3. In this way, compared with the conventional HS, we obtain a more efficient 3D HS scheme, as shown in Fig. 3, where the set (denoted as J) of all the points is divided into ten disjoint sets defined as follows:

$$ \begin{aligned} J_{ 1} & = \left\{ {\left( {0,\,0,0} \right)} \right\} \\ J_{ 2} & = \left\{ {\left( {F_{x 1} ,0,0} \right)|F_{x 1} > 0} \right\} \\ J_{ 3} & = \left\{ {\left( {0,F_{x 2} ,0} \right)|F_{x 2} > 0} \right\} \\ J_{ 4} & = \left\{ {\left( {0,0, 1} \right)} \right\} \\ J_{ 5} & = \left\{ {\left( {0,0,F_{x 3} } \right)|F_{x 3} > 1} \right\} \\ J_{ 6} & = \left\{ {\left( {F_{x 1} ,F_{x 2} , \, 0} \right)|F_{x 1} > 0,F_{x 2} > 0} \right\} \\ J_{ 7} & = \left\{ {\left( { 1, \, 0,F_{x 3} } \right)|F_{x 3} > 0} \right\} \\ J_{ 8} & = \left\{ {\left( {F_{x 1} ,0,F_{x 3} } \right)|F_{x 1} > 1,F_{x 3} > 0} \right\} \\ J_{ 9} & = \left\{ {\left( {0,F_{x 2} ,F_{x 3} } \right)|F_{x 2} > \, 0,F_{x 3} > 0} \right\} \\ J_{ 10} & = \left\{ {\left( {F_{x 1} ,F_{x 2} ,F_{x 3} } \right)|F_{x 1} > 0,F_{x 2} > 0,F_{x 3} > 0} \right\} \\ \end{aligned} $$
Fig. 3
figure 3

Proposed 3D HS
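Because the sets J 1–J 10 are disjoint and together cover all nonnegative coefficient triples, membership can be decided by a simple case analysis. A sketch (the function name and return convention are ours):

```python
def region(F1, F2, F3):
    """Return the index k (1..10) of the set J_k containing the nonnegative triple."""
    if F1 == 0 and F2 == 0 and F3 == 0: return 1
    if F1 > 0  and F2 == 0 and F3 == 0: return 2
    if F1 == 0 and F2 > 0  and F3 == 0: return 3
    if F1 == 0 and F2 == 0 and F3 == 1: return 4
    if F1 == 0 and F2 == 0 and F3 > 1:  return 5
    if F1 > 0  and F2 > 0  and F3 == 0: return 6
    if F1 == 1 and F2 == 0 and F3 > 0:  return 7
    if F1 > 1  and F2 == 0 and F3 > 0:  return 8
    if F1 == 0 and F2 > 0  and F3 > 0:  return 9
    return 10                            # F1 > 0, F2 > 0, F3 > 0
```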

We divide the chosen coefficient groups into different sets and hide information based on the set in which the value of the coefficient group resides. Accordingly, the embedding process can be described as follows.

  1. 1.

    if the coefficient group (F x1, F x2, F x3) ∈ J 1, the marked coefficient group denoted by \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be

    $$ (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) = \left\{ {\begin{array}{*{20}l} {(F_{x1} ,F_{x2} ,F_{x3} ) ,} & {{\text{if }}\,m_{i} m_{i + 1} = 00} \\ {(F_{x 1} + 1,F_{x 2} ,F_{x 3} ) ,} & {{\text{if}}\, \, m_{i} m_{i + 1} = 01} \\ {(F_{x 1} ,F_{x 2} + 1,F_{x 3} ) ,} & {{\text{if}}\, \, m_{i} m_{i + 1} = 10} \\ {(F_{x 1} ,F_{x 2} ,F_{x 3} + 1) ,} & {{\text{if}}\,\,m_{i} m_{i + 1} = 11} \\ \end{array} } \right. $$
    (4)
  2. 2.

    if the coefficient group (F x1, F x2, F x3) ∈ J 2, the marked coefficient group \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be

    $$ (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) = \left\{ {\begin{array}{*{20}l} {(F_{x1} ,F_{x2} ,F_{x3} + 1) ,} & {{\text{if }}\,m_{i} m_{i + 1} = 00} \\ {(F_{x1} + 1,F_{x2} ,F_{x3} ) ,} & {{\text{if }}\,m_{i} m_{i + 1} = 01} \\ {(F_{x1} ,F_{x2} + 1,F_{x3} ) ,} & {{\text{if }}\,m_{i} m_{i + 1} = 10} \\ {(F_{x1} ,F_{x2} + 1,F_{x3} + 1) ,} & {{\text{if}}\, \, m_{i} m_{i + 1} = 11} \\ \end{array} } \right. $$
    (5)
  3. 3.

    if the coefficient group (F x1, F x2, F x3) ∈ J 3 ∪ J 5, the marked coefficient group \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be

    $$ (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) = \left\{ {\begin{array}{*{20}l} {(F_{x1} ,F_{x2} ,F_{x3} + 1) ,} & {{\text{if }}\,m_{i} = 0} \\ {(F_{x1} ,F_{x2} + 1,F_{x3} ) ,} & {{\text{if }}\,m_{i} = 1} \\ \end{array} } \right. $$
    (6)
  4. 4.

    if the coefficient group (F x1, F x2, F x3) ∈ J 4, the marked coefficient group \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be

    $$ (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) = \left\{ {\begin{array}{*{20}l} {(F_{x1} ,F_{x2} ,F_{x3} + 1) ,} & {{\text{if }}\,m_{i} = 0} \\ {(F_{x1} + 1,F_{x2} ,F_{x3} + 1) ,} & {{\text{if }}\,m_{i} = 1} \\ \end{array} } \right. $$
    (7)
  5. 5.

    if the coefficient group (F x1, F x2, F x3) ∈ J 6, the marked coefficient group \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be

    $$ (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) = \left\{ {\begin{array}{*{20}l} {(F_{x1} ,F_{x2} + 1,F_{x3} ) ,} & {{\text{if }}\,m_{i} = 0} \\ {(F_{x1} ,F_{x2} + 1,F_{x3} + 1) ,} & {{\text{if }}\,m_{i} = 1} \\ \end{array} } \right. $$
    (8)
  6. 6.

    if the coefficient group (F x1, F x2, F x3) ∈ J 7, the marked coefficient group \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be

    $$ (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) = \left\{ {\begin{array}{*{20}l} {(F_{x1} ,F_{x2} + 1,F_{x3} + 1) ,} & {{\text{if }}\,m_{i} = 0} \\ {(F_{x1} ,F_{x2} ,F_{x3} + 2) ,} & {{\text{if }}\,m_{i} = 1} \\ \end{array} } \right. $$
    (9)
  7. 7.

    if the coefficient group (F x1, F x2, F x3) ∈ J 8, the marked coefficient group \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be

    $$ (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) = \left\{ {\begin{array}{*{20}l} {(F_{x1} ,F_{x2} ,F_{x3} + 1) ,} & {{\text{if }}m_{i} = 0} \\ {(F_{x1} ,F_{x2} + 1,F_{x3} + 1) ,} & {{\text{if }}m_{i} = 1} \\ \end{array} } \right. $$
    (10)
  8. 8.

    if the coefficient group (F x1, F x2, F x3) ∈ J 9 ∪ J 10, no information is hidden, and the marked coefficient group \( (F^{\prime}_{x1} ,F^{\prime}_{x2} ,F^{\prime}_{x3} ) \) will be taken as (F x1, F x2 + 1, F x3 + 1).

After data is hidden using these mapping rules, the information can be extracted according to the set in which the marked coefficient group resides, and the original coefficient values can be recovered by reversing the embedding process.
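A compact Python sketch of the embedding rules (4)-(10) and of extraction is given below. It reuses the region helper sketched above; the interface (bit lists, return values) is our assumption, and a practical implementation would replace the brute-force inverse with the explicit case analysis.

```python
from itertools import product

def embed_3dhs(F, bits):
    """Embed up to two bits into a nonnegative coefficient triple F = (F1, F2, F3)
    according to rules (4)-(10); J9/J10 triples are shifted without carrying data.
    Returns (marked_triple, number_of_bits_consumed)."""
    F1, F2, F3 = F
    r = region(F1, F2, F3)
    if r == 1:                                                    # rule (4)
        table = {(0, 0): (0, 0, 0), (0, 1): (1, 0, 0),
                 (1, 0): (0, 1, 0), (1, 1): (0, 0, 1)}
        return table[tuple(bits[:2])], 2
    if r == 2:                                                    # rule (5)
        table = {(0, 0): (F1, 0, 1), (0, 1): (F1 + 1, 0, 0),
                 (1, 0): (F1, 1, 0), (1, 1): (F1, 1, 1)}
        return table[tuple(bits[:2])], 2
    if r in (3, 5):                                               # rule (6)
        return ((F1, F2, F3 + 1) if bits[0] == 0 else (F1, F2 + 1, F3)), 1
    if r == 4:                                                    # rule (7)
        return ((0, 0, 2) if bits[0] == 0 else (1, 0, 2)), 1
    if r == 6:                                                    # rule (8)
        return ((F1, F2 + 1, 0) if bits[0] == 0 else (F1, F2 + 1, 1)), 1
    if r == 7:                                                    # rule (9)
        return ((1, 1, F3 + 1) if bits[0] == 0 else (1, 0, F3 + 2)), 1
    if r == 8:                                                    # rule (10)
        return ((F1, 0, F3 + 1) if bits[0] == 0 else (F1, 1, F3 + 1)), 1
    return (F1, F2 + 1, F3 + 1), 0                                # J9 and J10: shift only

def extract_3dhs(G):
    """Inverse mapping: return (embedded_bits, original_triple) for a marked triple G.
    The overall mapping is injective, so exactly one candidate reproduces G."""
    for O in product(*(range(max(0, g - 2), g + 1) for g in G)):
        for bits in ([0, 0], [0, 1], [1, 0], [1, 1]):
            marked, n = embed_3dhs(O, bits)
            if marked == tuple(G) and (n > 0 or bits == [0, 0]):
                return bits[:n], O
    raise ValueError("not a valid marked triple")
```

For example, embed_3dhs((2, 0, 0), [1, 1]) returns ((2, 1, 1), 2), and extract_3dhs((2, 1, 1)) recovers ([1, 1], (2, 0, 0)).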

2.3 Embedding capacity and distortion

When the 1D HS is used for hiding data, the embedding capacity denoted as EC is h(0). For QDCT coefficients, the embedding distortion denoted as ED in terms of l 2-error can be formulated as

$$ {\text{ED}} = \frac{1}{2}h(0) + \sum\limits_{s_{1} = 1}^{ + \infty } {h(s_{1} )} $$
(11)

The embedding capacities of the conventional 3D HS and the proposed 3D HS, denoted as ECcon and ECpro, can be calculated by (12) and (13).

$$ \begin{aligned} {\text{EC}}_{\text{con}} & = 3\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{1} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + 2\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{2} \cup J_{3} \cup J_{4} \cup J_{5} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ & \quad + \sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{6} \cup J_{7} \cup J_{8} \cup J_{9} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ \end{aligned} $$
(12)
$$ {\text{EC}}_{\text{pro}} = 2\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{1} \cup J_{2} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + \sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{3} \cup J_{4} \cup J_{5} \cup J_{6} \cup J_{7} \cup J_{8} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} $$
(13)

For QDCT coefficients, the embedding distortion in terms of \( l^{2} \)-error of the conventional 3D HS and the proposed 3D HS, denoted as EDcon and EDpro, can be formulated as

$$ \begin{aligned} {\text{ED}}_{\text{con}} & = \frac{3}{2}\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{1} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + 2\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{2} \cup J_{3} \cup J_{4} \cup J_{5} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ & \quad + \frac{5}{2}\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{6} \cup J_{7} \cup J_{8} \cup J_{9} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + 3\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{10} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ \end{aligned} $$
(14)

and

$$ \begin{aligned} {\text{ED}}_{\text{pro}} & = \frac{3}{4}\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{1} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + \frac{5}{4}\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{2} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + \sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{3} \cup J_{5} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ & \quad + \frac{3}{2}\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{4} \cup J_{6} \cup J_{8} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + 2\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{7} \cup J_{9} \cup J_{10} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ \end{aligned} $$
(15)
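Under these definitions, the capacity and distortion of both mappings can be accumulated in one pass over the 3D histogram; a sketch (w is assumed to be a dict mapping coefficient triples to counts, and the region helper from Sect. 2.2 is reused):

```python
def capacity_and_distortion(w):
    """Return (EC_con, EC_pro, ED_con, ED_pro) computed from a 3D histogram w,
    following the per-region weights in (12)-(15)."""
    cap_con = {1: 3, 2: 2, 3: 2, 4: 2, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1, 10: 0}
    dis_con = {1: 1.5, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2.5, 7: 2.5, 8: 2.5, 9: 2.5, 10: 3}
    cap_pro = {1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 0, 10: 0}
    dis_pro = {1: 0.75, 2: 1.25, 3: 1, 4: 1.5, 5: 1, 6: 1.5, 7: 2, 8: 1.5, 9: 2, 10: 2}
    EC_con = EC_pro = ED_con = ED_pro = 0.0
    for (F1, F2, F3), count in w.items():
        r = region(F1, F2, F3)
        EC_con += count * cap_con[r]
        ED_con += count * dis_con[r]
        EC_pro += count * cap_pro[r]
        ED_pro += count * dis_pro[r]
    return EC_con, EC_pro, ED_con, ED_pro
```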

According to (12) and (13), the difference in embedding capacity between the conventional 3D HS and the presented 3D HS is

$$ {\text{EC}}_{\text{con}} - {\text{EC}}_{\text{pro}} = \sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{1} \cup J_{3} \cup J_{4} \cup J_{5} \cup J_{9} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} $$
(16)

According to (14) and (15), the difference in embedding distortion between the conventional 3D HS and the presented 3D HS is

$$ \begin{aligned} {\text{ED}}_{\text{con}} - {\text{ED}}_{\text{pro}} & = \frac{3}{4}\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{1} \cup J_{2} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} + \sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{3} \cup J_{5} \cup J_{6} \cup J_{8} \cup J_{10} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ & \quad + \frac{1}{2}\sum\limits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{4} \cup J_{7} \cup J_{9} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \\ \end{aligned} $$
(17)

Therefore, compared with the conventional 3D HS, although the capacity of our method is lower, the distortion is greatly reduced. Two examples illustrate the advantage of the proposed method.

  1. 1.

    For the group (F x1, F x2, F x3) = (0, 0, 0) ∈ J 1, in the proposed 3D HS the distortion is 0, 1, 1 and 1 when (m i , m i+1) is (0, 0), (0, 1), (1, 0), and (1, 1), respectively, whereas in the conventional 3D HS the distortion is 2 if (m i , m i+1) is (1, 1). Therefore, when the quantity of data hidden by the proposed method equals that hidden by the conventional 3D HS, the proposed method achieves better quality than the conventional 3D HS.

  2. 2.

    For the group (F x1, F x2, F x3) = (2, 0, 0) ∈ J 2, in the proposed 3D HS the cost is 1, 1, 1 and 2 when (m i , m i+1) is (0, 0), (0, 1), (1, 0), and (1, 1), respectively, whereas in the conventional 3D HS the cost is 1, 2, 2 and 3, respectively. Accordingly, for the coefficient groups in the set J 2, the presented 3D HS yields better cover quality than the conventional 3D HS when the same amount of information is hidden.

In general, for the coefficient groups in J 2, the same embedding capacity, i.e., \( 2\sum\nolimits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{2} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \), is obtained by the two methods, but the proposed scheme causes lower distortion, since \( \frac{5}{4}\sum\nolimits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{2} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} < 2\sum\nolimits_{{(F_{x1} ,F_{x2} ,F_{x3} ) \in J_{2} }} {w(F_{x1} ,F_{x2} ,F_{x3} )} \). For the coefficient groups in J 1, the hiding efficiency (capacity divided by distortion) of the proposed method is \( 2/\tfrac{3}{4} = \tfrac{8}{3} \), whereas that of the conventional method is \( 3/\tfrac{3}{2} = 2 < \tfrac{8}{3} \). Similarly, when the coefficient group (F x1, F x2, F x3) belongs to the other sets and the same quantity of data is hidden, better video quality and higher hiding efficiency are achieved by the proposed 3D HS than by the conventional 3D HS; a small numerical check is sketched below.
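These two comparisons can be checked numerically with the capacity_and_distortion sketch above, using toy histograms that contain only J 1 or only J 2 triples (the counts are arbitrary):

```python
# Only J1 triples: the proposed efficiency is 8/3, the conventional one is 2.
EC_con, EC_pro, ED_con, ED_pro = capacity_and_distortion({(0, 0, 0): 100})
print(EC_pro / ED_pro, EC_con / ED_con)        # 2.666...  2.0

# Only J2 triples: equal capacity (200 bits), but distortion 125 instead of 200.
EC_con, EC_pro, ED_con, ED_pro = capacity_and_distortion({(2, 0, 0): 100})
print(EC_con, EC_pro, ED_con, ED_pro)          # 200.0 200.0 200.0 125.0
```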

Additionally, the presented scheme can be applied in image or video RDH algorithms, since the difference or prediction-error histogram is similar to the coefficient histogram. When the presented scheme is used for some media, especially 8-bit gray-scale images [24], the overflow/underflow problem must be handled. However, this problem need not be considered when the information is embedded into the QDCT coefficients of H.264 video [6].

3 The proposed RDH algorithm for MVC video

3.1 Embeddable blocks limiting distortion drift

The original YUV videos captured by cameras are compressed to decrease the network transmission load. To reduce the redundancy of video sequences, parallax prediction, inter-frame prediction, and intra-frame prediction are employed in the MVC standard to compute the prediction block. The prediction block is then subtracted from the original block in the YUV video at the MVC encoder, and the residual block, denoted as K R0, undergoes 4 × 4 (or 8 × 8) DCT and quantization as shown in

$$ F = {\text{round}}\left[ {(C_{f} K^{R0} C_{f}^{T} ) \otimes (E_{f} /Q)} \right] $$
(18)

where F is a QDCT block with 16 QDCT coefficients numbered by zigzag scan as shown in Fig. 4, \( C_{f} = \left[ {\begin{array}{*{20}r} 1 & 1 & 1 & 1 \\ 2 & 1 & { - 1} & { - 2} \\ 1 & { - 1} & { - 1} & 1 \\ 1 & { - 2} & 2 & { - 1} \\ \end{array} } \right] \), \( E_{f} = \left[ {\begin{array}{*{20}c} {a^{2} } & {ab/2} & {a^{2} } & {ab/2} \\ {ab/2} & {b^{2} /4} & {ab/2} & {b^{2} /4} \\ {a^{2} } & {ab/2} & {a^{2} } & {ab/2} \\ {ab/2} & {b^{2} /4} & {ab/2} & {b^{2} /4} \\ \end{array} } \right] \), a = 1/2, \( b = \sqrt {2/5} \), the matrix \( C_{f}^{T} \) is the transpose of \( C_{f} \), Q is the quantization step size, and \( \otimes \) denotes element-wise multiplication, i.e., each value in the former matrix is multiplied by the value at the corresponding position in the latter matrix.

Fig. 4
figure 4

A 4 × 4 QDCT block with zigzag scan

If one data bit is hidden into one QDCT block F by changing some QDCT coefficients, the QDCT block F will be altered to a marked block denoted by F′, and the deviation (denoted as ∆F) introduced by hiding data is

$$ \Delta F = F^{\prime} - F $$
(19)

To reconstruct the YUV video that users watch on the screen, the decoder computes the prediction block and adds it to the residual block K R, which is obtained by lossless decompression (entropy decoding) followed by lossy reconstruction (inverse quantization and inverse 4 × 4 (or 8 × 8) DCT), as shown in

$$ K^{\text{R}} = {\text{round}}\left[ {C_{d}^{T} \left( {F \cdot Q \otimes E_{d} } \right)C_{d} } \right] $$
(20)

where \( C_{d} = \left[ {\begin{array}{*{20}l} 1 & 1 & 1 & 1 \\ 1 & {1/2} & { - 1/2} & { - 1} \\ 1 & { - 1} & { - 1} & 1 \\ {1/2} & { - 1} & 1 & { - 1/2} \\ \end{array} } \right],\quad E_{d} = \left[ {\begin{array}{*{20}l} {a^{2} } & {ab} & {a^{2} } & {ab} \\ {ab} & {b^{2} } & {ab} & {b^{2} } \\ {a^{2} } & {ab} & {a^{2} } & {ab} \\ {ab} & {b^{2} } & {ab} & {b^{2} } \\ \end{array} } \right]. \)

When a data bit is hidden by modifying some QDCT coefficients of a block, the residual pixel block K R turns into a marked pixel block denoted as \( K^{{R^{\prime}}} \), and the resulting change, denoted by ∆K R, is

$$ \Delta K^{R} = K^{{R^{\prime}}} - K^{R} = {\text{round}}\left[ {C_{d}^{T} (\Delta F \cdot Q \otimes E_{d} )C_{d} } \right]. $$
(21)

Take the QDCT coefficient F 13 as an example to show the distortion caused by embedding information. Suppose an integer r is added to the value of F 13, that is, the change of the QDCT block for embedding information is \( \Delta F = \left[ {\begin{array}{*{20}c} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & r \\ 0 & 0 & 0 & 0 \\ \end{array} } \right] \); then the change of the corresponding pixel block in the YUV video is \( \Delta K^{R} = \frac{1}{2}Qabr\left[ {\begin{array}{*{20}r} 1 & { - 2} & 2 & { - 1} \\ { - 1} & 2 & { - 2} & 1 \\ { - 1} & 2 & { - 2} & 1 \\ 1 & { - 2} & 2 & { - 1} \\ \end{array} } \right] \).

It can be seen that the modification of one QDCT coefficient distorts the whole 4 × 4 transform block in the corresponding YUV video. Similarly, altering one QDCT coefficient in an 8 × 8 transform block changes the whole 8 × 8 pixel block, which covers a larger area than a 4 × 4 block. Thus, 4 × 4 transform blocks are chosen for hiding data in this paper.
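The pixel-domain effect of a single coefficient change can be reproduced directly from (21); a small NumPy sketch (the function name and the choice Q = 10 are ours):

```python
import numpy as np

def pixel_change(dF, Q):
    """Delta K^R = round( C_d^T (dF * Q ⊗ E_d) C_d ), per (21)."""
    a, b = 0.5, np.sqrt(2.0 / 5.0)
    Cd = np.array([[1.0, 1.0, 1.0, 1.0],
                   [1.0, 0.5, -0.5, -1.0],
                   [1.0, -1.0, -1.0, 1.0],
                   [0.5, -1.0, 1.0, -0.5]])
    Ed = np.array([[a*a, a*b, a*a, a*b],
                   [a*b, b*b, a*b, b*b],
                   [a*a, a*b, a*a, a*b],
                   [a*b, b*b, a*b, b*b]])
    return np.round(Cd.T @ (dF * Q * Ed) @ Cd)

# Adding r = 1 to F_13 (row 2, column 3 of the block, as in the example above):
dF = np.zeros((4, 4))
dF[2, 3] = 1
print(pixel_change(dF, 10))   # reproduces the (1/2)Qabr pattern, rounded to integers
```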

Correspondingly, it can be inferred that the boundary pixels denoted as c 0–c 12, shown in Fig. 5, may be modified by embedding information into some QDCT coefficients of the blocks K u,v−1 (integers u and v denote the position of a block), K u−1,v−1, K u−1,v , and K u−1,v+1. Additionally, if intra-frame prediction is utilized by the current block K u,v , its prediction block is calculated from the pixels c 0–c 12. Consequently, the embedding-induced deviation of the blocks K u,v−1, K u−1,v−1, K u−1,v , and K u−1,v+1 will drift to the block K u,v . Otherwise, when the prediction block of the block K u,v is computed using inter-frame prediction or parallax prediction, i.e., by referring to another frame as shown in Fig. 6, any modification of adjacent blocks in the same frame has no impact on the block K u,v .

Fig. 5
figure 5

Intra-frame prediction mode: a block position, b the predictive direction of 4 × 4 and 8 × 8 luma block and c the predictive direction of 16 × 16 luma block. In mode 2 not shown in the figure, all elements are predicted with the average of upper pixels denoted by H and left pixels denoted as V, i.e., mean (H + V)

Fig. 6
figure 6

Prediction structure of MVC video with two views

A 16 × 16 MB with inter-frame prediction or parallax prediction is denoted as an inter-MB. If the current block K u,v is one of the nine 4 × 4 blocks numbered 0–8, which are located in neither the bottom row nor the rightmost column of the inter-MB as shown in Fig. 7, its adjacent blocks K u,v+1, K u+1,v+1 and K u+1,v lie in the current inter-MB. In addition, its neighboring block K u+1,v−1 is either in the current inter-MB or one of the blocks numbered 9, 10, and 11 in the already encoded MB at the encoder (or the already decoded MB at the decoder). Therefore, these adjacent blocks K u,v+1, K u+1,v+1, K u+1,v , and K u+1,v−1 are not influenced by modifications to the block K u,v , and the blocks numbered 0–8 in an inter-MB can be chosen as embeddable blocks to embed data without causing intra-frame distortion drift.

Fig. 7
figure 7

4 × 4 blocks without intra-frame distortion drift

Besides intra-frame distortion drift, inter-frame and parallax distortion drift also decrease the video quality. As illustrated in Fig. 6, hierarchical B coding is used in the prediction scheme of MVC video with two views. For one group of pictures (GOP) with 16 frames, there are eight frames in each view. The horizontal prediction is inter-frame, and the vertical prediction is parallax. The I0 frame and the P0 frame are pivotal pictures at the highest level. Only intra-frame prediction is used for the I0 frame, so it is not affected by hiding data into other frames, but hiding data into an I0 frame will affect all the P0, B1, B2, B3, and b4 frames in the two GOPs predicted from the I0 frame. In contrast, hiding information into P0 or b4 frames in the right view does not cause parallax distortion drift, as P0 and b4 frames are not referenced by the frames in the left view; moreover, hiding information into b4 frames does not cause inter-frame distortion drift either. In addition, only b4 frames may be affected by hiding data into B3 frames, and the six B3 frames in one GOP can provide enough redundancy space. Accordingly, compared with hiding data into the I0 frame, better video quality can be obtained by hiding information into P0, B3 or b4 frames, which can be selected by users on demand.

3.2 Embedding procedure

The presented RDH algorithm for MVC video is shown in Fig. 8. The sender entropy decodes the MVC video and chooses embeddable blocks from the QDCT inter-MBs; the MBs not selected for hiding data are entropy encoded directly. The information is hidden into three QDCT coefficients (F x1, F x2, F x3) chosen from each embeddable 4 × 4 block. The marked MVC video is obtained by entropy encoding each MB after hiding data. Correspondingly, the receiver entropy decodes the marked MVC video to extract the embedded data from the marked QDCT coefficients, which are then completely recovered.

Fig. 8
figure 8

The flowchart of the presented RDH algorithm for MVC video: a information embedding and b information extraction and video recovery

The way of selecting three QDCT coefficients (F x1, F x2, F x3) from a 4 × 4 luminance block is shown in Algorithm 1. A random function is utilized to choose three coefficients from the 15 AC coefficients of a block in the zigzag scan order described in Fig. 4. Figure 9 shows an example of selecting three embeddable coefficients.

figure a
Fig. 9
figure 9

The random selection of three embeddable coefficients

Step 1:

The cursor starts from the first position marked by 1

Step 2:

When 7 is chosen randomly, the embeddable coefficient F x1 is F 7, and the positions of 7 and 1 pointed by the cursor are swapped

Step 3:

Move the cursor forward to point at 2

Step 4:

If 12 is selected at random, the embeddable coefficient F x2 is F 12, and the places of 12 and 2 are swapped

Step 5:

Move the cursor forward to point at 3

Step 6:

When 3 is selected randomly, the embeddable coefficient F x3 is F 3. Therefore, the selected coefficient group (F x1, F x2, F x3) is (F 7, F 12, F 3)

It can be seen that there are 15 ways to choose F x1 from the 15 QDCT AC coefficients, 14 ways to select F x2 from the remaining 14 QDCT AC coefficients, and 13 ways to select F x3 from the remaining 13 QDCT AC coefficients. Accordingly, the number of possible selections of (F x1, F x2, F x3) is 15 × 14 × 13 = 2730. Even if a marked block is found by a third party, the probability of directly guessing the selected coefficient positions (and hence the hidden data) is 1/2730 \( \approx \) 3.66 × 10−4.
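The selection illustrated in Fig. 9 is a partial Fisher–Yates shuffle over the 15 AC positions; every ordered triple is equally likely, which gives the 15 × 14 × 13 = 2730 possibilities counted above. A sketch (how the random seed S is combined with the block index is our assumption):

```python
import random

def select_coefficients(S, block_index):
    """Pick three distinct AC positions (1..15 in zigzag order) for one block."""
    rng = random.Random(S * 100003 + block_index)   # per-block stream derived from seed S
    positions = list(range(1, 16))                  # the 15 AC coefficients F_1 .. F_15
    for cursor in range(3):                         # three cursor-and-swap steps, as in Fig. 9
        j = rng.randrange(cursor, 15)
        positions[cursor], positions[j] = positions[j], positions[cursor]
    return tuple(positions[:3])                     # e.g. (7, 12, 3)
```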

A small distorted area is more difficult to detect than a large one. Therefore, it is necessary to limit the distortion region within an MB. In the hiding procedure shown in Algorithm 2, an embeddable 4 × 4 block, which can be used to hide data without causing intra-frame distortion drift, is selected randomly from the nine blocks shown in Fig. 7. In this way, only one 4 × 4 block may be modified for hiding data in one MB. In addition, a positive integer denoted as Z is set to generate a random threshold denoted by U, so that embeddable blocks are selected randomly according to |F 0| ≥ U. A high threshold U constrains the number of embeddable blocks, so the distortion region is limited. Consequently, the use of randomly chosen embeddable positions, both blocks and coefficients, reduces the distortion of the statistical histogram and enhances the undetectability of the RDH scheme. A sketch of this block-selection logic follows Algorithm 2 below.

figure b
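Algorithm 2 is given above as a figure; the block-selection logic it describes can be sketched as follows (the way U is drawn from Z, the data layout, and the function interface are our assumptions, and rng is any random.Random instance seeded as in the coefficient-selection sketch):

```python
def select_block(rng, candidate_blocks, Z):
    """Pick at most one embeddable 4x4 block per inter-MB.
    candidate_blocks: the nine 4x4 QDCT blocks numbered 0-8 in Fig. 7."""
    U = rng.randrange(0, Z + 1)          # random threshold U derived from the positive integer Z
    k = rng.randrange(0, 9)              # one random candidate block in this MB
    block = candidate_blocks[k]
    if abs(block[0][0]) >= U:            # |F_0| >= U, with F_0 the DC coefficient at the top-left
        return k, block
    return None, None                    # otherwise skip this MB
```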

3.3 Extraction and recovery procedures

Algorithm 3 demonstrates the procedure of information extraction and video restoration. The same random seed generates identical random sequences. Therefore, when the sender and receiver use the same random seed S, the embeddable QDCT blocks and coefficients employed by the sender are in one-to-one correspondence with the extractable QDCT blocks and AC coefficients used by the receiver. Finally, the receiver extracts the embedded data and fully restores the video by reversing the mapping of Fig. 3.

In addition, the computational cost of the presented RDH algorithm depends on the number of video frames, denoted by N F , and the information length, denoted by L I . Therefore, the computational complexity of the presented algorithm is O(N F  × L I ).

figure c

4 Experimental results and discussions

The presented algorithm has been implemented in the H.264 reference software, version JM18.4 [37]. The nine video sequences (640 × 480) [38] shown in Fig. 10 serve as test samples. Two YUV files are encoded into an MVC video with 233 frames, which include 30 I0 frames, 30 P0 frames and 116 b4 frames. The parameter intra-period was set to 8. The capacity of a video sequence is the mean number of bits hidden per I0/P0/b4 frame over all the I0/P0/b4 frames in that sequence. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values, obtained by comparing the marked YUV video with the original YUV video, are averaged over all frames. The embedding efficiency e is defined as

$$ e = L_{\text{hide}} \bigg/ \sum\limits_{i = 1}^{{L_{\text{modi}} }} {{\text{Lcha}}_{i} } , $$
(22)

where L hide is the number of hidden bits, L modi is the number of modified coefficients, and Lcha i is the magnitude of the change to the i-th modified coefficient.
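A direct transcription of (22) (the variable names are ours):

```python
def embedding_efficiency(n_hidden_bits, coefficient_changes):
    """e = L_hide divided by the total magnitude of coefficient changes, per (22)."""
    return n_hidden_bits / sum(abs(c) for c in coefficient_changes)
```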

Fig. 10
figure 10

Test videos (the size of each frame is 640 × 480). a Akko&Kayo, b Ballroom, c Crowd, d Exit, e Flamenco, f Objects, g Race, h Rena and i Vassar

If the coded block pattern parameter of a block is 0, no QDCT coefficients are coded for the block, which actually contains only zero coefficients. Therefore, such a block is not modified for hiding information, so large visual distortion is avoided. The distribution of changeable QDCT coefficients in blocks with a nonzero coded block pattern is shown in Table 1. The schemes hiding data into P0 and b4 frames with intra-frame distortion drift are denoted as P0_drift and b4_drift. The schemes embedding information into P0 and b4 frames without intra-frame distortion drift are denoted as P0_interMB and b4_interMB, in which only the embeddable blocks 0–8 shown in Fig. 7 are considered. The probability of changeable zero coefficients is denoted as p 0. For P0_drift, p 0 is about 0.925; for P0_interMB, about 0.927; for b4_drift, about 0.935; and for b4_interMB, about 0.937. The overwhelming majority of changeable QDCT coefficients are zero, which indicates that the peak of the QDCT coefficient histogram is rather steep. Thus, good payload-distortion performance can be obtained by using HS to embed data. Li et al.’s two-dimensional HS method [33] could be used for embedding information into the QDCT coefficients of MVC video. When the value of the coefficient pair (F x1, F x2) is (0, 0), it is expanded to the neighboring position (F x1 + 1, F x2 − 1) or (F x1 − 1, F x2 + 1) to signify one data bit 1, where the cost is 2. To hide the two data bits 11 into zero coefficients, the cost is therefore 4, whereas it is only 1 with our method, which takes full advantage of the coefficient distribution.

Table 1 Average numbers of different QDCT coefficient values in one frame of Ballroom

Table 2 shows the embedding performance of four schemes where the threshold U is 0. Different hiding capacities are achieved for different videos, so different amounts of data are hidden into different videos. The proposed 3D-HS-based RDH method is used for hiding data in these schemes except Ma et al.’s scheme [3], which is denoted by Ma. Compared with Ma, where data is hidden into the I0 frame without causing intra-frame distortion drift, P0_drift increases PSNR, SSIM and embedding efficiency e by 0.163 dB, 0.00005, and 1.2 for hiding 500 bits of data in one frame of Crowd, respectively. Compared with P0_drift, PSNR, SSIM and embedding efficiency e are enhanced by 0.003 dB, 0.00013 and 0.03 by P0_interMB for Crowd, respectively. Compared with P0_interMB, PSNR, SSIM and embedding efficiency e are enhanced by 0.083 dB, 0.00033 and 0.05 by b4_interMB for Crowd, respectively. On the whole, compared with Ma, b4_interMB is superior, enhancing PSNR, SSIM, and embedding efficiency e by at least 0.156 dB, 0.00008 and 0.71, respectively.

Table 2 Hiding performance of four schemes for hiding data into one frame on average

In Table 3, the three QDCT AC coefficients F 2, F 5, and F 3 are used for hiding information. Compared with the conventional 3D HS, the presented 3D HS improves PSNR, SSIM, and embedding efficiency e by at least 0.02 dB, 0.00004, and 0.41, respectively. The marked frames of Akko&Kayo, Exit and Race are shown in Fig. 11. Frames (a–c) are obtained by employing the conventional 3D HS to embed data, and frames (d–f) by using the proposed 3D HS; their original frames are shown in Fig. 10. Apparent distortion is easy to find in frames (a–c): there is large distortion in the top-left corner of frame (a), many square artifacts can be seen on the back of the man in frame (b), and in the top middle of frame (c) the distortion appears on the trees. By contrast, little distortion can be seen in frames (d–f). The experimental results verify that superior visual quality is obtained by using the proposed 3D HS method to hide information.

Table 3 Embedding performance of the conventional 3D HS and the presented 3D HS for hiding 500 bits of information into the first P0 frame
Fig. 11
figure 11

The marked frames of Akko&Kayo, Exit and Race with conventional 3D HS and the proposed 3D HS

To compare the proposed RDH algorithm based on 3D HS with other RDH schemes in the same environment, embeddable blocks, which can be utilized for hiding data without parallax or intra-frame distortion drift, are chosen from the inter-MBs of P0 frames, as shown in Fig. 7. Huang and Chang’s scheme [30] is denoted as Huang. In Huang and our method, information is hidden into the three QDCT AC coefficients F 2, F 5, and F 3; the coefficient pair (F 2, F 5) is used by Shi et al.’s scheme [13], denoted as Shi, and by Ou et al.’s scheme [36], denoted as Ou; and F 2 is used by Chung et al.’s scheme [7], denoted as Chung. For every line in Fig. 12, data is hidden into the embeddable blocks that satisfy |F 0| ≥ U, and the five points of each line, from left to right, correspond to the thresholds U = 4, 3, 2, 1 and 0, respectively.

Fig. 12
figure 12

Comparison of hiding performance with other RDH methods on MVC videos

It is obvious that the best PSNR, SSIM, and embedding efficiency e, and the smallest bit-rate increase, are obtained by the presented method when the same amount of data is embedded. We aim to hide two data bits with at most one modification, so the embedding efficiency is improved, i.e., more information can be hidden for the same distortion. Consequently, when the same quantity of information is embedded, less modification is induced and the video quality is better, as verified by the higher PSNR and SSIM. Additionally, the small cost leads to a lower bit-rate increase, which demonstrates that hiding data with our method has little impact on the coding efficiency of H.264.

In Table 4, the threshold U is 0, and 700 bits of information are embedded into each P0 frame on average. Compared with the other schemes, the presented method increases PSNR and SSIM by at least 0.006 dB and 0.00005, respectively. The best SSIM and PSNR mean that the best video quality is obtained with our algorithm. Furthermore, the embedding efficiency e of the presented scheme is much higher than that of the other schemes, which demonstrates that when the same quantity of cover bits is changed, more information bits can be embedded by our algorithm than by the other schemes.

Table 4 Embedding performance for hiding 700 bits of information into one P0 frame on average

5 Conclusions

An efficient 3D HS-based RDH algorithm is presented in this paper. This reversible method can be used for hiding data in images and videos; image RDH variants such as 3D difference HS and 3D prediction-error HS will be applied and verified in our future work. In this paper, the proposed 3D HS algorithm is used to embed data into the QDCT coefficients of MVC video. Three coefficients chosen from each embeddable block are used for hiding two bits of information, and in most cases at most one coefficient is changed by adding 1. Superior payload-distortion performance is achieved by the proposed scheme compared with some state-of-the-art RDH methods. To further improve the hiding capacity and decrease the distortion, the presented method will be generalized to hide more than two data bits with at most one modification in the future.