
1 Introduction

In recent years, with the rapid development of broadband Internet services and practical video compression techniques, high-quality digital video can be conveniently downloaded and broadcast through social networking services such as YouTube, Facebook and Weibo. Therefore, digital video has become one of the most influential media in the entertainment industry. With the increasing popularity of digital video, copyright protection for video grows in importance and receives a lot of attention. Information hiding is one of the possible solutions for protecting copyrighted video from illegal copying and distribution.

Information hiding refers to the process of inserting information into a host to achieve certain features or serve specific purposes. Information hiding in video can be divided into two main categories according to the domain where data embedding takes place. One category is spatial domain-based information hiding. Under this embedding scheme, raw pixel values are directly modified using conventional information hiding methods designed for digital images, such as Least Significant Bit (LSB) Matching [1] and Spread Spectrum (SS) [2, 3]. However, these approaches are rarely adopted because the hidden data is inevitably lost during video compression. The other category is usually referred to as compressed domain-based information hiding. Methods of this type manipulate entities of compressed video to realize data embedding. Commonly used entities include motion vectors (MVs) [4,5,6,7,8], intra prediction modes [9,10,11], inter prediction modes [12, 13], quantized DCT coefficients (DCTCs) [14,15,16], quantization parameters [17, 18], reference frame indices [19] and variable length codes [20, 21].

Prior to H.264/AVC [22], numerous information hiding methods were proposed for the MPEG-1/2 standards. However, the data carriers (hiding venues) were restricted to DCTCs and MVs, mainly because other entities of MPEG-1/2 compressed video are not well suited to information hiding. Since the advent of the H.264/AVC standard in 2003, information hiding in video has received much attention, because the various new technical features introduced in H.264/AVC provide plenty of opportunities for information hiding. For instance, in Hu et al.’s work [9], \(4\times 4\) intra prediction modes (I4-modes) were exploited for data embedding based on a predefined mapping rule between I4-modes and hiding bits. Kapotas and Skodras [12] proposed forcing the encoder to choose a particular inter prediction mode according to the information to be embedded. In [21], a specific type of codeword in CAVLC (Context-based Adaptive Variable Length Coding) was employed as the data carrier to realize information hiding.

In the H.264/AVC standard, the coded block pattern (CBP) is a syntax element contained in the Macroblock Layer, and indicates which luma and chroma blocks within a macroblock contain non-zero transform coefficients. To the best of our knowledge, the CBP has never been utilized as the data carrier for information hiding, in contrast with other entities of H.264/AVC compressed video. Nevertheless, the CBP has the potential to be exploited for data embedding, for the following reasons. First, CBPs are losslessly coded to form an integral part of compressed bitstreams. Thus it is practicable to employ the CBP as the data carrier for information hiding. Second, CBP-based data embedding does not influence any crucial process in video compression except for entropy coding. Consequently, only a slight reduction in compression efficiency and visual quality would be caused. Last but not least, compressed video usually contains a substantial quantity of CBPs, which ensures that CBP-based data hiding methods can achieve high embedding capacities.

In this paper, we propose a novel data hiding method for H.264/AVC which exploits the CBP as the data carrier. In the proposed embedding scheme, the four LSBs of each CBP are mapped to a single bit to form the embedding channel. In addition, an embedding distortion function is specifically designed to evaluate the impacts of CBP manipulation on visual quality and compression efficiency. With this distortion function, the Syndrome-Trellis Codes (STC) [23] are further exploited to minimize the overall embedding impact. The experimental results show that the proposed method is capable of performing data hiding at the expense of only a slight reduction in visual quality, even when a large payload is embedded. More importantly, the bitrate overhead caused by data embedding is considerably alleviated, enabling an efficient use of transmission and storage resources. As a result, the proposed method is suitable for practical use.

The rest of this paper is organized as follows. In Sect. 2, the basic concepts of H.264/AVC syntax and the CBP are introduced. In Sect. 3, the embedding distortion caused by modification of CBPs is analyzed, and our CBP-based data hiding method is presented. In addition, the practical implementation of the proposed method is discussed. In Sect. 4, comparative experiments are conducted to demonstrate the effectiveness of our method. Finally we draw conclusions in Sect. 5.

2 The Coded Block Pattern in H.264/AVC

H.264/AVC syntax is defined in the H.264/AVC standard and specifies the exact structure of an H.264/AVC-compliant bitstream in terms of syntax elements.

Fig. 1. The basic structure of the Macroblock Layer in the H.264/AVC standard.

According to H.264/AVC syntax specifications, the CBP is a syntax element for macroblocks that are not coded using \(16\times 16\) intra prediction modes. Contained in the Macroblock Layer (Fig. 1), the CBP indicates which luma and chroma blocks in a macroblock contain non-zero transform coefficients. In addition, the CBP can be expressed as a 6-bit binary number, and can take values between 0 and \(47\).

Fig. 2. The relationship between the CBP and the corresponding transform coefficients. \(b_i\ (i=0,1,2,3)\) indicates whether there are non-zero coefficients present in \(\text {L}_i\), one of the four \(8\times 8\) luma blocks. \(b_5\) and \(b_4\) closely correlate with the presence of non-zero chroma transform coefficients.

As shown in Fig. 2, given a CBP represented by \(\overline{b_5b_4b_3b_2b_1b_0}\) where \(b_i\ (i=0,1,\dots ,5)\) denotes a binary digit, it is constructed as follows:

  1. Each of the four LSBs of the CBP, i.e. \(b_i\ (i=0,1,2,3)\), indicates whether there are one or more non-zero transform coefficients within the corresponding \(8\times 8\) luma block (denoted by \(\text {L}_i\) in Fig. 2). Specifically, the value of \(b_i\) is given by

    $$\begin{aligned} b_i = {\left\{ \begin{array}{ll} 1\qquad \text {non-zero coefficient(s) present}\\ 0\qquad \text {no non-zero coefficients present}\end{array}\right. } \end{aligned}$$
    (1)

    where \(i=0,1,2,3\).

  2. The two MSBs (Most Significant Bits) of the CBP, \(b_5\) and \(b_4\), are associated with chroma transform coefficients, and can take the values specified by

    $$\begin{aligned} \overline{b_5b_4} = {\left\{ \begin{array}{ll} 00_2\qquad \text {no non-zero chroma coefficients present}\\ \\ 01_2\qquad \begin{aligned}&{}\text {non-zero chroma DC coefficient(s) present,}\\ &{}\text {no non-zero chroma AC coefficients present}\end{aligned}\\ \\ 10_2\qquad \text {non-zero chroma AC coefficient(s) present}\end{array}\right. }. \end{aligned}$$
    (2)

Residual data for a macroblock is transmitted according to the corresponding CBP. If the CBP indicates a block contains no non-zero coefficients, that block would be bypassed. In addition, since the CBP is losslessly entropy coded and inserted into H.264/AVC bitstreams, the decoding of the residual data for a macroblock could be facilitated by referring to the corresponding CBP.
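The bit layout described by (1) and (2) can be illustrated with a short sketch. The function name `decode_cbp` and the example value are illustrative assumptions, not part of any H.264/AVC codec API.

```python
def decode_cbp(cbp):
    """Split a CBP value into its four luma flags and its chroma code.

    Returns (luma_flags, chroma_code), where luma_flags[i] is b_i for the
    8x8 luma block L_i and chroma_code is the value of the two MSBs b_5 b_4.
    """
    if not 0 <= cbp <= 47:
        raise ValueError("CBP must lie in the range 0..47")
    luma_flags = [(cbp >> i) & 1 for i in range(4)]  # b_0, b_1, b_2, b_3
    chroma_code = cbp >> 4                           # 0, 1 or 2, see (2)
    return luma_flags, chroma_code


# Hypothetical example: CBP = 100101 in binary (decimal 37).
luma, chroma = decode_cbp(0b100101)
print(luma, chroma)
```

In this example, blocks \(\text {L}_0\) and \(\text {L}_2\) carry non-zero coefficients, and the chroma code \(10_2\) signals non-zero chroma AC coefficients.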

3 The Proposed Method

As introduced in Sect. 2, the CBP indicates which luma and chroma blocks in a macroblock contain non-zero transform coefficients. In addition, residual data for a macroblock is transmitted according to the corresponding CBP.

In this section, we propose a data hiding method for H.264/AVC which utilizes the CBP as the data carrier. An embedding distortion function is designed to evaluate the reduction in visual quality and compression efficiency caused by modification of CBPs. In addition, combined with the suitably defined distortion function, the STC [23] is adopted in the embedding channel construction to minimize the overall embedding impact. Consequently, the proposed method can embed the payload with very limited embedding distortion introduced.

3.1 The Embedding Channel Construction

Given the cover with \(N\) CBPs denoted by \(\mathbf {C} = (C_1, C_2, \dots , C_N)\), the embedding channel construction is carried out by exploiting the STC as follows.

Each CBP \(C_i\ (i=1,2,\dots ,N)\) is mapped to a single bit according to the parity check function given by

$$\begin{aligned} P(C) = \oplus _{i=0}^3\ c_i \end{aligned}$$
(3)

where \(c_i\) denotes one of the four LSBs of \(C\) and satisfies \(\overline{c_3c_2c_1c_0}=C\,\%\,16\) (\(\%\) denotes the modulo operation). The symbol \(\oplus \) represents the bitwise exclusive OR operation. Based on the parity check function \(P\), the embedding channel \(\mathbf {p}=(p_1,p_2,\dots ,p_N)\) can then be obtained, where \(p_i = P(C_i)\ (i=1,2,\dots ,N)\).
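As a minimal sketch of the parity check function (3), the following computes the channel bit of a CBP from its four LSBs. The name `cbp_parity` and the sample CBP values are assumptions made for illustration.

```python
def cbp_parity(cbp):
    """Parity check function P(C): XOR of the four LSBs of the CBP."""
    low4 = cbp % 16                    # keep c_3 c_2 c_1 c_0 only
    return bin(low4).count("1") % 2    # XOR of bits equals popcount mod 2


# Hypothetical cover of three CBPs mapped to the embedding channel p.
cover = [37, 7, 47]
channel = [cbp_parity(c) for c in cover]
print(channel)
```

Because the parity is the XOR of four bits, flipping any single one of the four LSBs toggles the channel bit, which gives the embedder four candidate modifications per CBP.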

Assume that modifications of CBPs are mutually independent, and let each \(p_i\ (i=1,2,\dots ,N)\) be assigned a positive scalar \(\gamma _i\) expressing the impact of making an embedding change at the corresponding CBP \(C_i\). The STC is adopted to minimize the overall embedding impact \(D(\mathbf {p},\mathbf {p}')=\sum _{i=1}^{N}\gamma _i[p_i\ne p'_i]\), where \([p_i\ne p'_i]\) equals 1 if \(p_i\ne p'_i\) and 0 otherwise, and \(\mathbf {p'}=(p'_1,p'_2,\dots ,p'_N)\) denotes the embedding channel with hidden data. Given an \(\alpha N\)-bit message \(\mathbf {m}\) to be embedded, where \(\alpha \) denotes the embedding payload, the STC-based data embedding and extraction can be formulated as

$$\begin{aligned} \text {Emb}_{\text {STC}}(\mathbf {p},\mathbf {\Gamma },\mathbf {m}) = \arg \min _{\mathbf {p}'\in \mathcal {C}(\mathbf {m})}D(\mathbf {p},\mathbf {p}')=\tilde{\mathbf {p}}, \end{aligned}$$
(4)

$$\begin{aligned} \text {Ext}_{\text {STC}}(\tilde{\mathbf {p}}) = \tilde{\mathbf {p}}\mathbf {H}^T_{\text {STC}}=\mathbf {m}. \end{aligned}$$
(5)

Here, \(\mathbf {\Gamma }=(\gamma _1,\gamma _2,\dots ,\gamma _N)\) denotes the distortion scalar vector, \(\mathcal {C}(\mathbf {m})\) represents the coset corresponding to the syndrome \(\mathbf {m}\), and \(\mathbf {H}_{\text {STC}}\in \{0,1\}^{\alpha N\times N}\) is the parity check matrix of the STC which should be shared between communication parties. For detailed information about the STC, please refer to [23].
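To make (4) and (5) concrete, the sketch below solves the same minimization by exhaustive search over all candidate channels. This is not the Viterbi-based STC algorithm of [23]; it only illustrates the syndrome-coding objective on a toy problem. The matrix `H`, the channel `p`, the scalars `gammas` and the message `m` are hypothetical values chosen for illustration.

```python
from itertools import product


def syndrome(p, H):
    """Compute p H^T over GF(2); H is a list of rows of length len(p)."""
    return tuple(sum(h * x for h, x in zip(row, p)) % 2 for row in H)


def embed_bruteforce(p, gammas, m, H):
    """Return the channel in the coset C(m) with minimal embedding impact."""
    best, best_cost = None, float("inf")
    for cand in product((0, 1), repeat=len(p)):
        if syndrome(cand, H) != tuple(m):
            continue  # candidate does not carry the message
        # D(p, p') = sum of gamma_i over the positions that are changed
        cost = sum(g for g, a, b in zip(gammas, p, cand) if a != b)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


# Toy setting: N = 4 channel bits, 2-bit message.
H = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
p = (1, 0, 0, 1)
gammas = (3.0, 1.0, 2.0, float("inf"))  # last position must not change
m = (1, 1)

stego, cost = embed_bruteforce(p, gammas, m, H)
print(stego, cost, syndrome(stego, H))
```

The search is exponential in \(N\) and is only usable for tiny examples; the STC reaches a near-optimal solution with complexity linear in \(N\), which is why it is adopted in practice.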

The design of embedding distortion is an essential part of the distortion minimization framework, and could directly affect the performance of the embedding algorithm. Given a CBP \(C\) and one of its four LSBs \(c\), the embedding distortion caused by modification of \(c\) is analyzed as follows. If \(c\) is modified from 1 to 0, as mentioned in Sect. 2, those non-zero transform coefficients within the corresponding \(8\times 8\) luma block (Fig. 2) would be bypassed, leading to a decrease in bitrates but a reduction in reconstructed visual quality. On the other hand, if \(c\) is modified from 0 to 1, although reconstructed visual quality could be maintained, bitrate overhead would be caused inevitably because 64 extra zero coefficients have to be entropy coded and transmitted. In addition, the number of bits required for representing \(C\) could also be affected if \(c\) is modified. Consequently, both visual quality and compression efficiency could be influenced by modification of \(c\). The distortion caused by modification of \(c\) (denoted by \(\varPsi (C,c)\)) can then be defined in terms of the fluctuation in visual quality and bitrates, i.e.

$$\begin{aligned} \varPsi (C,c)=|\mathcal {J}(\mathbf {S}, \mathbf {S}_{\text {rec}}, R)-\mathcal {J}(\mathbf {S}, \mathbf {S}_{\text {rec}}',R')|. \end{aligned}$$
(6)

Here, \(\mathbf {S}\) denotes the original macroblock associated with \(C\), and \(\mathbf {S}_\text {rec}\) and \(\mathbf {S}'_\text {rec}\) represent the corresponding reconstructed (decoded) macroblocks with \(c\) unaltered and with \(c\) modified, respectively. \(R\) and \(R'\) denote the number of bits required for coding \(\mathbf {S}\) when \(c\) is unaltered and when \(c\) is modified, respectively. \(\mathcal {J}\) denotes the Lagrangian cost function given by

$$\begin{aligned} \mathcal {J}(\mathbf {A},\mathbf {B},R)={D}(\mathbf {A}, \mathbf {B}) + \lambda R \end{aligned}$$
(7)

where \(D\) represents the visual distortion measured as the SSD (Sum of Squared Differences), and \(\lambda \) is the Lagrange multiplier, determined by the employed H.264 encoder, that controls the tradeoff between \(D\) and \(R\).
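A minimal sketch of (6) and (7) follows. Macroblocks are represented as flat lists of sample values, and the pixel values, bit counts and multiplier are hypothetical; in a real encoder these quantities would come from the rate-distortion stage.

```python
def ssd(a, b):
    """Sum of Squared Differences between two equally sized sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lagrangian_cost(orig, rec, rate_bits, lam):
    """J(A, B, R) = D(A, B) + lambda * R, as in (7)."""
    return ssd(orig, rec) + lam * rate_bits


def psi(orig, rec, rate, rec_mod, rate_mod, lam):
    """Distortion of modifying one CBP bit, as in (6)."""
    return abs(lagrangian_cost(orig, rec, rate, lam)
               - lagrangian_cost(orig, rec_mod, rate_mod, lam))


# Hypothetical 4-sample "macroblock": clearing a CBP bit lowers the rate
# from 40 to 30 bits but increases the reconstruction error.
orig = [10, 12, 14, 16]
rec = [10, 12, 13, 16]        # SSD = 1
rec_mod = [10, 11, 13, 15]    # SSD = 3
value = psi(orig, rec, 40, rec_mod, 30, lam=0.5)
print(value)
```

Here the unmodified cost is \(1 + 0.5\cdot 40 = 21\) and the modified cost is \(3 + 0.5\cdot 30 = 18\), so \(\varPsi = 3\).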

According to the analysis above, for a CBP \(C\), the corresponding embedding distortion scalar \(\gamma \) can be intuitively defined as

$$\begin{aligned} \gamma ={\left\{ \begin{array}{ll} \min {\{\varPsi (C,c_i)\;|\;i=0,1,2,3\}}\quad &{}\mathbf {S}\,\, \text {is inter-coded.}\\ \infty &{}\text {else}\end{array}\right. } \end{aligned}$$
(8)

where \(\mathbf {S}\) denotes the macroblock associated with \(C\). \(c_i\ (i=0,1,2,3)\) represents one of the four LSBs of \(C\), and satisfies \(\overline{c_3c_2c_1c_0}=C\,\%16\).
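The rule in (8) can be sketched as follows, assuming the four values \(\varPsi (C,c_i)\) have already been computed. The function name and the numbers are illustrative assumptions.

```python
import math


def distortion_scalar(psi_values, inter_coded):
    """Embedding distortion scalar gamma for one CBP, following (8).

    psi_values holds Psi(C, c_i) for i = 0..3. Macroblocks that are not
    inter-coded receive an infinite cost, so they are never modified.
    """
    if not inter_coded:
        return math.inf
    return min(psi_values)


# Hypothetical Psi values for the four LSBs of one CBP.
print(distortion_scalar([4.0, 0.5, 3.0, 2.0], inter_coded=True))
print(distortion_scalar([4.0, 0.5, 3.0, 2.0], inter_coded=False))
```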

3.2 Practical Implementation

In practice, almost all available H.264/AVC CODECs can be customized to implement the proposed method. Typically, I-frames are not used for the data embedding, and secret message bits are embedded into P- or B-frames in a frame-by-frame manner. Without loss of generality, the processes of embedding and extraction with one single P- or B-frame are described as follows.

  1. Data Embedding

    First, the raw frame is subdivided into macroblocks to be fed into the customized encoder. For each macroblock not coded using \(16\times 16\) intra-prediction modes, record its associated CBP determined by the encoder, and calculate the corresponding distortion scalar according to (8). Afterwards, the cover with \(N\) CBPs \(\mathbf {C}=(C_1,C_2,\dots ,C_N)\) and the distortion scalar vector \(\mathbf {\Gamma }=(\gamma _1,\gamma _2,\dots ,\gamma _N)\) are obtained.

    Second, the embedding channel \(\mathbf {p}=(p_1,p_2,\dots ,p_N)\) is constructed according to the parity check function (3). As described in (4), the STC is subsequently exploited to embed an \(\alpha N\)-bit message \(\mathbf {m}\) by turning \(\mathbf {p}\) into \(\tilde{\mathbf {p}}\), where \(\text {Emb}_{\text {STC}}(\mathbf {p},\mathbf {\Gamma },\mathbf {m})=\tilde{\mathbf {p}}=(\tilde{p_1},\tilde{p_2},\dots ,\tilde{p_N})\).

    Third, the raw frame is fed into the encoder again. Each CBP \(C_i\) in \(\mathbf {C}\) is then subject to the possible modification controlled by \(\tilde{p_i}\). Specifically, if \(\tilde{p_i}=p_i\), \(C_i\) is left unaltered. In case \(\tilde{p_i}\ne p_i\), \(C_i\) is modified by flipping one of its four LSBs \(\tilde{c}\) that satisfies

    $$\begin{aligned} \tilde{c} = \arg \min _{c\in \mathcal {X}}{\varPsi (C_i,c)}, \end{aligned}$$
    (9)

    where \(\mathcal {X}\) denotes the set of the four LSBs of \(C_i\).

    Finally, the compressed frame with an \(\alpha N\)-bit message embedded is inserted into the compressed bitstream.

  2. Data Extraction

    Compared with the embedding process, data extraction is much simpler. The recipient decodes the compressed frame with any H.264 decoder and parses a total of \(N\) CBPs from the Macroblock Layer (Fig. 1). The stego channel \(\tilde{\mathbf {p}}\) is reconstructed by applying the parity check function (3) to each parsed CBP, and the message is then extracted by computing \(\mathbf {m}=\tilde{\mathbf {p}}\mathbf {H}^T_{\text {STC}}\).
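The CBP modification step (9) and the channel reconstruction at the receiver can be sketched as below. The helper names, the CBP value and the \(\varPsi \) values are hypothetical; an actual implementation would perform these steps inside a customized encoder and decoder.

```python
def parity(cbp):
    """Parity check function (3): XOR of the four LSBs of the CBP."""
    return bin(cbp % 16).count("1") % 2


def apply_stego_bit(cbp, target_parity, psi_values):
    """Return the CBP after enforcing the stego channel bit.

    If the parity already matches, the CBP is left unaltered. Otherwise the
    LSB with the smallest Psi value is flipped, following (9).
    """
    if parity(cbp) == target_parity:
        return cbp
    i = min(range(4), key=lambda k: psi_values[k])
    return cbp ^ (1 << i)


def extract_channel(cbps):
    """Receiver side: rebuild the stego channel from the parsed CBPs."""
    return [parity(c) for c in cbps]


# Hypothetical example: CBP 37 has parity 0 and must carry the bit 1.
psi_values = [4.0, 0.5, 3.0, 2.0]
modified = apply_stego_bit(37, 1, psi_values)
print(modified, extract_channel([37, modified]))
```

The receiver needs neither the distortion scalars nor the original CBPs; it only needs the parsed CBPs and the shared matrix \(\mathbf {H}_{\text {STC}}\).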

4 Experiments

In this section, experiments are conducted to demonstrate that the proposed method is capable of embedding the payload with very limited embedding distortion introduced.

Fig. 3. The raw sequences used for performance evaluation.

4.1 Experimental Setup

In the experiments, x264 [24], a highly practical H.264 encoder, is customized to prepare the compressed video samples, with only the Baseline Profile adopted. For each compressed video sample to be prepared, the quantization parameter (QP) is set to 20 or 28. In addition to the proposed method, the methods of Kapotas and Skodras [12] and Aly [6] are also implemented for comparison, and are referred to as ALG1 and ALG2 respectively. Since these three methods are not applicable to I-frames, the embedding strength is measured by average embedded bits per inter-coded frame (bpf), and is fixed at 190 bpf in the experiments. As shown in Fig. 3, 14 CIF-resolution raw test sequences are used in the experiments to evaluate the impacts of data embedding on visual quality and compression efficiency. All of them are progressively scanned and use YUV 4:2:0 color sampling.

Table 1. The impacts of data embedding on visual quality and compression efficiency under different experimental settings. \(\text {RC}_\text {PSNR}\) defined in (10) is used for evaluating the impacts on visual quality, and \(\text {RC}_\text {BR}\) given by (11) is employed for evaluating the impacts on compression efficiency. (NF: number of frames; IM: impact measure.)

4.2 Impacts on Visual Quality

PSNR (Peak Signal-to-Noise Ratio) is a commonly used objective measurement of picture quality. In the YUV domain, the human visual system is more sensitive to luminance (Y) than to chrominance (U or V). Thus, the picture quality in the experiments is measured by the PSNR of the luminance component (\(\text {PSNR}_\text {Y}\)). Accordingly, given a stego video sequence \(V_\text {s}\) and the corresponding cover \(V_\text {c}\), the impacts of data embedding on visual quality can be evaluated by computing the relative change between the average \(\text {PSNR}_\text {Y}\) values of \(V_\text {s}\) and \(V_\text {c}\), i.e.

$$\begin{aligned} \text {RC}_\text {PSNR}(V_\text {s}, V_\text {c}) = \frac{\text {PSNR}_\text {Y}(V_\text {s})-\text {PSNR}_\text {Y}(V_\text {c})}{\text {PSNR}_\text {Y}(V_\text {c})}\times 100 \end{aligned}$$
(10)

where \(\text {PSNR}_\text {Y}(V)\) denotes the average \(\text {PSNR}_\text {Y}\) value of the sequence \(V\).
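The measure in (10) can be sketched as follows. The PSNR formula assumes 8-bit samples (peak value 255), frames are represented as flat lists of luma samples, and all numeric values are hypothetical.

```python
import math


def psnr_y(ref, test, peak=255):
    """PSNR in dB between two equally sized lists of luma samples."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def rc_psnr(psnr_stego, psnr_cover):
    """Relative change of the average PSNR_Y in percent, as in (10)."""
    return (psnr_stego - psnr_cover) / psnr_cover * 100


# Hypothetical average PSNR_Y values of a stego and a cover sequence.
print(psnr_y([100, 100], [101, 99]))
print(rc_psnr(39.6, 40.0))
```

A negative value of \(\text {RC}_\text {PSNR}\) means that the stego sequence has lower luminance quality than the cover.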

It can be observed from Table 1 that our proposed method always has an adverse impact on visual quality. The reasons are explained as follows. As described in Sect. 3, the modification of a given CBP is performed by flipping one of its four LSBs. If that bit is modified from 1 to 0, the non-zero residual coefficients within the corresponding \(8\times 8\) luma block would be bypassed, which could easily result in a reduction in visual quality. If that bit is modified from 0 to 1, the 64 zero coefficients in the corresponding \(8\times 8\) luma block need to be entropy coded and transmitted. In this case, only compression efficiency would be negatively influenced. Therefore, when the embedding payload is large enough, a reduction in visual quality would always be caused by our proposed method.

Nevertheless, for our proposed method, visual quality is regarded as a crucial factor in evaluating the embedding distortion. In addition, by virtue of the STC, a practical distortion minimization framework, our proposed method could embed the payload with very limited embedding distortion introduced. Thus, although our method always has some impact on visual quality, this impact is considerably alleviated and reduced to a fairly low level. As shown in Table 1, for the proposed method, the minimum value of \(\text {RC}_{\text {PSNR}}\) is −0.8790 for \(\text {QP}=20\), and −0.5677 for \(\text {QP}=28\).

As can be seen in Table 1, compared with our method, both ALG1 and ALG2 appear to cause a smaller reduction in visual quality. However, this is achieved at the expense of significant bitrate overhead, leading to poor rate-distortion performance in general.

Fig. 4. Impacts on visual quality are evaluated in a frame-by-frame manner for sequence “stefan”, with QP set as 20 (left) and 28 (right).

As shown in Fig. 4, we take a closer look at the specific sequence named “stefan”, and \(\text {RC}_\text {PSNR}\) is computed for each frame. It can be observed that, for the proposed method, although a reduction in visual quality is caused, the reduction always remains small. In contrast, with the increase in QP, the impact of data embedding on visual quality becomes more obvious and severe for ALG1 and ALG2.

4.3 Impacts on Compression Efficiency

Compression efficiency, also known as “compression capability”, is the most fundamental driving force behind the adoption of modern video compression techniques. Since compression efficiency is a crucial factor in assessing the performance of video encoders, the impacts of data embedding on compression efficiency could suggest whether or not the corresponding data hiding method is suitable to be used in situations where limited transmission and storage resources are available.

With compression efficiency represented as bitrates, given a stego video sample \(V_\text {s}\) and the corresponding cover \(V_\text {c}\), the impacts of data embedding on compression efficiency can then be evaluated by computing the relative change between the bitrates of \(V_\text {s}\) and \(V_\text {c}\), i.e.

$$\begin{aligned} \text {RC}_\text {BR} = \frac{\text {BR}(V_\text {s})-\text {BR}(V_\text {c})}{\text {BR}(V_\text {c})}\times 100 \end{aligned}$$
(11)

where \(\text {BR}(V)\) denotes the bitrate of the sequence \(V\).
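As a rough sketch of (11), the bitrate of a sequence can be derived from its compressed size and duration before the relative change is taken. The helper names, the frame rate and the file sizes below are assumptions for illustration only.

```python
def bitrate_kbps(total_bytes, num_frames, fps):
    """Average bitrate in kbit/s of a compressed sequence."""
    duration_s = num_frames / fps
    return total_bytes * 8 / duration_s / 1000


def rc_br(br_stego, br_cover):
    """Relative change of the bitrate in percent, as in (11)."""
    return (br_stego - br_cover) / br_cover * 100


# Hypothetical 300-frame sequence at 30 frames per second.
br_cover = bitrate_kbps(375000, 300, 30)
br_stego = bitrate_kbps(382500, 300, 30)
print(br_cover, br_stego, rc_br(br_stego, br_cover))
```

A positive value of \(\text {RC}_\text {BR}\) indicates bitrate overhead introduced by data embedding.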

As described in Sect. 3, a distortion function is defined in terms of the fluctuation in visual quality and compression efficiency. In addition, by using the STC, our method can embed the payload while minimizing that distortion function. Therefore, compared with traditional video data hiding methods, the proposed approach can perform data embedding with minor fluctuation in bitrates, even when a large payload (190 bpf) is embedded.

As can be seen in Table 1, for our proposed method, the problem of bitrate overhead is considerably alleviated, enabling more efficient use of transmission and storage resources. Therefore, our method is capable of maintaining compression efficiency at the cost of a fairly slight reduction in visual quality. As expected, both ALG1 and ALG2 are subject to marked bitrate overhead, leading to reduced compression efficiency.

Fig. 5. Impacts on compression efficiency are evaluated in a frame-by-frame manner for sequence “stefan”, with QP set as 20 (left) and 28 (right).

Given a single frame \(F_{\text {s}}\) of a stego video sequence, the impacts on compression efficiency could be evaluated as

$$\begin{aligned} \text {RC}_\text {frmSize} = \frac{\text {size}(F_\text {s})-\text {size}(F_\text {c})}{\text {size}(F_\text {c})} \end{aligned}$$
(12)

where \(F_\text {c}\) denotes the frame of the cover video sequence which corresponds to \(F_\text {s}\), and \(\text {size}(F)\) represents the size of the frame measured in bytes.

As shown in Fig. 5, our proposed method consistently maintains compression efficiency, whereas ALG1 and ALG2 could cause significant bitrate overhead. Thus, compared with ALG1 and ALG2, our method is capable of performing data embedding with rate-distortion efficiency well maintained, even when a large payload is embedded.

5 Conclusion

In this paper, we propose a novel data hiding method which utilizes the CBP as the data carrier. A reasonable embedding distortion function is designed by considering the reduction in visual quality and compression efficiency caused by CBP manipulation. Combined with the distortion function, the STC is employed to minimize the overall embedding impact. Thus, the proposed method is capable of conducting data embedding with limited distortion introduced.

The experimental results demonstrate that, compared with traditional video data hiding methods, the proposed approach is capable of embedding a large payload with only a slight reduction in visual quality. In addition, the bitrate overhead is considerably alleviated, so the compression efficiency is well maintained. Therefore, our method enables more efficient use of transmission and storage resources, and is suitable for practical use.