1 Introduction

The widespread use of the Internet and World Wide Web has changed the way digital data is handled. The easy access of images, musical documents and movies has modified the development of data hiding, by placing emphasis on copyright protection, content-based authentication, tamper proofing, annotation and covert communication. Data hiding deals with the ability of embedding data into a digital cover with a minimum amount of perceivable degradation, i.e., the embedded data is invisible or inaudible to a human observer. Data hiding consists of two sets of data, namely the cover medium and the embedding data, which is called the message. The cover medium and the message can be text, audio, picture or video depending on the size of the message and the capacity of the cover.

Data hiding and watermarking techniques are usually studied together because a watermarking technique can serve as a data hiding technique, as well, although the opposite is not always feasible. Early video data hiding approaches were essentially still image watermarking techniques extended to video by hiding the message in each frame independently [1]. Methods such as spread spectrum were used, where the basic idea is to distribute the message over a wide range of frequencies of the host data. Transform domain is generally preferred for hiding data since, for the same robustness as for the spatial domain, the result is more pleasant to the Human Visual System (HVS). For this purpose the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), and the Discrete Wavelet Transform (DWT) domains were usually employed [24]. Recent video data hiding techniques are focused on the characteristics generated by video compressing standards. Motion vector based schemes have been proposed for MPEG algorithms [57]. Motion vectors are calculated by the video encoder in order to remove the temporal redundancies between frames. In these methods the original motion vector is replaced by another locally optimal motion vector to embed data. Only few data hiding algorithms considering the properties of H.264 standard [813] have recently appeared in the open literature. In [8] a subset of the 4 × 4 DCT coefficients are modified in order to achieve a robust watermarking algorithm for H.264. In [9] the blind algorithm for copyright protection is based on the intra prediction mode of the H.264 video coding standard. In [10] some of the skipped macroblocks are used to embed data. The algorithm in [11] hides one bit in each qualified intra 4 × 4 luma block by modifying the intra 4 × 4 prediction modes. In [12] data hiding is based on the block size selection during the inter prediction process of the H.264 encoding. In [13] one bit of information is embedded in the sign bit of the trailing ones in Context Adaptive Variable Length Coding (CAVLC) of the H.264/AVC stream.

Traditionally, most of the data hiding techniques take place during the encoding either in the spatial or in the transform domain. Thus, for these techniques it is extremely difficult to perform data hiding in real time. Moreover, the access to the original video sequence is not always feasible and reuse of the marked video to hide different data is necessary. First decoding of the bitstream and then re-encoding it to embed the new data is needed. However, this results in video quality degradation, since the data hiding process has always some negative impact to the PSNR. Moreover, reusing the marked video, by decoding and re-encoding it, cannot always take place in real time. Other techniques, applied in the compressed domain [5, 8, 11], may produce artifacts and drift errors, which degrade the video quality. Such errors make the reusing of the marked video in the compressed domain very ineffective. The proposed method addresses the aforementioned problems, i.e. data hiding in real time and reusing the marked video in real time without affecting video quality.

In general, the well-established H.264/AVC video coding standard [14] has adopted many advanced features, which can be exploited for data hiding purposes. In this paper we propose a new data hiding scheme, which exploits the IPCM mode used by the H.264 encoder during the intra prediction, in order to hide the desired data. The data can then be extracted directly from the encoded stream without using the original host video. The whole process is depicted in Fig. 1. This method is best suited to content-based authentication and covert communication applications.

Fig. 1
figure 1

The process of embedding and extracting data using the H.264 codec

The rest of the paper is organized as follows. In Sect. 2 we briefly present the Intra Prediction within the H.264/AVC encoder [15]. In Sect. 3 we describe the new method and in Sect. 4 the message extractor. Simulation results are presented in Sect. 5. Some considerations for further improvements are given in Sect. 6. Finally, in Sect. 7 conclusions are drawn.

2 Intra mode prediction in H.264/AVC

The H.264/AVC standard supports coded sequences containing I and P slices. I slices contain intra coded macroblocks in which each 16 × 16 (I_16 × 16) or 4 × 4 (I_4 × 4) luma region and each 8 × 8 (I_8 × 8) chroma region is predicted from previously-coded samples in the same slice. A third type of Intra coding, called IPCM, is also provided for use in unusual situations. The encoder typically selects the prediction mode for each block that minimizes the difference between the predicted block and the block to be encoded.

The I_4 × 4 mode is based on predicting each 4 × 4 luma block separately and is well suited for coding parts of a picture with significant detail. The I_16 × 16 mode, on the other hand, performs prediction and residual coding on the entire 16 × 16 luma block and is more suited for coding very smooth areas of a picture. In addition to these two types of luma prediction, a separate chroma prediction is conducted. In contrast to previous video coding standards (especially H.263 + and MPEG-4 Visual), where intra prediction has been conducted in the transform domain, intra prediction in H.264/AVC is always conducted in the spatial domain, by referring to neighboring samples of previously-decoded blocks that are on the left and/or above the block to be predicted. Since this can result in spatio-temporal error propagation when inter prediction has been used for neighboring macroblocks, a constrained intra coding mode can alternatively be selected that allows prediction only from intra-coded neighboring macroblocks. In I_4 × 4 mode, each 4 × 4 luma block is predicted from spatially neighboring samples.

When the fidelity of the coded video is high (i.e., when the quantization step size is very small), it is possible in certain very rare instances of input picture content for the encoding process to actually cause data expansion rather than compression. Furthermore, it is convenient for implementation reasons, to have a reasonably low identifiable limit on the number of bits necessary to process in a decoder in order to decode a single macroblock. To address these issues, the standard includes an IPCM macroblock mode, in which the values of the samples are sent directly—without prediction, transformation, or quantization. An additional motivation for support of this macroblock mode is to allow regions of the picture to be represented without any loss of fidelity. However, the IPCM mode is clearly not efficient—indeed it is not intended to be efficient—rather, it is intended to be simple and to impose a minimum upper bound on the number of bits that can be used to represent a macroblock with sufficient accuracy. If one considers the bits necessary to indicate which mode has been selected for the macroblock, the use of the IPCM mode actually results in a minor degree of data expansion.

3 The proposed data hiding approach

As mentioned in Sect. 2, ‘IPCM’ is a macroblock in which the values of the samples are sent directly—without prediction, transformation, or quantization. The concept behind our method is to hide the desired data in the low bits of both the luma and the chroma samples of an IPCM macroblock. Eventually, the hidden data will be embedded into the compressed H.264 stream intact. Simple and straightforward though, the proposed method has to face two practical obstacles: the rareness of the IPCM macroblocks during the encoding and the low efficiency of the IPCM mode in terms of compression. The latter turns out to be a trade off issue between the generated bitrate and the payload of the hidden data. This issue is discussed in Sect. 5 with the help of some experimental results. Regarding the rareness of the IPCM macroblock, we conducted several tests with many well-known video sequences. All of the tests resulted in none IPCM macroblocks no matter how low the quantization parameter was set. We therefore concluded that the only safe way to produce IPCM macroblocks during the encoding is to force the encoder to regard specific macroblocks as IPCM macroblocks. The simplified block diagram of the proposed method integrated within the H.264 encoder is shown in Fig. 2.

Fig. 2
figure 2

Simplified block diagram of the proposed method integrated within the H.264 encoder

According to the proposed method, two new steps are added in the H.264 encoder: the IPCM Decision and the Data Embedding.

  • IPCM Decision intervenes in the Mode Decision process of the H.264 encoding and forces certain macroblocks to be encoded as IPCM macroblocks. The decision, on which macroblocks are going to be encoded as IPCM, depends on the length of the data to be hidden. The rule is that the IPCM macroblocks must be enough to cover the hidden data and must be spread to the possible extend within the H.264 stream. In the rare situation that the encoder decides to encode a macroblock as IPCM, without our intervention, the Data Embedding step will also use this macroblock to hide data.

  • Data Embedding takes action after the IPCM Decision and modifies the low bits of the values of the aforementioned macroblocks in such a way that the modified bits form the hidden data. The “tweaked” IPCM macroblocks will then undergo the lossless entropy encoding and the hidden data will eventually be inserted intact into the generated H.264 stream.

The proposed method is characterized by three main features, namely, ease of implementation, respectable data capacity, and reusability. The latter allows data hiding in real time directly in the compressed domain. All these features are described below.

3.1 Ease of implementation

The proposed method can be easily integrated within the reference H.264 encoder [15]. It takes place in a very early stage of the encoding process, before any spatial or temporal predictions and before the transformation and the quantization. Therefore, the impact of the proposed method to the encoding process is minimized. Some implementation hints on how to implement the proposed algorithm using the reference H.264 encoder version JM14.0 [15] are given below:

  1. 1.

    Add the following code just before the “compute_mode_RD_cost” function is called:

    if(img->current_mb_nr==N && img->type==P_SLICE)

    for (i = 0; i < 11; i++) enc_mb.valid[i] = 0;

    This code will force the encoder to encode the Nth macroblock of every P slice in IPCM mode.

  2. 2.

    Modify the low bits of the values of this macroblock. This is done under the “IPCM” case inside the “RDCost_for_macroblocks” function.

The two steps above correspond to the IPCM Decision and the Data Embedding blocks, respectively.

3.2 Data capacity

The data capacity of a video sequence (YUV 4:2:0) is calculated in accordance with Eq. 1:

$$ {\text{Data capacity}} = {\text{Luma capacity}} + 2 \times {\text{Chroma capacity}} $$
(1)

where

$$ \begin{aligned} {\text{Luma capacity}} & = 256 \times N_{\text{IPCM}} \times {\text{L}}_{\text{bits}} , {\text{ and}} \\ {\text{Chroma capacity}} & = 64 \times N_{\text{IPCM}} \times {\text{C}}_{\text{bits}} \\ \end{aligned} $$

N IPCM is the number of the IPCM macroblocks used for data hiding, Lbits is the number of the low bits per IPCM luma sample used for data hiding, and Cbits is the number of the low bits per IPCM chroma sample used for data hiding. The luma and chroma samples are 256 and 64, respectively.

According to [16], up to three low bits of an 8-bit sample can be modified without causing any visual distortion. However, our experiments showed that even if the four low bits are modified the distortion is imperceptible. This is explained by the fact that we are not dealing with static images but with moving frames at a rate of 30 fps. Moreover, we embed no more than one IPCM macroblock per frame and not in successive frames. Finally, the rest of the non-IPCM macroblocks have possibly suffered greater distortion due to the intra/inter prediction and to the quantization during the encoding. In order to prove the above we zeroed the four low bits of every luma and chroma sample of the 49th macroblock of the 9th frame of the mobile sequence. The sequence was encoded (QP = 28, CABAC) and decoded using the H.264 JM.14.0 codec. Figure 3 shows the visual result.

Fig. 3
figure 3

Comparison of the visual results between the original and the marked 9th frame of mobile

An interesting approach would be the modifiable bits Lbits and Cbitsto be mathematically related to the Quantization Parameter (QP). For example for a high QP (>28) we could modify four bits while for a lower QP we could modify three bits or fewer. Other combinations are also applicable such as the use of three bits for the luma blocks and four bits for the chroma blocks. In the current implementation of the proposed method we modify the four low bits of both of the IPCM luma and the chroma samples in order to hide the data. Hence, from Eq. 1, a single IPCM macroblock (N IPCM = 1) for Lbits = Cbits = 4 gives a capacity of 1,536 bits. This might be regarded as the upper limit of the capacity per IPCM macroblock.

3.3 Reusability and real time data hiding

Most of the data hiding methods hide the data during the encoding process, thus they are slow and ineffective for real time applications such as covert mobile communication. The proposed method, as explained above, encodes some macroblocks in IPCM mode and hides the message within their data. After the first pass and as long as the IPCM macroblocks have been encoded, the same IPCM macroblocks can be reused to hide new data, directly in the compressed domain, numerous times in real time. The reusing process needs neither the original video sequence nor the original encoded stream. Furthermore, it is proved that it does not cause any considerable PSNR or bit rate distortions. The IPCM macroblock reuse is described below.

The H.264 bitstream is organized in slices, each containing a number of coded macroblocks. Figure 4 shows the simplified syntax of a coded slice. All of the Headers and the Payloads are entropy encoded. The MB Header contains, among others, the prediction modes of the macroblock. The MB Payload usually contains the coded transform coefficients corresponding to the residual image samples after prediction. However, in case of an IPCM macroblock, the Payload contains the pixel values.

Fig. 4
figure 4

Simplified syntax of a coded slice

The reusing process is performed in four steps, as follows:

  1. 1.

    Get a slice unit from the H.264 stream.

  2. 2.

    Entropy decode each macroblock’s header to determine if this is an IPCM macroblock.

  3. 3.

    In case of an IPCM macroblock:

    1. a.

      entropy decode the macroblock payload in order to get the pixel values,

    2. b.

      hide the new data into the low bits of the pixels,

    3. c.

      entropy re-encode the macroblock,

    4. d.

      store the macroblock back to the slice unit.

  4. 4.

    Go to step 1.

The real time data hiding is achieved by the fact that the method needs only to entropy decode and re-encode the compressed IPCM macroblocks, thus avoiding the time-consuming normal encoding process. Figure 5 shows the block diagram of the real time data hiding process.

Fig. 5
figure 5

Block diagram of the real time data hiding process

4 Message extractor

The message extractor is a software tool, not necessarily an H.264 decoder, which extracts the hidden message from the marked H.264 bitstream. The message extraction process is as follows:

  1. 1.

    Get a slice unit from the H.264 stream.

  2. 2.

    Entropy decode each macroblock’s header to determine if this is an IPCM macroblock.

  3. 3.

    In case of an IPCM macroblock:

    1. a.

      entropy decode the macroblock payload in order to get the pixel values,

    2. b.

      read the low bits of the pixels in order to extract the message.

  4. 4.

    Go to step 1.

The message extractor works in a way similar to that part of the algorithm that reuses the IPCM macroblocks for data hiding. This gives it the desirable property of being of very low complexity and thus appropriate for real time applications, as it is required in most cases of video [8].

5 Simulation results

The proposed algorithm was integrated within version JM14.0 of the reference H.264 software [15]. The most important configuration parameters of the reference software are given in Table 1. The rest of the parameters have retained their default values.

Table 1 Configuration parameters of the encoder

The IPCM macroblocks are expected to have a negative impact to the produced bit rate. We conducted several tests in order to investigate this impact. For that purpose we used 300 frames or 10 s of well-known representative video sequences in QCIF format (YUV 4:2:0) such as the akiyo (Class A), the foreman (Class B) and the mobile (Class C). The QCIF format (176 × 144) was chosen because it is very common in mobile applications where the demand for real time is always high. The three classes that we used for our experiments possess the following characteristics:

  • Class A: Low spatial detail and low amount of movement;

  • Class B: Medium spatial detail and low amount of movement or vice versa;

  • Class C: High spatial detail and medium amount of movement or vice versa.

The hidden message was generated by the pseudorandom integer generator function, rand, which is provided by the standard C library. The testing procedure was to run the reference encoder with and without our algorithm and then compare the results with respect to bit rate and PSNR. We used the bit rate and the PSNR variation as comparative metrics, which were calculated as in Eqs. 2 and 3, respectively:

$$ {\text{VAR}}_{\text{RATE}} \frac{{R^{\prime} - R}}{R} \times 100\left( \% \right) $$
(2)

where R′ is the bit rate generated by the modified encoder and R is the bit rate generated by the reference encoder, and

$$ {\text{VAR}}_{\text{PSNR}} = \frac{{{\text{PSNR}^\prime}_{Y} + {\text{PSNR}^\prime}_{U} + {\text{PSNR}^\prime}_{V} - {\text{PSNR}}_{Y} - {\text{PSNR}}_{U} - {\text{PSNR}}_{V} }}{3} \times 100(\% ) $$
(3)

where PSNR′ is the total PSNR of the luma (Y) and chroma (U,V) samples generated by the modified encoder and PSNR is the total PSNR of the luma and chroma samples generated by the reference encoder.

All of the values of the R′, R, PSNR′ and the PSNR were read from the log.dat file, which is created by the encoder’s intrinsic logging mechanism. At the first series of tests we ran the encoder for different very common Quantization Parameters (10, 20 and 30) and for different message payloads (3,072–18,432 bits or 10.2–61.4 bits per QCIF frame). Moreover, we did not apply any bit rate constraints. In this way the PSNR remained practically unaffected with the exception of the QP = 10 test case, where the PSNR showed some degradation of less than −0.1 dB. The results are shown in Figs. 6, 7 and 8.

Fig. 6
figure 6

Bit rate (a) and PSNR (b) variations versus message size for different QP parameter values for the Akiyo sequence (300 frames)

Fig. 7
figure 7

Bit rate (a) and PSNR (b) variations versus message size for different QP parameter values for the Foreman sequence (300 frames)

Fig. 8
figure 8

Bit rate (a) and PSNR (b) variations versus message size for different QP parameter values for the Mobile sequence (300 frames)

From the results we see that the bit rate is increased proportionally to the data capacity and to the quantization parameter. This is expected because the higher the quantization parameter is, the lower the produced bit rate under the standard reference encoding. On the other hand, our method increases the bit rate proportionally to the IPCM macroblocks, i.e. to the data payload. For QP ≤ 20 the method presents a rather uniform behavior with a bit rate variation from −0.1 to ~4.0% even for the high payloads. However, for QP ≥ 20 the method results in a continuous bit rate increase. It is notable that akiyo sequence presents much higher bit rate variations than the other two sequences. This is due to the fact that akiyo is a class “A” sequence, i.e. it has low spatial details and low amount of movement. This means that both of the intra and the inter prediction leave small residuals during the normal encoding by the reference encoder, which eventually results in a very low bit rate. On the other hand, our modified encoder always produces the required IPCM macroblocks, which slightly increase the bit rate. Therefore, we conclude that for class “A” sequences and for high QPs (>20) the method cannot efficiently hide more than 5,000 bits in 10 s of video, i.e. an average of 16.7 bits per QCIF frame.

In the second series of tests we enabled the bit rate control mechanism of the encoder and we set 60 Kbps and 50 Kbps bit rate constraints on the encoder, which are considered to be low bit rates for the today’s Internet standards. Our purpose was to investigate the performance of our method when the marked H.264 bitstream has to be transmitted over a channel with limited bandwidth. By enabling the bit rate control the quantization parameters were automatically controlled by the encoder, which had to generate a bit rate lower or equal to the bit rate constraint. In this way the bit rate was practically unaffected as shown in Table 2. Apparently, the cost of doing so was put on the PSNR. Figure 9 shows the PSNR variations when the bit rate constraint is enabled.

Table 2 Bit rate variations under bit rate control
Fig. 9
figure 9

PSNR variation vs message size for different bit rates for the video sequences (a) Akiyo, (b) Foreman, and (c) Mobile (300 frames)

From these results it is seen that the overall performance becomes smoother when the bit rate control is enabled. PSNR decreases proportionally to the message payload. The maximum reduction does not exceed 0.43 and 0.44 dB for the 60 and the 50 Kbps constraints, respectively, at a payload of 18,432 bits for the 300 QCIF frames, i.e. an average of 61.4 bits per QCIF frame. These reductions are observed for the akiyo sequence, as expected. The result is regarded as an acceptable trade off, taking into account the low bit rate constraints and the small number of frames used (i.e. 300).

In the third series of tests we compared our method with the method proposed by Hu et al. [11]. Four QCIF sequences were used (Bridge Close, Grandma, News, and Silent of 199 frames each). The tests were performed using H.264 Main Profile configuration (with RDO, CABAC, QP = 28, and 30 frames/s) and a GOP structure of “IBPBPBPBPB”. Our method was integrated within the H.264 reference encoder JM14.0 while the competitive method had been integrated in the H.264 reference encoder JM11.0. The results are shown in Table 3. The PMC denotes the maximum payload for the proposed method while the 11MC denotes the maximum payload for method [11] that was used in the comparisons.

Table 3 Comparison results between the proposed method and that of Hu et al. [11]

Finally, our experiments showed that the proposed algorithm did not introduce serious delays in the encoding process. When the method works in the compressed domain it manages to hide the same amount of data as in the uncompressed domain in real time. This is feasible because the method has to entropy decode a few macroblocks only. Since entropy decoding does not incorporate the inverse transform, it is performed very fast. Moreover, embedding the data by tampering the low bits of the macroblock’s pixels is done using logical operations (AND, OR) and, therefore, it adds a negligible delay to the whole data hiding process. For the above reasons, the proposed method works, in the compressed domain, much faster than the H.264 decoder.

Based on all of the above results, the conclusion is that the proposed method is fast and works better for bit rates around 60 Kbps and higher, where the maximum PSNR degradation does not exceed 0.43 dB for the higher payloads.

6 Further improvements

The proposed approach can be modified in a number of ways in order to enhance its features, namely maximum payload, PSNR, bit rate, and robustness. It is noted that not all of the above can be simultaneously improved.

A potential improvement is in the way the proposed method chooses the IPCM macroblocks. Currently the method chooses the IPCM macroblocks taking into account only the amount of data to be embedded. However, this may result in unjustifiable bit rate expansion. For example the method may choose to force the encoder to encode a specific macroblock as IPCM, while the encoder would have encoded it as a skip one. It would be better for the method to let the encoder perform the motion estimation first. Then, only the macroblocks, which presented non-zero residuals during the motion estimation, would be considered to be candidate IPCM macroblocks. In that way the impact on the bit rate would be smaller. Preliminary simulation tests have shown that there are no considerable variations with the results presented in the previous section. This was expected, as IPCM macroblocks constitute a limited part of the frame. Along the same lines, the message data could be unevenly distributed in the macroblocks, based on each block’s statistics (i.e. roughness), thus further reducing the resulted perceptual degradation. This means that a different number of the lower bits of each IPCM macroblock could be devoted to data hiding. Also, instead of choosing a particular macroblock of each slice P to hide data, this could be chosen in a random way based upon a key.

In general, the fact that the proposed method inserts raw information (IPCM macroblocks with modified image data) into the H.264 bitstream generates a lot of potential improvements. The IPCM macroblock can be regarded as part of a still image. Therefore, many data hiding and watermarking techniques, which work in the spatial domain, can be applied [17]. For example, the message could be first encrypted by means of a key, before being embedded into the macroblocks.

7 Discussion and conclusions

In this paper a new low complexity data hiding scheme for H.264 encoded video sequences has been presented. Embedding takes place during the encoding process and exploits the IPCM coded macroblocks in order to hide the data. The same IPCM macroblocks can be reused to hide new data, directly in the compressed domain, numerous times in real time. The method is blind, meaning that the original host video sequence is not needed for message extraction. The method achieves relatively high data capacities without considerably affecting either the video quality or the coding efficiency. The method is fragile, which means that if the marked video is decoded and re-encoded, the embedded data will be lost. The method could be used in a number of applications such as content authentication, tamper proofing, covert communication, advertisement monitoring, content indexing and archiving, etc.

The fact that the proposed method embeds the data directly in the spatial domain (pixel values) makes it immune to some very common errors, which appear in other competitive data hiding techniques working in the compressed domain, such as artifacts and drift. Artifacts are produced by the data hiding techniques, which use the transform coefficients in order to embed the data. Transform coefficient modification results in signal reconstruction errors with undesirable visual effects, such as discontinuities in the block edges. Drift errors are propagating visual distortions between successive video frames as a result of embedding data in the compressed domain without re-performing motion estimation.

In conclusion, the main advantages of the proposed method are its low complexity and the possibility of using the compressed stream for hiding different data many times, without first decoding and then re-encoding the video sequence. This makes the method appropriate for real-time applications. Simulation results have shown that perceptual quality is preserved without sacrificing coding efficiency.