1 Introduction

Since H.264/AVC is the most widely used video compression standard, the need for practical security and privacy preservation for H.264/AVC is unquestionable [1]. To address this problem, H.264/AVC video streams should be distributed over an open network in encrypted form. In some application scenarios, it is necessary to embed an additional message, such as authentication data or media notation, within the encrypted video even though the original video content is not revealed. With the additional message, the server can manage the video or verify its integrity without knowledge of the original content, and thus privacy is protected. The capability of performing data hiding directly in encrypted H.264/AVC video streams avoids leakage of the video content, which can help address security and privacy concerns, especially for cloud computing [2].

However, embedding hidden data into these encrypted video streams is a major challenge, because the compression process removes redundant information and the encryption process randomizes the compressed bit stream. Until now, few schemes have appeared in the open literature. In [3], during H.264/AVC compression, the intra-prediction mode (IPM), motion vector difference (MVD) and the signs of the first 8 coefficients in each 4×4 block are encrypted, while the amplitudes of the DCT coefficients are watermarked adaptively. Another commutative watermarking and encryption scheme for MPEG-2 video is proposed in [4]: the DCs in intra macroblocks are encrypted or watermarked based on random module addition, while the DCs in other macroblocks and the signs of all the ACs are encrypted with a stream cipher or block cipher. However, data embedding is not reversible in the aforementioned schemes. This is a critical drawback in some applications. For example, the cloud service provider has no right to introduce permanent distortion due to data hiding in encrypted video. To solve this problem, a reversible data hiding technique that completely preserves the video quality in the encrypted domain is preferred.

Several reversible data hiding schemes for encrypted images are investigated in [5–8]. In [5], Zhang divided the encrypted image into blocks, with each block carrying one bit by flipping the 3 least significant bits of each encrypted pixel in a set. Hong et al. [6] improved Zhang’s method [5] by adopting a new smoothness evaluation function and a side-match mechanism to decrease the error rate of the extracted bits. Later, Zhang further proposed a separable reversible data hiding scheme for encrypted images [7]. In [8], Ma et al. proposed a reversible data hiding scheme for encrypted images that reserves room before encryption. Unfortunately, these algorithms for encrypted images cannot be applied to H.264/AVC streams due to the impact of the coding. In [9], a combined encryption and watermarking scheme is presented that provides access control and authentication of the video content simultaneously. Similarly, the IPMs of 4 × 4 luminance blocks, the sign bits of texture, and the sign bits of MVDs are encrypted, while the IPM is used for reversible watermarking. However, the marked stream is not fully compatible with the H.264 specification, so a standard decoder that cannot parse a watermarked stream may crash.

In this paper, we propose an algorithm to reversibly embed secret data directly in an encrypted H.264/AVC stream. The encrypted H.264/AVC streams are obtained by encrypting the IPM, the MVD, and the signs of the DCT coefficients with a stream cipher. Reversible data hiding in the encrypted domain is then performed based on histogram shifting of the residue coefficients. The rest of the paper is organized as follows. In Sect. 2, we describe the proposed scheme. Experimental results are presented in Sect. 3. Finally, Sect. 4 gives the conclusion and discussion.

2 Proposed Scheme

This section describes a reversible data hiding method for encrypted H.264/AVC videos, which consists of video encryption, data embedding, data extraction and video recovery phases.

2.1 H.264/AVC Video Encryption

A. IPM Encryption

Intra-prediction modes in the H.264/AVC encoder include Intra_4 × 4, Intra_16 × 16, Intra_chroma and I_PCM [10]. In H.264/AVC, the IPM of Intra_16 × 16 is specified in the mb_type field, which also specifies other parameters of the block such as the coded block pattern (CBP). Table 1 lists the mb_type values with their meanings, which are taken from the standard [11]. In the H.264/AVC baseline profile, the mb_type is encoded with the Exp-Golomb code. To keep the encrypted stream compliant with a standard H.264/AVC decoder, we encrypt the IPM without modifying the CBP. From Table 1, it can be seen that the CBP combination is the same within every group of four rows, and the codewords have the same length within every pair of rows. Thus, for an Intra_16 × 16 block, the IPM encryption can be performed by XORing the last bit of the codeword with a pseudo-random bit, which keeps the value of the CBP and the length of the codeword unchanged, so that the resulting bitstream remains format compliant. The pseudo-random bits are determined by an encryption key E_Key1 using a standard stream cipher (e.g., RC4).
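The last-bit XOR described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, the keystream bit is passed in directly instead of being produced by RC4, and the sketch assumes that the Intra_16 × 16 types occupy mb_type values 1 to 24 in groups of four that share a CBP combination.

```python
def ue_codeword(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword: M zeros followed by the binary form of code_num + 1."""
    value = code_num + 1
    m = value.bit_length() - 1
    return "0" * m + format(value, "b")


def ue_decode(codeword: str) -> int:
    """Inverse of ue_codeword for a single, complete codeword."""
    m = codeword.index("1")
    return int(codeword[m:], 2) - 1


def encrypt_intra16_mb_type(mb_type: int, key_bit: int) -> int:
    """XOR the last codeword bit with one keystream bit (Intra_16x16 types only)."""
    if not 1 <= mb_type <= 24:
        # Assumption: values outside 1..24 (I_4x4, I_PCM) are not touched by this step.
        raise ValueError("only Intra_16x16 mb_type values 1..24 are handled")
    codeword = ue_codeword(mb_type)
    flipped = codeword[:-1] + str(int(codeword[-1]) ^ key_bit)
    return ue_decode(flipped)


if __name__ == "__main__":
    for mb_type in (1, 2, 7, 24):
        enc = encrypt_intra16_mb_type(mb_type, 1)
        print(mb_type, ue_codeword(mb_type), "->", enc, ue_codeword(enc))
```

Under these assumptions the encrypted value always stays in the same group of four and has the same codeword length, and applying the same keystream bit again restores the original mb_type.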

Table 1. Macroblock types for I slices and variable length of codeword

In H.264/AVC, an Intra_4 × 4 luminance block has 9 prediction modes, namely mode 0 to mode 8. To efficiently compress the prediction mode bits, predictive coding is used to signal Intra_4 × 4 prediction modes [10]. For each current block E, the most probable mode (\( MPM_E \)) is estimated from the spatially adjacent upper and left blocks. If the prediction mode of the current block is equal to \( MPM_E \), only one bit is needed to signal the prediction mode. In this case, the IPM is left unchanged. Otherwise, one of the eight remaining modes (0 to 7) must be signaled. The codeword is then composed of one flag bit “0” followed by a three-bit fixed-length code. The IPM encryption is performed by XORing the three-bit fixed-length codeword B with pseudo-random bits, which are determined by an encryption key E_Key2 using a standard stream cipher, as shown in Fig. 1. The bit string of the encrypted IPM therefore has the same length as that of the original IPM. In summary, IPM encryption changes the actual mode to another one without violating the semantics or the bitstream compliance.

Fig. 1. IPM encryption process for Intra_4 × 4 block
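As a rough illustration of the Intra_4 × 4 case, the sketch below XORs only the three-bit fixed-length part of the mode codeword. The function name and the string representation of codewords are assumptions made for the example, and the three keystream bits are supplied as an integer in 0..7 instead of coming from a stream cipher keyed with E_Key2.

```python
def encrypt_ipm_codeword(codeword: str, key_bits: int) -> str:
    """Encrypt one Intra_4x4 mode codeword given as a bit string.

    "1"    : the mode equals the most probable mode, so it is left unchanged.
    "0bbb" : leading "0" plus a 3-bit fixed-length code B, which is XORed
             with three keystream bits (key_bits in 0..7).
    """
    if not 0 <= key_bits <= 7:
        raise ValueError("key_bits must fit in three bits")
    if codeword == "1":
        return codeword
    if len(codeword) != 4 or codeword[0] != "0" or set(codeword) - {"0", "1"}:
        raise ValueError("expected '1' or a 4-bit string starting with '0'")
    b = int(codeword[1:], 2)
    return "0" + format(b ^ key_bits, "03b")


if __name__ == "__main__":
    enc = encrypt_ipm_codeword("0101", 0b011)
    print(enc)                                  # encrypted codeword, same length
    print(encrypt_ipm_codeword(enc, 0b011))     # XOR with the same bits decrypts
```

Because XOR is its own inverse, the same routine serves for both encryption and decryption, and the codeword length never changes.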

B. MVD Encryption

In the H.264/AVC baseline profile, Exp-Golomb entropy coding is used to encode the MVD. An Exp-Golomb codeword is constructed as [M zeros][1][INFO] [10], where the prefix consists of M leading zero bits before the first 1 bit, and INFO is the following M-bit field carrying information. The length of each Exp-Golomb codeword is therefore (2M + 1) bits. Each codeword can be constructed by the encoder based on its index code_num:

$$ \left\{ {\begin{array}{*{20}c} {M = floor\left( {\log_{2} \left( {code\_num + 1} \right)} \right)} \\ {INFO = code\_num + 1 - 2^{M} } \\ \end{array} } \right. $$
(1)

The MVD k to be encoded is mapped to code_num as follows:

$$ code\_num = \left\{ {\begin{array}{*{20}c} {2\left| k \right|} & {k \le 0} \\ {2\left| k \right| - 1} & {k > 0} \\ \end{array} } \right. $$
(2)

Table 2 shows MVD values and the corresponding Exp-Golomb codewords. To avoid bit overhead and preserve format compliance, only the sign bits of the MVDs are encrypted, using a standard stream cipher with an encryption key E_Key3. For example, the codewords corresponding to “2” and “−2” are “00100” and “00101”, respectively, which have the same length.
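Equations (1) and (2) can be checked with a short sketch. The function names are hypothetical, and the sign encryption is modeled simply as negating the MVD when the keystream bit is 1; the stream cipher that would supply that bit is not shown.

```python
def mvd_codeword(k: int) -> str:
    """Exp-Golomb codeword for a signed MVD value k, following Eqs. (1) and (2)."""
    code_num = 2 * abs(k) if k <= 0 else 2 * abs(k) - 1   # Eq. (2)
    m = (code_num + 1).bit_length() - 1                   # floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)                        # Eq. (1)
    info_bits = format(info, f"0{m}b") if m else ""       # M-bit INFO field (empty if M = 0)
    return "0" * m + "1" + info_bits


def encrypt_mvd_sign(k: int, key_bit: int) -> int:
    """Flip the sign of the MVD when the keystream bit is 1."""
    return -k if key_bit else k


if __name__ == "__main__":
    for k in (0, 1, -1, 2, -2, 3, -3):
        print(k, mvd_codeword(k))
```

For a nonzero MVD, the codewords of k and −k differ only in their last bit, which is why encrypting the sign leaves the codeword length unchanged.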

Table 2. MVD and corresponding Exp-Golomb codeword

C. Residue Encryption

In the H.264/AVC baseline profile, Context-Adaptive Variable Length Coding (CAVLC) is used to encode the quantized transform coefficients of a residual block. The signs (“0” for positive, “1” for negative) of the non-zero coefficients are encrypted using a standard stream cipher with an encryption key E_Key3. In this method, the luminance residue coefficients are encrypted, while the chroma residue coefficients are left unchanged. Some bit overhead may be generated after encryption due to the CAVLC entropy coding. However, the bit rate of the encrypted stream is very close to that of the unencrypted stream, as confirmed by the experimental results.
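A minimal sketch of the sign encryption, operating on already-parsed coefficient values instead of CAVLC bits: the function name is hypothetical, and the keystream is passed in as a list of bits, with one bit consumed per non-zero coefficient (an assumption about how the keystream is aligned).

```python
from typing import List, Sequence


def xor_signs(coeffs: Sequence[int], keystream: Sequence[int]) -> List[int]:
    """Flip the sign of each non-zero coefficient whose keystream bit is 1.

    Zero coefficients carry no sign bit and consume no keystream bit.
    """
    bits = iter(keystream)
    out = []
    for c in coeffs:
        if c != 0 and next(bits) == 1:
            out.append(-c)
        else:
            out.append(c)
    return out


if __name__ == "__main__":
    block = [7, 0, -2, 1, 0, 0, -1, 0]
    keystream = [1, 0, 1, 1]
    encrypted = xor_signs(block, keystream)
    print(encrypted)
    print(xor_signs(encrypted, keystream))  # same keystream restores the block
```

Only the signs change, so the magnitudes (and therefore the histogram of absolute values) are identical before and after encryption.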

2.2 Data Embedding in Encrypted Video

Once the data hider receives the encrypted video, he can embed some information into it for the purpose of media notation or integrity authentication. Since only the coefficients’ signs are encrypted, a reversible data hiding method based on histogram-shifting of residue coefficients is proposed.

A. Embedding Zone Selection

The embedding space can be created by modifying the histogram of the residual coefficients. First, find the two highest bins in the histogram (excluding zero coefficients), denoted by \( T_p \) and \( T_n \), respectively. For simplicity, we select \( T_p = 1 \) and \( T_n = -1 \).

In the histogram-shifting approach, all coefficients lying between the peak and zero points are shifted by one level. As the number of modified coefficients increases, the quality of the host video degrades. The DC and low-frequency coefficients contain most of the energy, and embedding data in them may significantly affect the video quality and the bit rate. To embed a certain amount of message imperceptibly, we therefore choose the mid- and high-frequency DCT residual coefficients for embedding, which reduces the number of modified coefficients. All of the transform coefficients within a 4 × 4 block can be indexed sequentially, from 0, 1, …, up to 15, in zig-zag order. Suppose the embedding region is \( R = [T_1, T_2] \), where \( T_1 \) and \( T_2 \) are two indices along the zig-zag scan order, i.e., \( 0 \le T_1 < T_2 \le 15 \). That is, the transform coefficients whose zig-zag indices lie in \( [T_1, T_2] \) are modified to embed data, while those outside the range are left unchanged. After some experimental investigation, we set \( T_2 = 15 \). Since coefficients vary widely across blocks, the threshold \( T_1 \) is adapted dynamically according to the DC value of the current 4 × 4 block, denoted \( DC_{cur} \). Of course, the choice of \( T_1 \) may still be improved.

$$ T_{1} = \left\{ {\begin{array}{*{20}l} 3 \hfill & {if\;DC_{cur} < 1} \hfill \\ 4 \hfill & {if\;1 \le DC_{cur} \le 5} \hfill \\ 5 \hfill & {if\;6 \le DC_{cur} \le 10} \hfill \\ 6 \hfill & {if\;11 \le DC_{cur} \le 20} \hfill \\ 7 \hfill & {if\;DC_{cur} \ge 21} \hfill \\ \end{array} } \right. $$
(3)
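Equation (3) amounts to a small lookup, sketched below. The function names are hypothetical. The text does not say whether \( DC_{cur} \) is a signed value or a magnitude, so the sketch applies the thresholds to whatever integer is passed in.

```python
def select_t1(dc_cur: int) -> int:
    """Lower zig-zag index T1 of the embedding region, following Eq. (3)."""
    if dc_cur < 1:
        return 3
    if dc_cur <= 5:
        return 4
    if dc_cur <= 10:
        return 5
    if dc_cur <= 20:
        return 6
    return 7


def embedding_positions(dc_cur: int, t2: int = 15) -> range:
    """Zig-zag indices T1..T2 (inclusive) that may be modified in the current block."""
    return range(select_t1(dc_cur), t2 + 1)


if __name__ == "__main__":
    for dc in (0, 3, 8, 15, 40):
        print(dc, list(embedding_positions(dc)))
```

A larger DC value pushes \( T_1 \) upward, so fewer, higher-frequency positions are used in blocks with more energy.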

Because I-frames are crucial to the video signal, data embedding is performed only in P-frames, and only in those 4 × 4 blocks whose DC or first AC coefficient is nonzero. Our simulation results demonstrate that additional data can be embedded with a large capacity in P-frames while preserving high visual quality.

B. Data Embedding

Many reversible data hiding methods have been proposed, such as methods based on difference expansion and histogram shifting. Histogram shifting is one of the most popular: it modifies the histogram so that certain bins are shifted to create vacant space, while other bins carry data by filling the vacant space. Before data embedding, partial decoding is performed on the encrypted stream to obtain the residue coefficients, denoted \( f_{i,j}(k) \). Here, \( f_{i,j}(k) \) is an encrypted residue; that is, the data hider does not have the key to decrypt it and obtain the plain value. Data hiding is performed directly in the encrypted domain by modifying the encrypted coefficients based on histogram shifting. The stego encrypted coefficients are then written back, and all the coefficients are CAVLC coded.

Unlike the traditional reversible data hiding method [12], this method uses the bins 1 and −1 for data embedding and the other bins (except bin 0) for shifting. Suppose the to-be-embedded message is a binary signal denoted as \( W = \{ w_l \mid l = 1, 2, \ldots, M;\; w_l \in \{0, 1\} \} \). Our data hiding algorithm in the encrypted domain can be described as follows.

Case 1: If \( f_{i,j}(k) > 1 \) or \( f_{i,j}(k) < -1 \), shift \( f_{i,j}(k) \) by one unit to the right or left, respectively. That is,

$$ \overline{{f_{i,j} }} (k) = \left\{ {\begin{array}{*{20}l} {f_{i,j} (k) - 1,} \hfill & {if\;f_{i,j} (k) < - 1} \hfill \\ {f_{i,j} (k) + 1,} \hfill & {if\;f_{i,j} (k) > 1} \hfill \\ \end{array} } \right. $$
(4)

where \( f_{i,j}(k) \) is the k-th original quantized DCT coefficient in the j-th block of the i-th macroblock, and \( \overline{f_{i,j}}(k) \) is the corresponding coefficient with hidden data.

Case 2: If \( f_{i,j}(k) = 1 \) or \( f_{i,j}(k) = -1 \), modify \( f_{i,j}(k) \) according to the message bit \( w_l \) to be embedded.

$$ \overline{{f_{i,j} }} \left( k \right) = \left\{ {\begin{array}{*{20}l} {f_{i,j} (k) - 1,} \hfill & {if\;f_{i,j} (k) = - 1,\;and\;w_{l} = 1} \hfill \\ {f_{i,j} (k)} \hfill & {if\;f_{i,j} (k) = - 1,\;and\;w_{l} = 0} \hfill \\ {f_{i,j} (k) + 1} \hfill & {if\;f_{i,j} (k) = 1,\;\;\;\,and\;w_{l} = 1} \hfill \\ {f_{i,j} (k)} \hfill & {if\;f_{i,j} (k) = 1,\;\;\;\,and\;w_{l} = 0} \hfill \\ \end{array} } \right. $$
(5)

Case 3: If \( f_{i,j}(k) = 0 \), then \( \overline{f_{i,j}}(k) = f_{i,j}(k) \), i.e., the coefficient is left unchanged. An example of how to embed data by histogram shifting is illustrated in Fig. 2.

Fig. 2. An example of the histogram-shifting method
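The three embedding cases can be sketched on a flat list of coefficients. The function name is hypothetical, and the sketch omits the selection of eligible blocks and zig-zag positions. It also assumes that coefficients equal to ±1 are left unchanged once the message runs out, which is equivalent to embedding padding zeros.

```python
from typing import List, Sequence, Tuple


def embed_bits(coeffs: Sequence[int], bits: Sequence[int]) -> Tuple[List[int], int]:
    """Histogram-shifting embedding following Eqs. (4) and (5).

    Returns the marked coefficients and the number of message bits consumed.
    """
    marked: List[int] = []
    used = 0
    for f in coeffs:
        if f > 1:                      # Case 1: shift right to vacate bin +2
            marked.append(f + 1)
        elif f < -1:                   # Case 1: shift left to vacate bin -2
            marked.append(f - 1)
        elif f in (1, -1) and used < len(bits):
            w = bits[used]             # Case 2: +/-1 carries one message bit
            used += 1
            marked.append(f + w if f == 1 else f - w)
        else:                          # Case 3: zeros (and unused +/-1) stay as they are
            marked.append(f)
    return marked, used


if __name__ == "__main__":
    coeffs = [0, 1, -1, 2, -3, 1, 0, -1]
    marked, used = embed_bits(coeffs, [1, 0, 1, 1])
    print(marked, used)
```

Zero coefficients are never touched, and every other coefficient changes by at most one level.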

Fig. 3. Original video frames

2.3 Data Extraction and Original Video Recovery

In this scheme, the hidden data can be extracted in either the encrypted or the decrypted domain. In addition, the method is reversible: the hidden data can be removed to recover the original video.

A. Scheme I: Encrypted Domain Extraction

From the stego encrypted video, the embedded data bit \( w_l \) can be extracted as

$$ \bar{w}_{l} = \left\{ {\begin{array}{*{20}l} 0 \hfill & {if\;\overline{{f_{i,j} }} (k) = \pm 1} \hfill \\ 1 \hfill & {if\;\overline{{f_{i,j} }} (k) = \pm 2} \hfill \\ \end{array} } \right. $$
(6)

The original encrypted residue coefficients \( f_{i,j}(k) \) can be recovered as

$$ f_{i,j} (k) = \left\{ {\begin{array}{*{20}c} {\overline{{f_{i,j} }} (k),} & {if\;\overline{{f_{i,j} }} (k) \in \left\{ {0, \pm 1} \right\}} \\ {\overline{{f_{i,j} }} (k) - 1,} & {if\;\overline{{f_{i,j} }} (k)\; \ge \;2} \\ {\overline{{f_{i,j} }} (k) + 1,} & {if\;\overline{{f_{i,j} }} (k)\;\;\; \le \; - 2} \\ \end{array} } \right. $$
(7)
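Equations (6) and (7) can be sketched together on a flat list of marked coefficients. The function name is hypothetical, and the input is assumed to have the form produced by Eqs. (4) and (5), so that every coefficient of magnitude 1 or 2 carries exactly one bit.

```python
from typing import List, Sequence, Tuple


def extract_and_recover(marked: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Extract hidden bits (Eq. 6) and restore the unmarked coefficients (Eq. 7)."""
    bits: List[int] = []
    restored: List[int] = []
    for g in marked:
        # Eq. (6): magnitude 1 carries bit 0, magnitude 2 carries bit 1.
        if abs(g) == 1:
            bits.append(0)
        elif abs(g) == 2:
            bits.append(1)
        # Eq. (7): undo the shift for every coefficient with magnitude >= 2.
        if g >= 2:
            restored.append(g - 1)
        elif g <= -2:
            restored.append(g + 1)
        else:
            restored.append(g)
    return bits, restored


if __name__ == "__main__":
    bits, restored = extract_and_recover([0, 2, -1, 3, -4, 2, 0, -2])
    print(bits)
    print(restored)
```

Neither step needs an encryption key, since both depend only on the marked coefficient values.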

Since the whole process is entirely performed in the encrypted domain, it avoids the leakage of original content. With the encryption keys, the content owner can further decrypt the video to get the original cover video.

B. Scheme II: Decrypted Domain Extraction

In some cases, users want to decrypt the video first and then extract the hidden data from the decrypted video. For example, an authorized user who owns the encryption key decrypts the encrypted video containing hidden data. The decrypted video still includes the hidden data, which can be used to trace the source of the data. The whole process of decryption, data extraction and video recovery is as follows.

Step1: Generate the stego decrypted video with the encryption keys. Because of the symmetry of the XOR operation, decryption is symmetric to encryption. That is, the encrypted data can be decrypted by XORing it with the same generated pseudorandom bits; the two XOR operations cancel each other out, which yields the original plain text. Since the pseudorandom bits depend on the encryption keys, decryption is possible only for authorized users.

Step2: The embedded data bit \( w_l \) can also be extracted from the stego decrypted video using Eq. (6), since the encryption changes only the signs of the coefficients. For example, the coefficient value “1” may be changed to “−1” during the encryption process, but “1” and “−1” both correspond to the hidden bit “0”. The only difference is that Eq. (6) is applied to the decrypted coefficients.

Step3: The original video can be further recovered via Eq. (7), again applied to the decrypted coefficients.
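The claim that extraction works in either domain can be illustrated with a compact sign-and-magnitude formulation of the embedding and extraction rules. All names here are hypothetical. For brevity the sketch spends one keystream bit per coefficient, which is an assumption, and pads the message with zeros if it runs out.

```python
from typing import List, Sequence


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def flip_signs(coeffs: Sequence[int], keystream: Sequence[int]) -> List[int]:
    """Sign encryption/decryption: negate where the keystream bit is 1."""
    return [-c if k else c for c, k in zip(coeffs, keystream)]


def embed(coeffs: Sequence[int], bits: Sequence[int]) -> List[int]:
    """Histogram shifting on magnitudes; the sign is carried through untouched."""
    it = iter(bits)
    out = []
    for f in coeffs:
        mag = abs(f)
        if mag > 1:
            mag += 1                  # shift
        elif mag == 1:
            mag += next(it, 0)        # embed one bit (0 if the message is exhausted)
        out.append(sign(f) * mag)
    return out


def extract(marked: Sequence[int]) -> List[int]:
    return [abs(g) - 1 for g in marked if abs(g) in (1, 2)]


def recover(marked: Sequence[int]) -> List[int]:
    return [sign(g) * (abs(g) - 1) if abs(g) >= 2 else g for g in marked]


if __name__ == "__main__":
    plain = [0, 1, -1, 2, -3, 1, 0, -1]
    keystream = [1, 0, 1, 1, 0, 0, 1, 1]
    message = [1, 0, 1, 1]

    marked_enc = embed(flip_signs(plain, keystream), message)
    marked_dec = flip_signs(marked_enc, keystream)        # Step 1

    print(extract(marked_enc), extract(marked_dec))       # Scheme I vs. Step 2
    print(recover(marked_dec) == plain)                   # Step 3
```

Because both embedding and extraction depend only on coefficient magnitudes, flipping signs before or after extraction does not affect the recovered bits.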

3 Experimental Results

The proposed data hiding scheme has been implemented in the H.264/AVC JM-12.2 reference software [13]. Four well-known standard video sequences (i.e., Carphone, News, Foreman, and Hall) in QCIF format (176 × 144) at a frame rate of 30 frames/s are used for our simulation. The first 100 frames of each video sequence are used in the experiments. The GOP (Group of Pictures) structure is “IPPPP”: one I frame followed by four P frames (QP: I 28, P 28).

A. Security of the encryption algorithm

For a video encryption scheme, security depends on cryptographic security and perceptual security. Cryptographic security denotes security against cryptographic attacks, which depends on the cipher. In the proposed scheme, a stream cipher with proven security against attacks is used to encrypt the bitstream and the secret data. Perceptual security means that the encrypted video is unintelligible. The proposed scheme encrypts the IPM, the MVD and the residue coefficients, which keeps the encrypted video unintelligible. The encryption results are shown in Fig. 4. In general, the scrambling performance of the described encryption system is more than adequate.

Fig. 4. Encrypted video frames

B. Stego Video Quality

An original frame from each video is shown in Fig. 3. The decrypted video frames with hidden data are given in Fig. 5. The degradation of the decoded video quality should be maintained at an acceptable range, whether or not the hidden data is removed. Since the embedding scheme is reversible, the original cover content can be perfectly recovered after extraction of the hidden data. At the same time, in the experiments, no visible artifacts can be observed in all of the decrypted videos containing hidden data.

Fig. 5. Decrypted video frames with hidden data

Besides subjective observation, PSNR (Peak Signal to Noise Ratio), SSIM (Structural Similarity Index), and VQM (Video Quality Measurement) are adopted to evaluate the perceptual quality [14, 15]. The SSIM index lies in the range between 0 and 1, where 0 indicates zero correlation, i.e., the reference image is entirely different from the target, and 1 indicates that they are identical. Since H.264/AVC is a lossy compression scheme, the visual quality of the non-stego video stream is also tested in order to isolate the effect of data hiding on video quality. For this baseline, the video sequence after compression and decompression is used as the target, while the original uncompressed video sequence is used as the reference. Similarly, to test the visual quality of the stego video stream, the video sequence after compression, encryption, data hiding, decryption, and decompression is used as the target. That is, in this case, the target video contains the hidden data. VQM is another approach to video quality measurement that correlates more closely with the Human Visual System (HVS). In general, a lower VQM value indicates higher perceptual video quality, and zero indicates excellent quality. The experimental results are shown in Table 3. There is only a very small change in the PSNR, SSIM and VQM values, so the degradation in video quality caused by data hiding is almost impossible to detect.
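For reference, the standard PSNR definition for 8-bit samples can be sketched as below. This is not taken from the paper's evaluation code: the function name is hypothetical, frames are represented as flat sequences of sample values, and the peak value of 255 is an assumption about the sample depth.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], target: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(target) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, target)) / len(reference)
    if mse == 0:
        return math.inf               # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ref = [10, 20, 30, 40]
    tgt = [11, 19, 31, 39]
    print(round(psnr(ref, tgt), 2))
```

Comparing this value for the non-stego and stego targets against the same uncompressed reference gives the PSNR change attributable to data hiding.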

Table 3. Embedding capacity, PSNR, SSIM and VQM in directly decrypted videos

C. Embedding Capacity

The maximum payload capacity of each video encoded with different QP (quantization parameter) values is given in Table 3. The payload of the proposed scheme depends on the type of video content and on the QP value, because each video stream has a different number of qualified coefficients. For example, when the QP value is 24, a large number of quantized coefficients can be used for data hiding, and hence the payload is high. When the QP value is 32, the payload is lower, since fewer quantized coefficients can be used for data hiding. A higher payload can be attained if data hiding is allowed in I-frames, but this has a greater effect on the visual quality of the video.

D. Bit Rate Variation

To further evaluate the performance of the proposed scheme, the bit rate variation caused by encryption and data hiding is also measured [16].

$$ BR\_\text{var} = \frac{BR\_em - BR\_orig}{BR\_orig} \times 100\% $$
(8)

where BR_em is the bit rate generated by the encoder with encryption and data embedding, and BR_orig is the bit rate generated by the original encoder. The results for BR_var are also given in Table 3. Changing zero-quantized residuals to nonzero values in the course of embedding can significantly increase the video bit rate when the coefficients are encoded using CAVLC. However, zero-quantized residuals are not modified in our scheme, so the bit rate increases only slightly, remaining below 2.89 %. This effect is almost negligible.
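Equation (8) is a plain relative change, sketched below with a hypothetical function name. The bit rates in the usage example are made-up numbers chosen only to show the arithmetic; they are not measurements from Table 3.

```python
def bit_rate_variation(br_em: float, br_orig: float) -> float:
    """Relative bit rate change in percent, following Eq. (8)."""
    if br_orig <= 0:
        raise ValueError("br_orig must be positive")
    return (br_em - br_orig) / br_orig * 100.0


if __name__ == "__main__":
    # Illustrative values only (same unit for both arguments, e.g. kbit/s).
    print(bit_rate_variation(102.0, 100.0))
```

A positive result means the encrypted and marked stream is larger than the original one.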

4 Conclusion and Discussion

In this paper, an algorithm to reversibly embed secret data in encrypted H.264/AVC streams is presented, which consists of three phases: video encryption, data embedding and data extraction. The IPM, the MVD, and the sign bits of the residue coefficients are encrypted using a standard stream cipher without violating format compliance. The data hider can embed the secret data into the encrypted video using the traditional histogram-shifting method, even though he does not know the original video content. Data extraction is separable from video decryption. In other words, the additional data can be extracted in either the encrypted domain or the decrypted domain. Furthermore, the algorithm achieves real reversibility and high quality of the marked decrypted videos. One possible application of this method is video annotation in cloud computing, where high video quality and reversibility are greatly desired.

Although reversible data hiding and cryptography have each reached a certain degree of maturity, reversible data hiding in the encrypted domain is a highly interdisciplinary area of research. Technical research in this field has only just begun, and much remains open for investigation.