1 Introduction

Reversible data hiding (RDH) [7, 20, 25] is a multimedia security technique that embeds additional data (e.g., authentication information) into a carrier image and allows the original image to be recovered after data extraction. This technique has been extensively studied and is widely used in fields such as image authentication [4, 27] and activity recognition [2, 15, 16, 17, 19]. However, in many big data scenarios, the image is encrypted before being uploaded to the server for secure communication. Recently, reversible data hiding in encrypted images (RDHEI) has attracted much attention and has important applications in medical, military and other fields [18, 21, 32]. For example, medical images are usually encrypted first to protect patient privacy and then uploaded to the hospital servers. On the one hand, the server manager may wish to embed some additional information, such as integrity authentication data; on the other hand, the original medical images must be recovered without error.

The classic RDH algorithms for BMP images follow three major approaches: difference expansion [26], histogram shifting [3] and lossless compression [14]. The lossless-compression-based methods free up some room by losslessly compressing the cover image and generally have low embedding capacity. The difference-expansion-based methods usually achieve a higher embedding capacity while keeping the image distortion low. Compared with the other methods, the histogram-shifting-based methods have a better capacity-distortion curve and have been extensively investigated [10, 11, 12].

The classic RDH algorithms for unencrypted images generally cannot be applied to encrypted images directly. Zhang proposed the first RDHEI algorithm, based on a new pixel flipping strategy, in [29]. The method embeds additional data into an image that is encrypted by a stream cipher, and the original image can be recovered by exploiting the correlation of pixels. Some improved versions of Zhang’s method were proposed in [1, 5, 13, 24, 34]. Since the embedded data can only be extracted after image decryption, i.e., a receiver who has the data-hiding key but not the content owner's key cannot extract the embedded information, the algorithms in [1, 5, 13, 24, 29, 34] are referred to as non-separable methods.

To overcome this problem, Zhang proposed a separable reversible data hiding scheme in [30]. In this scheme, the legal receiver can extract the additional data directly with the data hiding key, without image decryption. Qian et al. embedded a secret message into an encrypted image using histogram modification and an n-ary data hiding scheme [22]. This method improved the embedding capacity and image quality, but the leakage of the image histogram reduced the security of the image. Zhang et al. compressed a part of the encrypted data in the cipher-text image using a Low Density Parity Check (LDPC) code and inserted the compressed data into part of the encrypted data [31]. Zheng et al. applied a chaotic sequence to encrypt the image and compressed the least significant bits of pixels using the Hamming distance [33]. Instead of stream encryption, Qian et al. proposed an alternative algorithm suitable for block-encrypted images and improved the image security and quality [23].

However, all the previously mentioned RDHEI methods are specifically designed for the encrypted domain. To apply the numerous RDH methods designed for the plain domain to the encrypted domain, some new image encryption strategies, which preserve the correlations of neighboring pixels, have been proposed. Huang et al. proposed a specific image encryption method for data hiding in encrypted images, which includes image block partition, stream encryption, and block permutation [6]. Yin et al. partitioned the cover image into non-overlapping blocks and applied multi-granularity encryption to obtain an encrypted image [28]. In this paper, we propose a new RDHEI framework based on bitplane operations and adaptive embedding. Instead of considering each image pixel as an integer, we transform each pixel into two components based on the bitplane parameter and then apply adaptive embedding.

In brief, the framework of most existing RDHEI methods has two parts: image encryption and reversible data hiding. Figure 1a gives an overview of most existing RDHEI methods. The mainstream RDH methods used in RDHEI methods include difference histogram shifting (DHS) [8] and prediction-error histogram shifting (PEHS) [9]. An overview of the proposed RDHEI method is shown in Fig. 1b. It can be seen that the difference between this paper and the existing works is the added “bitplane operations” process. However, the proposed method is only applicable to some specific encryption methods, such as the methods in [6] or [28]. The only requirement is that the correlation between neighboring pixels in partial regions (such as within a block) should be well preserved after encryption. Experimental results show that the proposed method can effectively improve the embedding capacity of these kinds of RDHEI frameworks, although it reduces the PSNR value.

Fig. 1 The overall contribution of this paper

The rest of this paper is organized as follows. In Section 2, the detailed procedures of the proposed RDHEI scheme are described. Section 3 elaborates the experimental results and the performance comparison. The conclusions of our work are given in Section 4.

2 The proposed framework

The proposed separable RDHEI framework is illustrated in Fig. 2 and involves three parties: the content owner, the data hider and the receiver. First, the content owner encrypts the original image I to produce an encrypted image E. Then, the data hider, without knowing the actual contents of I, transforms E into two new images \( E_h \) and \( E_l \), which consist of the high bitplanes and the low bitplanes, respectively. The additional data M is evenly divided into \( M_h \) and \( M_l \), which are embedded into \( E_h \) and \( E_l \), respectively, to obtain the marked-encrypted sub images \( {E}_h^{\ast } \) and \( {E}_l^{\ast } \). After that, the marked-encrypted image is generated by bitplane combination and sent to the receiver. At the receiver side, there are three cases for legal receivers. In Case I, the receiver transforms the marked-encrypted image into the sub images \( {E}_h^{\ast } \) and \( {E}_l^{\ast } \), extracts the data from both sub images, and recovers the original image. In Case II, only the data is extracted. In Case III, an approximate version of the original image I is obtained by direct decryption. Note that separable RDHEI means that data extraction can be carried out separately, before image decryption. As Case II in Fig. 2 shows, the embedded data can be extracted without image decryption in our scheme, and thus the proposed framework is “separable”. As seen in Fig. 1b, the proposed method can adopt different encryption methods and RDH methods. To simplify the discussion, we select the encryption method in [6] and the DHS method in the discussion below.

Fig. 2 Sketch of the proposed RDHEI scheme

2.1 Image encryption

Here, we adopt Huang’s [6] image encryption algorithm. We assume the original image I is an 8-bit gray-scale image. The procedures of image encryption include the following steps.

  Step 1:

    Divide the original image I into T non-overlapping blocks \( \{B_1, B_2, \dots, B_T\} \), each of size m × n. Let I(i, j) (1 ≤ i ≤ T, 1 ≤ j ≤ m × n) denote a pixel in block \( B_i \), where i is the index of the block and j is the index of the pixel within block \( B_i \).

  Step 2:

    Generate the key streams. For each block \( B_i \), run the stream cipher with the encryption key to generate a key stream \( R_i \) (1 ≤ i ≤ T) with a length of 8 bits. For each pixel I(i, j) in block \( B_i \), perform the bitwise exclusive-or (XOR) operation between I(i, j) and \( R_i \) as follows,

    $$ E\left(i,j\right)=I\left(i,j\right)\wedge {R}_i $$
    (1)

where E(i, j) represents the encrypted pixel and ∧ represents the bitwise XOR operation.

  Step 3:

    Encrypt all blocks by repeating Step 2. Note that the pixels in the same block are encrypted with the same key stream, and different key streams are used in different blocks.

  Step 4:

    Permute all the encrypted blocks using the permutation key to generate the encrypted image E. In this step, only the order of the blocks is changed; the order of the pixels within each block is preserved.
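The four steps above can be sketched as follows. This is a minimal illustration rather than the implementation of [6]: Python's `random.Random` stands in for both the stream cipher and the keyed permutation (it is not cryptographically secure), and the names `encrypt_blocks`, `decrypt_blocks`, `enc_seed` and `perm_seed` are assumptions introduced here. Each block is a flat list of m × n pixel values.

```python
import random


def encrypt_blocks(blocks, enc_seed, perm_seed):
    """Encrypt a list of T blocks; each block is a list of 8-bit pixels."""
    key_rng = random.Random(enc_seed)  # stand-in for the stream cipher (assumption)
    encrypted = []
    for block in blocks:
        r_i = key_rng.randrange(256)  # one 8-bit key stream R_i per block
        encrypted.append([p ^ r_i for p in block])  # Eq. (1): same R_i for the block
    order = list(range(len(encrypted)))
    random.Random(perm_seed).shuffle(order)  # Step 4: permute whole blocks only
    return [encrypted[k] for k in order]


def decrypt_blocks(enc_blocks, enc_seed, perm_seed):
    """Invert encrypt_blocks: undo the block permutation, then the XOR."""
    order = list(range(len(enc_blocks)))
    random.Random(perm_seed).shuffle(order)  # regenerate the same permutation
    unpermuted = [None] * len(enc_blocks)
    for pos, k in enumerate(order):
        unpermuted[k] = enc_blocks[pos]
    key_rng = random.Random(enc_seed)
    plain = []
    for block in unpermuted:
        r_i = key_rng.randrange(256)  # same key stream sequence as in encryption
        plain.append([p ^ r_i for p in block])
    return plain
```

Because every pixel of a block is XORed with the same 8-bit value and the pixel order inside a block is untouched, the block-level structure that the data hider relies on survives encryption.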

2.2 Data hiding in encrypted image

Let E(i, j) (1 ≤ i ≤ T, 1 ≤ j ≤ m × n) be a pixel in the encrypted image E, where i denotes the block index and j denotes the pixel index in block \( B_i \). Decompose E(i, j) into 8 bits using

$$ E\left(i,j,k\right)=\left\lfloor E\left(i,j\right)/{2}^k\right\rfloor \operatorname {mod}\ 2,\kern1.5em k=0,1,2,\dots 7 $$
(2)

where ⌊⋅⌋ represents the floor function. Instead of treating the 8-bit value E(i, j) as an integer in the interval [0, 255], the data hider transforms E(i, j) into \( E_h(i,j) \) and \( E_l(i,j) \) as in Eqs. (3) and (4). The selection of the value of e (1 ≤ e ≤ 7) will be discussed in Section 3.

$$ {E}_h\left(i,j\right)=\sum \limits_{k=0}^{e-1}E\left(i,j,k+8-e\right)\times {2}^k $$
(3)
$$ {E}_l\left(i,j\right)=\sum \limits_{k=0}^{7-e}E\left(i,j,k\right)\times {2}^k $$
(4)

After processing all pixels in image E by Eqs. (3) and (4), the data hider obtains the sub images \( E_h \) and \( E_l \), which consist of the e high bitplanes and the 8 − e low bitplanes, respectively. The bitplane operations are depicted in Fig. 3.
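For a single pixel, Eqs. (3) and (4) amount to a right shift and a bit mask. The sketch below illustrates this under that reading; the function names `split_pixel` and `combine_pixel` are introduced here for illustration and do not appear in the original method description.

```python
def split_pixel(value, e):
    """Split an 8-bit value into (high, low) parts following Eqs. (3) and (4)."""
    if not 1 <= e <= 7:
        raise ValueError("e must satisfy 1 <= e <= 7")
    low_bits = 8 - e
    high = value >> low_bits               # the e most significant bits, Eq. (3)
    low = value & ((1 << low_bits) - 1)    # the 8 - e least significant bits, Eq. (4)
    return high, low


def combine_pixel(high, low, e):
    """Bitplane combination: rebuild the 8-bit value from its two parts."""
    return (high << (8 - e)) | low
```

For example, `split_pixel(182, 3)` returns `(5, 22)`, since 182 is `10110110` in binary, and `combine_pixel(5, 22, 3)` gives back 182.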

Fig. 3 Illustration of bitplane operations

Next, the sub images are further divided into blocks of the same size as in the image encryption phase. Note that, if e = 1, the data will only be embedded in \( E_l \), because the sub image \( E_h \) has only one bitplane and is therefore not suitable for embedding data. Likewise, if e = 7, the data will only be embedded in \( E_h \), because the sub image \( E_l \) has only one bitplane. Since the data hiding procedure in \( E_l \) is the same as that in \( E_h \), only the embedding in \( E_h \) is described here for brevity. The data hiding procedure in the sub image \( E_h \) is as follows.

  Step 1:

    Generate the difference histogram. The difference value in each block is computed as

    $$ {D}_h\left(i,j\right)={E}_h\left(i,j\right)-{E}_h\left(i,1\right)\kern1.75em i\in \left[1,T\right],j\in \left[1,m\times n\right],j\ne 1 $$
    (5)

Then, the difference values in all blocks make up a difference vector D with a length of (m × n − 1)T. We use H to denote the difference histogram, so that \( H_k \) represents the number of elements in the vector D that are equal to k. Note that \( E_h(i,j) \in [0, 2^e - 1] \), \( D_h(i,j) \in [1 - 2^e, 2^e - 1] \) and \( k \in [1 - 2^e, 2^e - 1] \). Thus, \( D_h(i,j) \) can take \( 2^{e+1} - 1 \) different values, and there are \( 2^{e+1} - 1 \) bins in H, from which the two peak bins (the two highest bins) are chosen. The left and right peak bins are denoted by L and R, respectively, and \( H_L \) and \( H_R \) represent the number of pixels associated with the peak bins L and R, respectively.

  Step 2:

    Select peak bins for embedding. Similar to [6], to avoid saturation (i.e., underflow or overflow), the underflow pixels (pixels with value 0) and the overflow pixels (pixels with value \( 2^e - 1 \)) have to be preprocessed by modifying them by one gray-scale unit, and are recorded in a left location map \( F_L \) (for avoiding underflow) and a right location map \( F_R \) (for avoiding overflow), respectively. \( L_L \) and \( L_R \) denote the lengths of \( F_L \) and \( F_R \), respectively. To embed the data in Step 3, \( L_L \) is set to the number of pixels with value 0 or 1 in \( E_h \), and \( L_R \) is set to the number of pixels with value \( 2^e - 1 \) or \( 2^e - 2 \) in \( E_h \). Now, the embedding location can be determined as follows.

    (1) If \( H_L - L_L > 0 \), the left peak bin L will be used for data embedding; otherwise, it will not.

    (2) If \( H_R - L_R > 0 \), the right peak bin R will be used for data embedding; otherwise, it will not.

  Step 3:

    Embed data adaptively.

    (1) When the left peak bin L is used for data embedding: visit all the pixels in \( E_h \) sequentially and append a bit “0” to \( F_L \) when \( E_h(i,j) = 1 \). If \( E_h(i,j) = 0 \), append a bit “1” to \( F_L \) and simultaneously change \( E_h(i,j) \) from 0 to 1. The reversible data hiding is then conducted as follows.

    $$ {E}_h^{\ast}\left(i,j\right)=\left\{\begin{array}{l}{E}_h\left(i,j\right)-1\kern2.75em if\ {D}_h\left(i,j\right)<L\\ {}{E}_h\left(i,j\right)-b\kern2.5em if\ {D}_h\left(i,j\right)=L\\ {}{E}_h\left(i,j\right)\kern4em if\ {D}_h\left(i,j\right)>L\end{array}\right. $$
    (6)

where b ∈ {0, 1} represents a bit of the additional data \( M_h \) that is to be embedded.

    (2) When the right peak bin R is used for data embedding: visit all the pixels in \( E_h \) sequentially and append a bit “0” to \( F_R \) when \( E_h(i,j) = 2^e - 2 \). If \( E_h(i,j) = 2^e - 1 \), append a bit “1” to \( F_R \) and simultaneously change \( E_h(i,j) \) from \( 2^e - 1 \) to \( 2^e - 2 \). The reversible data hiding is then conducted as follows.

    $$ {E}_h^{\ast}\left(i,j\right)=\left\{\begin{array}{l}{E}_h\left(i,j\right)\kern4.75em if\ {D}_h\left(i,j\right)<R\\ {}{E}_h\left(i,j\right)+b\kern3.5em if\ {D}_h\left(i,j\right)=R\\ {}{E}_h\left(i,j\right)+1\kern3.75em if\ {D}_h\left(i,j\right)>R\end{array}\right. $$
    (7)

By a similar method, the sub image \( E_l \) is converted to \( {E}_l^{\ast } \), so that the marked-encrypted sub images \( {E}_h^{\ast } \) and \( {E}_l^{\ast } \) are obtained. Finally, the marked-encrypted image is generated by bitplane combination, as depicted in Fig. 2, and sent to the receiver.
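The histogram construction of Eq. (5) and the shifting rules of Eqs. (6) and (7) can be sketched for one sub image as follows. This is a simplified illustration under explicit assumptions: both peak bins are used, the location maps \( F_L \) and \( F_R \) and the saturation preprocessing of Step 2 are omitted (so the input must not contain pixels that would underflow or overflow), unused capacity is padded with zero bits, and the names `find_peaks` and `embed` are introduced here.

```python
from collections import Counter


def find_peaks(blocks):
    """Return (L, R): the two most populated difference bins, with L < R."""
    # Eq. (5): difference between each pixel and the first pixel of its block.
    hist = Counter(p - blk[0] for blk in blocks for p in blk[1:])
    top_two = [value for value, _ in hist.most_common(2)]
    if len(top_two) < 2:
        raise ValueError("need at least two occupied histogram bins")
    return min(top_two), max(top_two)


def embed(blocks, bits, left, right):
    """Embed bits into one sub image with Eqs. (6) and (7)."""
    capacity = sum(
        1 for blk in blocks for p in blk[1:] if p - blk[0] in (left, right)
    )
    if len(bits) > capacity:
        raise ValueError("payload exceeds the capacity of the peak bins")
    payload = iter(list(bits) + [0] * (capacity - len(bits)))  # zero padding (assumption)
    marked = []
    for blk in blocks:
        ref = blk[0]            # reference pixel of the block is never modified
        out = [ref]
        for p in blk[1:]:
            d = p - ref
            if d < left:
                out.append(p - 1)               # shift left to make room, Eq. (6)
            elif d == left:
                out.append(p - next(payload))   # embed one bit at the left peak
            elif d < right:
                out.append(p)                   # bins between the peaks stay put
            elif d == right:
                out.append(p + next(payload))   # embed one bit at the right peak
            else:
                out.append(p + 1)               # shift right to make room, Eq. (7)
        marked.append(out)
    return marked
```

With the toy sub image `[[5, 5, 6, 4, 5], [8, 9, 8, 9, 8]]`, the peaks are L = 0 and R = 1, and embedding the bits `[1, 0, 1, 1, 0, 1, 1]` yields `[[5, 4, 6, 3, 4], [8, 10, 8, 10, 7]]`.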

2.3 Data extraction and image recovery

With the marked-encrypted image, there are three cases, in which the receiver has: (1) both the data hiding key and the encryption key; (2) only the data hiding key; (3) only the encryption key. After receiving the marked-encrypted image, the receiver can generate the sub images \( {E}_h^{\ast } \) and \( {E}_l^{\ast } \). We only describe the data extraction and image recovery procedure in \( {E}_h^{\ast } \) here for brevity.

  • Case I: both the data hiding key and the encryption key. The receiver first transforms the marked-encrypted image into \( {E}_h^{\ast } \) and \( {E}_l^{\ast } \), then extracts the additional data by

$$ b=\left\{\begin{array}{l}0\kern3.75em if\ {E}_h^{\ast}\left(i,j\right)-{E}_h^{\ast}\left(i,1\right)=L\kern0.5em \mathrm{or}\kern0.5em R\\ {}1\kern3.75em if\ {E}_h^{\ast}\left(i,j\right)-{E}_h^{\ast}\left(i,1\right)=L-1\kern0.5em \mathrm{or}\kern0.5em R+1\end{array}\right. $$
(8)

The recovery operations are carried out by processing all the pixels in \( {E}_h^{\ast } \) using

$$ {E}_h\left(i,j\right)=\left\{\begin{array}{ll}{E}_h^{\ast}\left(i,j\right)+1 & \mathrm{if}\ {E}_h^{\ast}\left(i,j\right)-{E}_h^{\ast}\left(i,1\right)\le L-1\\ {}{E}_h^{\ast}\left(i,j\right) & \mathrm{if}\ L\le {E}_h^{\ast}\left(i,j\right)-{E}_h^{\ast}\left(i,1\right)\le R\\ {}{E}_h^{\ast}\left(i,j\right)-1 & \mathrm{if}\ {E}_h^{\ast}\left(i,j\right)-{E}_h^{\ast}\left(i,1\right)\ge R+1\end{array}\right. $$
(9)

where \( E_h(i,j) \) represents the recovered pixel, j ∈ [2, m × n], and \( {E}_h\left(i,1\right)={E}_h^{\ast}\left(i,1\right) \). Next, the preprocessed pixels, i.e., those whose bit in the location map is “1”, are recovered using the location maps \( F_L \) and \( F_R \):

$$ {E}_h\left(i,j\right)=\left\{\begin{array}{l}0\kern5.25em if\ {E}_h^{\ast}\left(i,j\right)=1\\ {}{2}^e-1\kern3.75em if\ {E}_h^{\ast}\left(i,j\right)={2}^e-2\end{array}\right. $$
(10)

Note that E l can be recovered in a similar way and the original image I can be recovered after bitplane combination and image decryption.

  • Case II: only the data hiding key. With the data hiding key, the receiver can extract the data by applying Eqs. (8) and (10), but the original image cannot be obtained.

  • Case III: only the encryption key. An approximate version of the original image I is acquired by direct decryption using the encryption key. The receiver can roughly obtain the original image content, but cannot obtain the embedded data.

2.4 Capacity analysis

According to Eqs. (5)–(7), the embedding capacity depends on the difference values of adjacent pixels. The smaller the difference values \( D_h(i,j) \) in Eq. (5), the greater the values \( H_L \) and \( H_R \), and the greater the embedding capacity. As shown in Fig. 3, instead of considering each image pixel as an integer, the proposed algorithm transforms the image into a sub-image with high bitplanes and a sub-image with low bitplanes. In general, the difference values of adjacent pixels in the sub-image with high bitplanes are smaller than those in the original image. Due to the bitplane operations and adaptive embedding, the proposed algorithm can better exploit the correlation between neighboring pixels. Higher correlation between neighboring pixels means lower difference values of adjacent pixels and thus a higher embedding capacity. Thus, the proposed algorithm achieves a higher embedding capacity.
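As a small numerical illustration (the two pixel values are chosen here for the example and are not taken from the experiments), consider two neighboring values a = 100 and b = 109 with e = 4, so that the high part of a value is its integer quotient by \( 2^{8-e}=16 \):

$$ \left\lfloor 100/16\right\rfloor =6,\kern1.5em \left\lfloor 109/16\right\rfloor =6,\kern1.5em \left|6-6\right|=0<\left|100-109\right|=9 $$

More generally, for any integers a and b,

$$ \left|\left\lfloor a/{2}^{8-e}\right\rfloor -\left\lfloor b/{2}^{8-e}\right\rfloor \right|\le \left\lceil \left|a-b\right|/{2}^{8-e}\right\rceil $$

so the differences in the high-bitplane sub-image fall into fewer bins around zero. No comparable bound holds for the low-bitplane sub-image: the low parts can differ by up to \( 2^{8-e}-1 \) even when |a − b| = 1 (for example, 15 and 16 with e = 4 have low parts 15 and 0).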

3 Experimental results

In this section, we conduct several experiments to evaluate the proposed framework. Twelve standard test images of size 512 × 512, all downloaded from the USC-SIPI database (http://sipi.usc.edu/database), are shown in Fig. 4. Without loss of generality, we use a block size of 3 × 3 and a random bit stream as the additional data in all our experiments. First, the changes of the classic image “Lena” in the different phases described in Section 2 are shown in Fig. 5, in which (a) and (b) are the original image and its encrypted version. Fig. 5c shows the marked-encrypted image containing additional data. With the marked image, a receiver owning the data hiding key can extract the additional data. If the receiver has both the encryption key and the data hiding key, the recovered image given in Fig. 5d can be obtained perfectly and is identical to (a). If the receiver only has the encryption key, he or she can obtain an approximate image. The directly decrypted images for different parameter values are shown in Fig. 5e–j. It is clear that the larger the parameter value e is, the better the quality of the approximate image. This is because a larger parameter means more bitplanes in the sub-image with high bitplanes, which is critical to the embedding capacity.

Fig. 4 The test images

Fig. 5 The changes of “Lena” in different phases

As stated in the previous section, the contribution of this paper is to add a “bitplane operations” process to existing methods and thereby improve their embedding performance. To evaluate the proposed method, we choose the algorithms in [6, 28] as two typical examples. The methods in [6] and [28] can each be combined with different RDH methods, such as DHS and PEHS. Thus, we use the following four existing methods: [6] with DHS, [6] with PEHS, [28] with DHS, and [28] with PEHS. Detailed performance comparisons before and after adding the proposed method to the four existing methods are provided in Tables 1, 2, 3 and 4. EC and PSNR denote the embedding capacity and the peak signal-to-noise ratio, respectively, which are two criteria for evaluating RDHEI. “Before” represents the existing method without the proposed “bitplane operations” process, and “After” represents the existing method after adding the proposed “bitplane operations” process. One obvious conclusion is that, after using the proposed “bitplane operations” process under any e value, both algorithms achieve a higher embedding capacity, while the PSNR value between the directly decrypted image and the original image decreases. It is generally accepted that in the RDHEI field EC is relatively more important, and the PSNR requirement can be relaxed or even neglected [6]. Therefore, the proposed method has great practical significance. Another conclusion from Tables 1, 2, 3 and 4 is that the parameter e has a great influence on the embedding performance. EC first increases and then decreases as the value of e increases. Due to the embedding process, both the sub-image with high bitplanes and the sub-image with low bitplanes can be used to embed data when the value of e is low, while only the sub-image with high bitplanes can be used to embed data when the value of e is high. In general, the maximum EC is achieved when the value of e is 3 or 4.

Table 1 Performance comparison between before and after using the proposed method ([6] with DHS)
Table 2 Performance comparison between before and after using the proposed method ([6] with PEHS)
Table 3 Performance comparison between before and after using the proposed method ([28] with DHS)
Table 4 Performance comparison between before and after using the proposed method ([28] with PEHS)
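For reference, the PSNR between the original image and the directly decrypted image can be computed with the standard definition for 8-bit images, 10·log10(255² / MSE). The sketch below assumes that both images are given as flat lists of pixel values of equal length; the function name `psnr` is introduced here.

```python
import math


def psnr(original, approx, peak=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(original) != len(approx) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, approx)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A mean squared error of 1 corresponds to roughly 48.13 dB, and identical images give an infinite PSNR.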

To better demonstrate the high-capacity advantage of the proposed method, we compare the maximum EC before and after using the proposed method according to Tables 1, 2, 3 and 4. For each image and each algorithm, we choose the e value that corresponds to the maximum EC. Figure 6 shows the comparison results; its four sub-figures correspond to the four tables above. As Fig. 6 shows, the maximum EC of both algorithms is improved significantly after introducing the proposed method. For example, for the image “Lena”, the embedding capacity of the algorithm in [6] with the DHS method is approximately 30,000 bits, while the maximum embedding capacity after adding the proposed method rises to more than 140,000 bits (when e = 3). In practical applications where EC matters more than PSNR, we can select the e value that corresponds to the maximum EC. Thus, the proposed method has great practical significance.
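Selecting e amounts to a search over the seven admissible values. The sketch below uses a rough proxy for EC, namely the population of the two highest difference-histogram bins in each usable sub-image; it ignores the location-map overhead of Step 2 and is not the measurement procedure behind Tables 1, 2, 3 and 4. The names `peak_capacity`, `estimate_capacity` and `best_e` are assumptions introduced here.

```python
from collections import Counter


def peak_capacity(blocks):
    """Number of pixels in the two highest bins of the difference histogram."""
    hist = Counter(p - blk[0] for blk in blocks for p in blk[1:])
    return sum(count for _, count in hist.most_common(2))


def estimate_capacity(enc_blocks, e):
    """Rough EC proxy for bitplane parameter e (location maps ignored)."""
    shift = 8 - e
    high = [[p >> shift for p in blk] for blk in enc_blocks]
    low = [[p & ((1 << shift) - 1) for p in blk] for blk in enc_blocks]
    total = 0
    if e >= 2:          # a one-bitplane high part is not used for embedding
        total += peak_capacity(high)
    if shift >= 2:      # a one-bitplane low part is not used for embedding
        total += peak_capacity(low)
    return total


def best_e(enc_blocks):
    """Return the e in 1..7 that maximizes the capacity proxy."""
    return max(range(1, 8), key=lambda e: estimate_capacity(enc_blocks, e))
```

For the single toy block `[100, 101, 99, 100]` and e = 4, the high parts are all 6 (three pixels in one bin) and the low parts are 4, 5, 3, 4 (two pixels in the two highest bins), so the proxy evaluates to 5.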

Fig. 6 Comparison on maximum EC between before and after using the proposed method

The main factor that contributes to high capacity is bitplane operation, rather than the data hiding method or encryption method. As described in Subsection 2.4, due to the bitplane operations, the proposed algorithm can better explore the correlation between neighbor pixels and achieve better embedding capacity. The experimental results have shown that the embedding capacity can be increased significantly after introducing the proposed “bitplane operation”, no matter which data hiding method is used.

The comparison of the average run time in the embedding phase between the method in [6] and the proposed method is shown in Table 5. The computation times in this table were measured on a Windows 7 PC with an Intel CPU (i7-7500U, 3.5 GHz) and 4.00 GB RAM. According to the table, the run times of [6] are approximately 1.06 and 2.32 s when DHS and PEHS are used, respectively. Note that the PEHS method costs more time due to the prediction process. When the “bitplane operation” is added, the times rise to 6.92 and 19.83 s. It is clear that the higher embedding capacity of the proposed method comes at the cost of more computation, because of the added bitplane operations and parameter optimization. However, the impact of this is not serious in practical applications.

Table 5 Average run time comparison (in embedding stage)

4 Conclusion

In this paper, we propose a novel RDHEI method with high embedding capacity based on bitplane operations and adaptive embedding. Instead of considering the image pixel of 8 bits as an integer within the interval of [0,255], the data hider transforms the pixel into two components according to the bitplane parameter. Our experimental results show that high embedding capacity can be achieved using the bitplane-level operations and adaptive embedding, although the PSNR values between the original image and the directly decrypted image are lower compared with the existing algorithms. This is very useful for RDHEI, in which field PSNR is less important than embedding capacity. Future work aims at increasing the PSNR values of directly decrypted images in our method on the basis of keeping the embedding capacity as high as possible.