1 Introduction

In recent years, cloud storage and cloud computing are widely used. Since the servers and the users are usually separated from each other, so the data from the users must be uploaded to the cloud server for storage or processing. These data may include sensitive and private information such as personal ID or biometric data. Thus, personal privacy issues have become a widespread concern. To protect privacy, the data are usually encrypted before uploading to the cloud server. Without knowing the content of the received data, the server may also need to add additional information for labelling and authentication, etc.

A typical application scenario in cloud storage is illustrated in Fig. 1. The user data is an image, which is encrypted in user device 1 (such as a desktop computer), and then is transmitted to cloud server 1. The cloud server 1 embeds a tag into the encrypted image to label the source of this file. The resulting file may need to be transferred to another server, the server 2. In a later time, the user may need to download the file from server 2 to another device, i.e., the device 2, such as his/her cell phone. Then, server 2 needs to extract the hidden data and recover the original encrypted image before sending it to the user. After receiving the recovered image, the user device 2 can decrypt the image. This scenario calls for reversible data hiding (RDH) in encryption domain.

Fig. 1
figure 1

A typical application scenario of RDH in encryption domain

RDH has been proposed in recent years, which can embed the secret data into the meaningful cover media imperceptibly, especially for digital images [14]. It is widely used in many fields such as medical image, copyright certification, activity recognition [5, 6], etc. The current algorithms for RDH can be classified into lossless compression [1], difference expansion [16], pixel prediction [9], and histogram shifting [11]. These algorithms are designed for plain image without encryption.

Recently, a lot of works for RDH in encrypted images have been presented [2,3,4, 10, 13, 15, 17,18,19,20]. However, the current algorithms that embed external data into the encrypted images with error-free decryption may lead to leakage of image content and low embedding rate. In this work, we propose a novel RDH scheme in encrypted images, which can provide higher security level and embedding rate.

The main contributions of this work can be summarized as follows:

  • A block permutation algorithm adapted to prediction error encryption and RDH.

  • A data embedding algorithm by utilizing the extra bits from the reduced dynamic range of the prediction error.

This paper is organized as follows. After the introduction in Section 1, the related works are reviewed in Section 2. Then, we present our proposed algorithm in Section 3. The experimental results and analysis are discussed in Section 4. Finally, we conclude the paper in Section 5.

2 Related works

Extending RDH for encrypted image is not a trivial task since most RDH relies on local smoothness assumption of the host image. But such an assumption is destroyed in encrypted image since encrypted images are noise-like. In 2011, Zhang first proposed a RDH algorithm in encrypted domains. In Zhang’s work [19], the encrypted image was divided into non-overlapping blocks. Then, in each block, external data are embedded by flipping the three least significant bits of half of the pixels. The receiver extracts the data according to the spatial correlation of the natural image. Hong et al. [2] further improved Zhang’s [19] algorithm by modifying its smoothness measurement, which reduces the error rate during data extraction. In another work, Qin and Zhang [15] presented an algorithm with capability of image content protection. It only alters the three LSBs of fewer pixels by the elaborate selection. This algorithm has a smaller error rate than the algorithm of [19] and [2], but data extraction and image decryption in these algorithms are carried out at the same time. The receiver must decrypt the image before extracting the hidden data. So, it is not suitable for the application scenario in Fig. 1. The scenario in Fig. 1 requires that the data extraction and decryption of image to be separable.

Zhang [20] proposed a separable embedding method in 2012. The cloud server embeds the external data by compressing the least significant bit in the encrypted image. Qian and Zhang [13] improved the algorithm of [20] by using the distributed source coding. But their algorithm still cannot restore the original image completely.

In [10], Ma et al. proposed a method for embedding external data in encrypted image by vacating room before encryption. To vacate room, this algorithm embeds LSBs of some pixels into other pixels using traditional RDH method. Ma’s algorithm is free from error during data extraction and image reconstruction. The embedding capacity in the encrypted image is also improved. However, it requires the sender to implement an additional RDH algorithm. It may not work if the sender is lack of the knowledge of RDH. Homomorphic encryption is also adopted to RDH in the encrypted domain. In [4], the original image is first encrypted by using Paillier encryption. Then, additional data can be embedded easily because of the homomorphic property of the Paillier cipher. Although it achieves good performance in both embedding capacity and quality of the decrypted images, it still suffers from high computational complexity, and the encrypted image is seriously expanded.

Recently, Xu et al. proposed a separable and error-free RDH in encrypted domain [18]. Joint error expansion and histogram shifting are applied to a partially encrypted image. One drawback of this algorithm is the leakage of image content due to partial encryption. In addition, the embedding rate is not high enough. This is the motivation of our work.

This paper aims to improve the security and embedding rate of Xu’s algorithm. To improve security, a block permutation algorithm is proposed so that the local dependence in the encrypted image is further mitigated without interfering with the later RDH. To improve the embedding rate, the extra bits from the reduced dynamic range of the prediction error are replaced by hidden data. This algorithm has the advantages of built-in embedding flag, high embedding rate and high security. Furthermore, it can be applied to different application scenarios where the receiver may have either the encryption key or data-hiding key, or both.

There are two main differences between Xu’s algorithm and ours. Different from simple partial stream encryption in Xu’s algorithm, a block permutation adapted to prediction error encryption and RDH is used to prevent the leakage of image texture. Different from histogram shifting and prediction error expansion in the data embedding procedure of Xu’s scheme, a bit replacement in prediction error is proposed to improve the embedding rate.

3 Proposed scheme

In this section, we present our proposed algorithm for RDH in encrypted domain. The overall block diagram of the proposed scheme is illustrated in Fig. 2. As shown in Fig. 2, three parties are involved, the image owner, the cloud server and the receiver.

Fig. 2
figure 2

Block diagram of the proposed scheme

The image owner encrypts the original plain image to avoid image leakage. This is done by a combination of steam cipher and block scrambling. Then, the encrypted image is sent to the cloud server. The cloud server embeds external data (such as authentication information) into the secret image without knowing the original image content. The secret image containing hidden data is transmitted to the receiving end. The receiver can process it according to the different application requirements, where the receiver may have only the encryption key or the data-hiding key, or both keys. Figure 3a shows a block diagram of image encryption and data hiding. Figure 3b shows the corresponding data extraction and image decryption at the receiving end.

Fig. 3
figure 3

The block diagrams of the proposed scheme: a Image encryption and data hiding, and b data extraction and image decryption

The current algorithm supports three typical situations at the receiving end. If the recipient only wants to authenticate the image or extract the hidden data, he only needs to use the data-hiding key (Kd) to extract the authentication information. If the recipient only wants to understand the contents of the image in general, and does not care about the embedded data, then he can use the image encryption keys (Kp and Ks) to decrypt the image. Then a decrypted image that is close to the original image can be obtained. If the recipient needs to extract the authentication information and the image content, he can first extract the data using the data-hiding key (Kd). Then, he can restore the original image according to the image encryption keys (Kp and Ks). Figure 4 shows the three cases that are described above.

Fig. 4
figure 4

Three cases at receiving end

Next, we describe the various steps outlined in Fig. 3.

3.1 Image preprocessing

This step is done before the image encryption and is implemented by the image owner. Its purpose is reducing the dynamic range of some pixels to allocate space for the later RDH.

Assuming that the original image is an 8-bit quantized gray-scale image with size M × N. The pixel values fall into the range [0,255]. The image is divided into a set of non-overlapping blocks with the size, say 3 × 3, as shown in Fig. 5. The partition starts from the top left corner and ends at the bottom right corner.

Fig. 5
figure 5

Illustration of image preprocessing: block partition and pixel prediction

In Fig. 5, each small block corresponds to one pixel. All pixels are divided into two categories, namely, (a) sample pixels (SP) and, (b) non-sample pixels (NSP). The SP is made up of all the shaded pixels in the figure, and NSP is made up of all the white pixels in the figure. Prediction is widely used in various applications [7, 8, 12]. Prediction of pixels is also highly important in this work. The pixels belonging to SP are used to predict the NSP. For a 3 × 3 block, there are four SPs and five NSPs. We record the original pixel in the image as O(i,j), where (i,j) is the pixel position, and 1 ≤ iM, 1 ≤ jN. The pixels marked as in NSP are predicted by two neighborhood pixels in the horizontal direction, and the predicted values are recorded as OH(i,j). The estimator is:

$$ O{_{H}}(i,j)=\left\lfloor \frac{O(i,j-1)+O(i,j + 1)}{2} \right\rfloor, $$
(1)

where ⌊x⌋ returns the nearest integer of x towards \(-\infty \). The pixels marked as in the NSP are predicted by two neighborhood pixels in the vertical direction, and the predicted values are recorded as OV(i,j):

$$ O{_{V}}(i,j)=\left\lfloor \frac{O(i-1,j)+O(i + 1,j)}{2} \right\rfloor. $$
(2)

The pixels marked as in NSP are predicted by two diagonal neighborhood pixels whose difference are relatively small, and the predicted values are recorded as OD(i,j). The predictor is:

$$ O{_{D}}(i,j)=\left\{\begin{array}{ll} \left\lfloor \frac{O(i-1,j-1)+O(i + 1,j + 1)}{2} \right\rfloor,&\text{if}\ a<b,\\ \\ \left\lfloor \frac{O(i + 1,j-1)+O(i-1,j + 1)}{2} \right\rfloor,&\text{otherwise}, \end{array}\right. $$
(3)

where

$$\begin{array}{@{}rcl@{}} a&=&\left|O(i-1,j-1)-O(i + 1,j + 1)\right|,\\ b&=&\left|O(i + 1,j-1)-O(i-1,j + 1)\right|. \end{array} $$

Here, a smoother pair of diagonal pixels is chosen to improve the accuracy of the predictor. The prediction error E(i,j) is calculated from the predicted value and the NSPs are replaced by the corresponding prediction errors.

As we know, each pixel in the gray-scale image within the range of [0,255] can be represented by an 8-bit binary number. Denote the binary bits of a pixel as b0(i,j),b1(i,j),...,bk(i,j), where (i,j) is the pixel position and k indicates the k-th bit. Thus, we have:

$$ b_{k}(i,j)= \left\lfloor \frac{O(i,j)}{2^{k}} \right\rfloor \mod \ 2 ,\ k = 0,1,...,7.\\ $$
(4)

In order to identify the polarity of the prediction error, we use the method in [18]. The polarity is marked by the most significant bit (MSB). We set the MSB to 1 if the prediction error is negative and set the MSB to 0 if the prediction error is positive. Figure 6 shows an example of the prediction errors 10 and -10. The left most bit, i.e., the MSB is set to 0 for 10 and is set to 1 for -10.

Fig. 6
figure 6

An example of positive and negative identification of prediction error

Allocating the MSB for polarity, now only 7 bits can be used to represent the gray value of the prediction error. That is, one can only represent 0(0000000) to 127(1111111). In order to prevent overflow, the values that are less than -127 are mapped back to [− 127,127] by adding 127 to it. The errors that are larger than 127 are subtracted by 127 from it. To help the receiver to recognize these positions, we need a location indicator. The location indicator for the pixel beyond the range of [− 127,127] is marked as 1, otherwise it is marked as 0. Thus, a location map L(i,j) is produced. The mapping and construction of location map is summarized as:

$$ \left\{\begin{array}{lll} E(i,j) \leftarrow E(i,j)+ 127 \,\, \mathrm{ and } \,\, L(i,j) = 1 ,&\text{if}\ E(i,j)<-127,\\ E(i,j) \leftarrow E(i,j)-127 \,\, \mathrm{ and } \,\, L(i,j) = 1, &\text{if}\ E(i,j)>127,\\ E(i,j) \leftarrow E(i,j) \,\, \mathrm{ and } \,\, L(i,j) = 0, &\text{if}\ -127\le E(i,j)\le 127. \end{array}\right. $$
(5)

Most of the prediction error is within the range of [-127,127] because of the local correlation of the image pixels. So the positioning map is mainly composed of ‘0’, which can be compressed to a small number of bits using the lossless compression algorithm, such as run-length code. In order to restore the original image, the lossless compressed location map L will be sent as auxiliary information to the receiver just as done in [18].

It should be noted that the size of block is not limited to 3 × 3. By dividing the image into 5 × 5 or 7 × 7 non-overlapping blocks for processing, the number of pixels belonging to NSP also increases.

3.2 Image encryption

Image encryption is divided into two steps: block permutation and stream encryption. The motivation for using a combined block permutation and a stream cipher is to avoid texture leaks in secret images in Xu’s algorithm. The block permutation disrupts the original position of each block in the image. Stream encryption ensures that the original pixel information in each block is not compromised.

Block permutation: :

This step permutes all the preprocessed blocks using the permutation key (Kp). We can get a permuted image after this step. It should be noted that this step only permutes the order of the blocks within the image. The pixel position in the block remains unchanged [3]. In the cloud, even if the data hider does not know the permutation key, but he still knows the relative location of the NSP in a block.

Stream encryption: :

Generate a random number R(i,j) in the range of [0,255] based on the stream encryption key (Ks). The length of R(i,j) is the same as that of the original pixel O(i,j) belonging to SP. And then the bitwise exclusive-or (XOR) operation is performed between each bit plane of R(i,j) and O(i,j). Let rk(i,j) be the bit plane representation of R(i,j), i.e.,

$$ r_{k}(i,j) = \left\lfloor \frac{R(i,j)}{2^{k}} \right\rfloor \mod 2 , \,\,\, k = 0,1,...,7. $$
(6)

Let bk(i,j) be the bit plane representation of the SP as calculated in (4). Then the encryption for SP can be calculated as:

$$ c_{k}(i,j) = b_{k}(i,j) \oplus r_{k}(i,j), \,\, k = 0,1,2,\cdots,7, \forall (i,j) \in \mathcal{S}. $$
(7)

where \(\mathcal {S}\) is the set of SP in the image. From the encrypted bit planes, we can get the encrypted pixels as:

$$ C(i,j) = \sum\limits_{k = 0}^{7} c_{k}(i,j) \cdot 2^{k}. $$
(8)

Eventually, we get the secret image after block permutation and stream encryption. In addition, the stream encryption key (Ks) is also used for encryption if there are edge pixels that cannot be divided into blocks.

3.3 Data embedding

When the cloud server receives the encrypted secret image, external data can be embedded by data hider at the server. First, the data hider encrypts the data according to the data-hiding key (Kd). Then, the prediction error E(i,j) belonging to the NSP in the secret image is sequentially extracted without knowing the original image content. Finally, if the prediction error falls within the range of [−T,T], the additional bits are embedded in a bit-substitution manner. If the prediction error is outside of the range [−T,T], then the pixel is marked. The motivation for using a bit-substitution manner is to increase the embedding capacity. A typical example is given below.

Assuming that T = 15, the pixel values within the range of [-15,15] can be represented by the lowest four bits. The MSB is allocated for polarity. In Fig. 7, a binary form of NSP pixels with a prediction error of 10 are presented. The MSB is the polarity mark.

Fig. 7
figure 7

Binary form of prediction error

As can be seen from Fig. 7, when the prediction error E(i,j) ∈ [− 15,15], the pixel value can be expressed by its MSB and the lowest four bits. The 4th, 5th and 6th bits are 0s. We replace the 5th and 6th bits with external data, in order to embed additional information. The 4th bit is used as an embedding flag . When the prediction error falls within the range of [− 15,15], the 4th bit remains unchanged, i.e., 0. But when the prediction error falls outside of the range [− 15,15], its 4th bit is set to 1. Its original bit is saved and embedded into the secret image along with the external data.

The parameter T can be set to 1, 3 or 7. The corresponding range is [− 1,1],[− 3,3], or [− 7,7]. The embedding algorithm is the same as the above example. Since the number of prediction error values in these ranges is large, a large amount of external information can be embedded. What’s more, the recording step of the positioning map is not necessary since the flag bit is stored inside the pixel value. For example, when T = 3, the pixel value can be represented by its 7th bit (positive and negative sign) and the lower two bits (0/1). Its 2nd to 6th bits in its binary form are 0s. We can embed external data by replacing the 3rd to the 6th bits, and the 2nd bit is used as the embedding flag.

3.4 Data extraction and image decryption

Once the receiver obtains the secret image containing the hidden data, he can not only extract the data, but also losslessly decrypt the original image if he has both encryption keys (Kp and Ks) and data-hiding key (Kd). If he has only the data-hiding key (Kd), he can extract the embedded data; If he has only encryption keys (Kp and Ks), he can obtain the distorted image but cannot extract the hidden data. Thus, this algorithm is flexible and can support the above three application requirements.

3.4.1 Data extraction

The recipient only needs to use the data-hiding key (Kd) to extract the authentication information if he only needs to authenticate the image without knowing its content. Take T = 15 as an example. The receiver detects the NSP pixels’ 4th binary bit sequentially (from top to bottom and from left to right). If it is 0, then extracts the 5th and 6th bits as the data; if it is 1, then skip this bit and continue to detect the next NSP pixels. After all the data are extracted, the data-hiding key (Kd) is used to decrypt the data to obtain the embedded data in plain text form. The original image can be easily restored from the recovered prediction error.

3.4.2 Image lossless decryption

If the receiving end has a higher requirements, he not only needs to extract the embedded information to authenticate the image, but also needs to understand the full content of the image. In this case, the receiver first extracts the data by using the data-hiding key (Kd), as outlined in Section 3.4.1. Then, he restores the original image according to the image encryption keys (Kp and Ks). The detailed implementation steps are as follows:

  1. 1.

    Decrypt the extracted data by using the data-hiding key (Kd) and separate the original bit from the decrypted data.

  2. 2.

    Detect NSP pixels’ 4th binary bit in sequence. If it is 0, set the 6th and 7th bit to 0; if it is 1, restore the original 4th bits.

  3. 3.

    Determine whether the error is positive or negative according to the 7th bit (positive and negative sign). Get the predicted error E(i,j).

  4. 4.

    The pixels belonging to SP are decrypted according to the stream encryption key (Ks) to obtain the original SP pixel O(i,j). The prediction pixels OP(i,j) belonging to NSP which contain OH(i,j), OV(i,j), and OD(i,j) are then calculated according to (1), (2) and (3). The prediction error E(i,j) is calculated as well.

  5. 5.

    The original prediction error that falls outside of [− 15,15] is restored by the following equation according to E(i,j) and the location map L:

    $$ E(i,j)=\left\{\begin{array}{ll} E(i,j)-127,&\text{if}\ E(i,j)<0,L(i,j)= 1,\\ E(i,j)+ 127,&\text{if}\ E(i,j)>0,L(i,j)= 1. \end{array}\right. $$
    (9)
  6. 6.

    The original pixels belonging to NSP are calculated according to the following equation:

    $$ O(i,j)=O_{P}(i,j)+E(i,j). $$
    (10)
  7. 7.

    Inversely permute the blocks to get the original image according to permutation key (Kp).

Fig. 8
figure 8

Original image O: a. Lena b. Couple c. Boat d. Man e. Peppers f. Baboon

3.4.3 Image lossy decryption

If the receiver does not want to extract the embedded data, but just wants to understand the contents of the image, he can get an image that is visually close to the original image by using the image encryption keys (Kp and Ks).

  1. 1.

    Detect NSP pixels’ 4th binary bit in sequence. If it is 0, set the 6th and 7th bit to 0; if it is 1, then skip it to the next NSP pixel.

  2. 2.

    Determine whether the error is positive or negative according to the 7th bit (positive and negative sign). Get the direct restored predicted error E(i,j) that is different from the original predicted error.

  3. 3.

    The pixel belonging to SP is decrypted according to the stream encryption key (Ks) to obtain the original SP pixel O(i,j). The prediction pixels OP(i,j) belonging to NSP which contain OH(i,j), OV(i,j), and OD(i,j) are then calculated according to (1), (2) and (3).

  4. 4.

    The restored pixels belonging to NSP are calculated according to the following equation:

    $$ O^{\prime}(i,j)=O_{P}(i,j)+E^{\prime}(i,j).\\ $$
    (11)

    The overflowed pixel can be modified according to the following equation as we just want to know the contents of the image:

    $$ E^{\prime}(i,j)=\left\{\begin{array}{ll} 0,&\text{if}\ E^{\prime}(i,j)<0,\\ 255,&\text{if}\ E^{\prime}(i,j)>255. \end{array}\right. $$
    (12)
  5. 5.

    Inversely permute the blocks to get the distorted image without embedded data according to permutation key (Kp).

4 Experimental results

In this section, we present our experimental result. First, the security of the encrypted image is tested. Then the embedding rate is tested and compared with classical and recent related algorithms. Finally, we present the result of visual quality of the lossy decrypted image.

The experiments are done on six standard testing images, including Lena, Couple, Boat, Man, Peppers, and Baboon, as shown in Fig. 8. All of these images are gray-scale with size 512 × 512.

4.1 Security of encrypted image

The encrypted image using our algorithm is shown in Fig. 9, where the block size is 3 × 3. The encrypted image using the method in [18] is shown in Fig. 10. Visual inspection reveals that the the original image contour as shown in Fig. 10 does not appear in the our encrypted image, thanks to the scrambling encryption. This scrambling increases the security performance.

In order to quantitatively characterize the security level, the correlation coefficient ρoc between the secret image C and the original image O is calculated from the (13) [17].

Fig. 9
figure 9

Encrypted image C: a. Lena, PSNR = 8.50dB b. Couple, PSNR = 8.71dB c. Boat, PSNR = 7.19dB d. Man, PSNR = 8.25dB e. Peppers, PSNR = 7.28dB f. Baboon, PSNR = 8.01dB

Fig. 10
figure 10

Encrypted image using Xu’s algorithm [18]: a. Lena, PSNR = 8.21dB b. Couple, PSNR = 9.15dB c. Boat, PSNR = 7.59dB d. Man, PSNR = 8.66dB e. Peppers, PSNR = 7.48dB f. Baboon, PSNR = 9.00dB

$$ \rho_{oc}=\frac {\mathbb{E}[(O-\mathbb{E}(O))(C-\mathbb{E}(C))]}{\sqrt{ \mathbb{D}(O)\mathbb{D}(C)}}, $$
(13)

where O and C represent the original image matrix and the secret image matrix, respectively. \(\mathbb {E}(O)\) and \(\mathbb {D}(O)\) are the expected value and variance of O, respectively; \(\mathbb {E}(C)\) and \(\mathbb {D}(C)\) are the expected value and variance of the secret image C, respectively. The expected value and variance are calculated from the sample mean and the sample variance as follows:

$$ \mathbb{E}(O)=\frac {1}{M\times N} \sum\limits_{i = 1}^{M} \sum\limits_{j = 1}^{N} O(i,j), \\ $$
(14)
$$ \mathbb{D}(O)=\frac {1}{M\times N} \sum\limits_{i = 1}^{M} \sum\limits_{j = 1}^{N} (O(i,j)-\mathbb{E}(O))^{2}. \\ $$
(15)

The correlation coefficients ρoc for the above images are given in Table 1. It can be seen from Table 1 that the encryption performance is improved. The result from Xu’s algorithm [18] is also listed in the second column. We observe that, the correlation coefficient for our algorithm is significantly smaller than those for Xu’s algorithm. The correlation coefficient of each image in this encryption scheme tends to be 0.

Table 1 Correlation coefficient between the secret image and the original image

4.2 Embedding capacity

Tables 2 and 3 show the number of embedded bits (EB) and the embedding rate (ER) for T = 1, T = 3, T = 7, and T = 15, where the sizes of the block are 3 × 3 and 5 × 5. The number of embedded bits does not include the original bits that need to be embedded together with the external data.

Table 2 The embedding capacity when the image is divided into 3 × 3 blocks
Table 3 The embedding capacity when the image is divided into 5 × 5 blocks

It can be seen from Tables 2 and 3 that, as T goes from one to seven, the embedding rate increases. The embedding rate is the largest when T = 7. Take a smaller T, then most prediction errors are outside of the range of [−T,T] and hence is not suitable for embedding, while a larger T allows more errors to be embeddable. But if we increase T, then less bits are available for embedding. Referring to Fig. 7, when T = 7, then three bits are available for embedding external data. But if we choose T = 15, only two bits are available for embedding. So there is a tradeoff in choosing T. Our experimental results reveal that using T = 7 provides good balance between these two stretching forces.

Comparing Table 2 with Table 3, we notice that, given the same T, the ER increases with the block size. This is because the larger the block size, the more the prediction errors are available, the more bits can be embedded (Fig. 5).

From the embedding capacity table, we also notice that the embedding capacity of Baboon is significantly smaller than other images. When T = 1, no external data can be embedded into baboon. The reason is that, baboon has more texture, and there are few prediction errors that can be used to embed data. Obviously, the proposed algorithm achieves high embedding capacity because more predictive errors are used in this algorithm.

4.3 The visual quality of the distorted image

If the receiver has only the encryption key pair (Kp and Ks), then he can decrypt the image and partially recover the modification from RDH, as outlined in Section 3.4.3. So the visual quality of the decrypted image should be high enough so that the distortion cannot be perceived by human visual system. To quantify the visual quality, we use the PSNR measure that is widely used in evaluating the quality of digital images.

As Zhang [19] first proposed the application of RDH in encrypted domains and this paper aims to improve the performance of Xu’s [18] algorithm, we compared PSNR of the lossy decrypted images in our algorithm with these two algorithms, as shown in Fig. 11.

The experimental results demonstrate that, using block size 3 × 3 and T = 15, our algorithm significantly outperforms these reference algorithms. This performance improvement can be explained as follows. For Zhang’s algorithm, since 3 LSBs of 50% pixels in each image block are flipped in the data embedding procedure, the visual quality of lossy decrypted images is greatly reduced. In Xu’s work, both bidirectional histogram shifting and difference expansion based methods are unified to embed additional data into prediction errors. However, this unified method leads to excessive modification of prediction errors. It can not get high enough visual quality either in the lossy decrypted procedure. In this paper, most prediction errors can be recovered in the lossy decrypted procedure since prediction errors were marked by built-in embedding flags. With the same ER, the PSNR of the lossy decrypted image in our method is significantly higher.

Figure 12 shows the lossy decrypted image for block size 3 × 3 and T = 15. In general, for images with simple contents, i.e., less textures, such as Lena and Peppers, the PSNR is above 45dB; For images with more complex textures, such as Baboon, the PSNR is above 35dB. So, the distortion is in general not visible to a human visual system. Due to the setting of the algorithm in this scheme, the distorted image still has a high visual quality.

The source of distortion can be explained as follows. From the algorithm in Section 3.3, when the prediction error E(i,j)∉[−T,T], the original bits in the position of embedding flag are appended to the end of the data to be embedded. Since all the embedded bits are lost after direct decryption, these bits are also destroyed. This results in distortion to the cover image.

Fig. 11
figure 11

PSNR comparison for the lossy decrypted images: a. Lena, b. Couple, c. Boat, d. Man, e. Peppers, f. Baboon

Fig. 12
figure 12

Distorted images D: a. Lena, PSNR = 48.50dB b. Couple, PSNR = 44.57dB c. Boat, PSNR = 45.94dB d. Man, PSNR = 45.64dB e. Peppers, PSNR = 47.48dB f. Baboon, PSNR = 36.55dB

4.4 The execution time of different stages

Computational time is also an important aspect of image processing algorithms. We evaluated the time complexity of the proposed scheme and Xu’s scheme with the same embedding rate. These two algorithms are implemented in Matlab 2013b. Table 4 lists the execution time for the different images in the four stages of the proposed scheme and Xu’s scheme. It need to be noted that the image encryption time in Table 4 includes the time for image preprocessing.

Table 4 The execution time for the different stages (unit)

It can be seen from Table 4 that the time for image encryption and decryption of the proposed scheme is longer than Xu’s. The reason is that, the block permutation process is adopted in the proposed scheme to increase the security of encrypted images. This process increases the execution time. We can also see from Table 4 that the time for data embedding of the proposed scheme is less than Xu’s. Compared to histogram shifting and prediction error expansion, simple bit replacement according to built-in embedding flag has a lower complexity calculation during the data embedding process. Since the execution time in different stages does not exceed two seconds, it can satisfy the real-time requirement in applications.

Convergence and stability are important aspects of computer algorithms. Even though the current algorithm is not iterative in nature, but the randomness introduced in encryption stage may arouse stability and convergence problem. Specifically, we should consider the randomness introduced from block permutation and stream encryption. To test the convergence and stability, all the previous experiments are conducted using randomly selected keys. Experimental results demonstrate that, the encoding and decoding can be finished in finite time, as shown in Table 4. Thus, our algorithm is stable under different keys and convergence is validated.

5 Conclusion

This paper designed a separable and error-free RDH in encrypted domain. Using a carefully designed block permutation algorithm adapted to the RDH, the image content leakage problem in Xu’s algorithm is further mitigated. To quantitatively characterize the security level, the correlation coefficient between the encrypted image and the original image is calculated. The results show that the proposed algorithm provides correlation that is less than 1/10 of that of the reference algorithm. What’s more, a data embedding algorithm utilizing the extra bits from the reduced dynamic range of the prediction error was used to increase the embedding rate. This embedding algorithm has an advantage of built-in embedding flag. The experimental results on test images show that the embedding rate is significantly improved. Furthermore, the PSNR of the lossy decrypted image has better performance. The execution time in different stages is also evaluated. Experiments prove that the execution time consumed by our scheme can satisfy the real-time requirement in applications. The performance analysis implies that the method in this paper can achieve good results for practical applications. It can be used in three cases adapted to different needs.

Since our algorithm is targeted for cloud computing scenarios, so data extraction in decrypted images is not feasible. Our future work is to solve this problem to expand the application in different scenarios.