1 Introduction

Steganography, also called data hiding, is a technique that embeds data into a cover medium, either irreversibly or reversibly, and can be used for covert communication, copyright protection, and similar applications. For some important cover media, such as medical and military images, reversibility of the data hiding is essential: the original image, as it was before the secret data were embedded, must be completely recoverable after the embedded data are extracted. In addition to hiding capacity, the quality of the stego image is also a key concern in research on reversible data hiding, because the higher the quality of the stego image, the less likely it becomes that an adversary can detect the presence of hidden information [1, 3, 4, 9, 12].

Recently, a large number of reversible data hiding schemes have been developed for cover images in various forms, such as color images [4], gray-level images [12], and compressed images [1]. Difference expansion and histogram shifting are the two most common techniques in reported reversible data hiding schemes. In the difference expansion based schemes, such as [12], the differences of the non-overlapping, neighboring pixel pairs in the cover image were doubled and then modified according to the parity of the secret bits to be embedded. In the histogram-shifting based schemes, such as [9], the peak point of the cover image histogram was chosen, and the pixel values in the range from the bin just right of the peak to the zero point were increased by one to create one vacant histogram bin for secret data embedding. The schemes of [5-8, 10, 11, 13, 16] integrated a prediction mechanism. These prediction-based schemes first conducted a prediction process to estimate the cover image pixels, and the prediction error, i.e., the difference between the cover pixel and its prediction, was used to embed the secret data by difference expansion [7, 8] or histogram shifting [5, 6, 10, 11, 13, 16]. The consistency of the prediction results between the embedding and extracting procedures ensures correct extraction of the secret bits and recovery of the cover image.

Usually, the higher the hiding capacity of a reversible data hiding scheme, the lower the visual quality of its stego image. Although the operations of difference expansion and histogram shifting create spare space for data embedding, they can introduce severe distortions between the stego image and the cover image. Unsatisfactory visual quality of the stego image may be unacceptable in data hiding, especially for covert communication, because obviously degraded images are likely to attract the attention of adversaries, who may then apply steganalysis tools to mount an attack. Therefore, it is important to consider the characteristics of the image content and human visual sensitivity during embedding, so as to reduce the degradation of the stego image as much as possible.

In this paper, we propose an adaptive reversible steganographic scheme based on the human visual system (HVS). By exploiting the sensitivities of the frequency components, the just noticeable distortion (JND) value of each cover image pixel is calculated. Anisotropic interpolation is then adopted for pixel prediction, producing smaller prediction errors. The distribution characteristic of each cover image pixel and the relationship between its JND value and its prediction error are analyzed to adaptively decide whether the pixel is suitable for secret bit embedding. Owing to the adjustment by the JND values, the complex regions and the regions with smaller prediction errors can be embedded with more secret bits, which reduces the visual distortion while providing satisfactory hiding capacity.

The rest of the paper is organized as follows. Section 2 reviews the typical reported reversible data hiding schemes. Section 3 describes the detailed procedures of the proposed scheme based on the JND mechanism. Experimental results and comparisons are presented in Section 4, and conclusions are finally drawn in Section 5.

2 Related works

Earlier research on reversible data hiding by Fridrich et al. focused on losslessly compressing a chosen subset of the original cover image, such as its lower bit-planes, and replacing the chosen bit-planes with the concatenation of their compressed version and the secret bits [3]. The hiding capacity of this kind of method depended on the difference between the data amounts of the chosen bit-planes before and after lossless compression. Higher payloads forced more bit-planes to be used, which could quickly push the perceptible distortion of the cover image beyond an acceptable level. Fridrich et al. improved this method by defining a discrimination function and an invertible flipping operation in [4]. According to the discrimination function, all disjoint pixel groups in the cover image were categorized into three types: regular groups, singular groups, and unusable groups. The regular and singular groups corresponded to the binary bits 1 and 0, respectively, and the unusable groups could not be used for embedding. If a secret bit and the group type did not match, the flipping operation was applied to the group to obtain a match. Thus, each regular and singular group could be embedded with one bit. However, to achieve reversibility, the vector indicating the original status of the regular and singular groups had to be losslessly compressed and then embedded, together with the secret bits, as extra information. Because only part of the pixel groups could be used and the extra information bits occupied some of the embedding space, the hiding capacity of this method was not high.

In recent years, the techniques of difference expansion and histogram shifting were introduced into reversible data hiding. Tian first proposed a reversible data hiding scheme using the difference expansion technique [12]. In his work, the cover image was divided into a series of non-overlapping, neighboring pixel pairs, and every pair difference smaller than a pre-determined threshold was doubled. The doubled differences were then either left unchanged or modified to match the parity of the secret bits. The receiver can easily extract the embedded secret bits from the least significant bits (LSBs) of the pair differences in the stego image. However, the doubled differences severely degraded the stego image quality, and extra information for solving the underflow and overflow problems also had to be embedded, which decreased the pure capacity of this method. A histogram shifting based method was proposed by Ni et al. in [9]. The pixel values in the range from the bin just right of the histogram peak to the nearest zero point were all increased by one to create one vacant histogram bin for embedding. The pixel values corresponding to the peak point were utilized for data embedding; each was either kept intact or modified by one level according to the secret bit. However, for cover images with flat histograms, the hiding capacity of this method was not satisfactory. Additionally, the positions of the peak point and the zero point also had to be transmitted to the receiver side for data extraction and image recovery.
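To make the difference expansion mechanism concrete, the following toy sketch (in Python, with function names of our choosing) embeds and extracts one bit in a single pixel pair using Tian's integer transform; the threshold test and the overflow/underflow handling discussed above are omitted.

```python
# A toy sketch of Tian's difference expansion [12] on one pixel pair,
# omitting the threshold test and overflow handling; names are ours.

def de_embed(p1: int, p2: int, bit: int):
    """Embed one bit by doubling the pair difference."""
    avg = (p1 + p2) // 2               # integer average, kept invariant
    diff_s = 2 * (p1 - p2) + bit       # expanded difference carries the bit
    return avg + (diff_s + 1) // 2, avg - diff_s // 2

def de_extract(q1: int, q2: int):
    """Recover the bit and the original pair from the stego pair."""
    avg = (q1 + q2) // 2
    diff_s = q1 - q2
    bit = diff_s & 1                   # the bit is the LSB of the difference
    diff = diff_s >> 1                 # undo the expansion (floor division)
    return bit, avg + (diff + 1) // 2, avg - diff // 2

assert de_extract(*de_embed(100, 97, 1)) == (1, 100, 97)
```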

In order to further improve the hiding capacity and the stego image quality, the prediction mechanism has been utilized in recent studies. Instead of directly using the original image, the prediction based schemes use the prediction errors as the cover data for embedding [5-8, 10, 11, 13, 16]. Lee et al. proposed a prediction based method using difference expansion [7]. Because most of the differences between the cover pixels and their prediction values are small, large numbers of prediction errors can be expanded to embed secret bits, which achieves a greater hiding capacity than [12]. Hong et al. employed bi-linear and bi-cubic interpolation to predict the cover image from selected reference pixels [5]; the histogram of prediction errors was then shifted to embed the secret bits. From the above analysis, we can see that the prediction technique and the embedding strategy for the prediction errors are the two key points of prediction based reversible data hiding schemes. In the following section, a novel scheme using anisotropic interpolation based prediction and an HVS sensitivity based embedding strategy is presented.

3 Proposed scheme

In the proposed scheme, the data embedding procedure is conducted on the cover image pixels in raster-scanning order, and the JND value, prediction error, and distribution characteristic of the pixel currently being processed depend entirely on its previously processed pixels. Consequently, during the extraction and recovery procedure, the already recovered pixels can assist the secret data extraction and pixel recovery of the subsequent pixels in raster-scanning order.

Before embedding, a pre-processing step is applied to the cover image to avoid the overflow and underflow problems caused by embedding. A threshold λ is set to ensure that the gray value of every pixel changes by no more than 2^λ after embedding. All cover pixels valued in [0, 2^λ − 1] and [255 − 2^λ + 1, 255] are modified to 2^λ and 255 − 2^λ, respectively. To achieve reversibility, the auxiliary information that records the modifications of this pre-processing step is compressed by a run-length coder and appended to the pure secret data to generate the final to-be-embedded bits. Thus, the procedures of JND calculation and data embedding are applied to the pre-processed cover image rather than the original cover image.
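As an illustration, a minimal sketch of this pre-processing step might look as follows, assuming an 8-bit grayscale cover image held in a NumPy array; how the recorded positions and original values are packed into the run-length coded auxiliary information is left out.

```python
import numpy as np

# A sketch of the pre-processing step. Pixels that could overflow or
# underflow after an embedding change of at most 2**lam are clamped to the
# safe range, and their positions and original values are recorded; in the
# paper this auxiliary information is run-length coded and embedded
# together with the payload.

def preprocess(cover: np.ndarray, lam: int):
    img = cover.astype(np.int32)
    low, high = 2 ** lam, 255 - 2 ** lam
    clamped = (img < low) | (img > high)   # location map of modified pixels
    originals = img[clamped].copy()        # values needed for exact recovery
    img = np.clip(img, low, high)          # [0, 2**lam - 1] -> 2**lam, etc.
    return img, clamped, originals
```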

3.1 JND calculation

Various HVS models have been proposed to describe the sensitivity of the human eyes to images, and the JND value of an HVS model denotes the smallest change in a pixel value that the human eyes can perceive. Basically, an HVS model can be constructed either in the spatial domain [2] or in the frequency domain [15]. In this work, we utilize an HVS model derived from the frequency domain of the discrete cosine transform (DCT) [15]. This model treats each DCT coefficient as an approximation to the local response of a visual channel, and an 8 × 8 perceptual error matrix, i.e., the Watson matrix, is generated for the DCT coefficients of each image block through the adjustment of contrast sensitivity, light adaptation, and contrast masking. Two perceptual modes are provided in this HVS model: the image-independent perceptual (IIP) mode and the image-dependent perceptual (IDP) mode. In this work, the IIP mode is used for JND calculation.

The cover image I of size M × N is first divided horizontally into M / 8 strips of equal size, i.e., H_1, H_2, …, H_{M/8}, and the first eight columns of I are kept unchanged and carry no secret data. Thus, each H_i (i = 1, 2, …, M / 8) contains 8N − 64 cover pixels that can be used for data embedding. Denote the first N − 8 overlapping 8 × 8 blocks in H_i as B_k^(i) (k = 1, 2, …, N − 8), where each B_k^(i) consists of the pixels I(x, y) with x = 8i − 7, 8i − 6, …, 8i and y = k, k + 1, …, k + 7. We apply the DCT to each B_k^(i) and denote the resulting DCT coefficient matrix by C_k^(i). The Watson matrix Ψ, which specifies the largest variation of each DCT coefficient that remains imperceptible to the human eyes, is given in Eq. (1).

$$ \boldsymbol{\Psi} = \begin{pmatrix} 1.40 & 1.01 & 1.16 & 1.66 & 2.40 & 3.43 & 4.79 & 6.56 \\ 1.01 & 1.45 & 1.32 & 1.52 & 2.00 & 2.71 & 3.67 & 4.93 \\ 1.16 & 1.32 & 2.24 & 2.59 & 2.98 & 3.64 & 4.60 & 5.88 \\ 1.66 & 1.52 & 2.59 & 3.77 & 4.55 & 5.30 & 6.28 & 7.60 \\ 2.40 & 2.00 & 2.98 & 4.55 & 6.15 & 7.46 & 8.71 & 10.17 \\ 3.43 & 2.71 & 3.64 & 5.30 & 7.46 & 9.62 & 11.58 & 13.51 \\ 4.79 & 3.67 & 4.60 & 6.28 & 8.71 & 11.58 & 14.50 & 17.29 \\ 6.56 & 4.93 & 5.88 & 7.60 & 10.17 & 13.51 & 17.29 & 21.15 \end{pmatrix}. $$
(1)

The JND values for each B_k^(i) can be calculated using Eqs. (2)–(4):

$$ {\widehat{\mathrm{C}}}_k^{(i)}\left(x,y\right)=\left[\left|{\mathrm{C}}_k^{(i)}\left(x,y\right)\right|+\Psi \left(x,y\right)\right]\cdot \mathrm{sign}\left[{\mathrm{C}}_k^{(i)}\left(x,y\right)\right], $$
(2)
$$ {\widehat{\mathbf{B}}}_k^{(i)}=\mathrm{IDCT}\left({\widehat{\mathbf{C}}}_k^{(i)}\right), $$
(3)
$$ {\mathrm{JND}}_k^{(i)}\left(x,y\right)=\left|{\widehat{\mathrm{B}}}_k^{(i)}\left(x,y\right)-{\mathrm{B}}_k^{(i)}\left(x,y\right)\right|, $$
(4)

where IDCT denotes the inverse discrete cosine transform, the function sign(⋅) returns the sign of its argument, and JND_k^(i)(x, y) is the JND value of the pixel B_k^(i)(x, y) in block B_k^(i). As stated at the beginning of Section 3, for reversibility, the JND value of the current pixel must be derived from its preceding pixels in raster-scanning order. Thus, the JND values of all the M(N − 8) cover pixels available for data embedding in the strips H_i (i = 1, 2, …, M / 8) are estimated directly from the JND values of their left neighbors, i.e., JND(x, y) ← JND(x, y − 1), where x = 1, 2, …, M and y = 9, 10, …, N. The estimated JND values are then adjusted by the nonlinear function in Eq. (5) to make them more suitable for data embedding.

$$ {\mathrm{JND}}^{\ast}\left(x,y\right)={e}^{\mathrm{JND}\left(x,y\right)}\cdot {2}^{\lambda -1}. $$
(5)
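The per-block JND computation of Eqs. (2)–(4) and the adjustment of Eq. (5) can be sketched as follows, assuming the separable orthonormal 2-D DCT; the function names are ours.

```python
import numpy as np
from scipy.fftpack import dct, idct

# A sketch of the per-block JND computation of Eqs. (2)-(4), using the
# Watson matrix psi of Eq. (1); dct2/idct2 are the separable 2-D
# orthonormal DCT and its inverse.

def dct2(block: np.ndarray) -> np.ndarray:
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeff: np.ndarray) -> np.ndarray:
    return idct(idct(coeff, axis=0, norm='ortho'), axis=1, norm='ortho')

def block_jnd(block: np.ndarray, psi: np.ndarray) -> np.ndarray:
    c = dct2(block.astype(np.float64))        # C_k^(i)
    c_hat = (np.abs(c) + psi) * np.sign(c)    # Eq. (2): enlarge magnitudes
    b_hat = idct2(c_hat)                      # Eq. (3)
    return np.abs(b_hat - block)              # Eq. (4): per-pixel JND

def adjust_jnd(jnd: np.ndarray, lam: int) -> np.ndarray:
    return np.exp(jnd) * 2 ** (lam - 1)       # Eq. (5)
```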

3.2 Data embedding

The embedding procedure of the proposed scheme is based on the expansion of prediction errors. Thus, before embedding, the prediction value of each embeddable pixel must be computed. We denote the neighboring region of the current pixel I(x, y) as Ω_{x,y}; all K pixels in Ω_{x,y} precede I(x, y) in raster-scanning order. The prediction value of I(x, y) is obtained by Eq. (6):

$$ \begin{array}{ll}{I}_p\left(x,y\right)={\displaystyle \sum_{i=1}^K{\alpha}_i\cdot I\left({x}_i,{y}_i\right)},\hfill & \forall \left({x}_i,{y}_i\right)\in {\Omega}_{x,y},\hfill \end{array} $$
(6)

where α_i is the weight with which each pixel in Ω_{x,y} contributes to the prediction value, and the weights satisfy ∑_{i=1}^{K} α_i = 1. Essentially, Eq. (6) is an anisotropic interpolation model. In our scheme, α_i is inversely proportional to the distance between I(x_i, y_i) and I(x, y). The prediction error of each embeddable pixel is defined as σ(x, y) = I(x, y) − I_p(x, y).
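A sketch of this prediction step, under the stated assumption of normalized inverse-distance weights, is given below; the causal neighborhood Ω_{x,y} is passed in as a coordinate list, and the names are ours.

```python
import numpy as np

# A sketch of the anisotropic interpolation of Eq. (6): weights alpha_i are
# inversely proportional to the distance from I(x_i, y_i) to I(x, y) and
# normalized to sum to 1.

def predict(img: np.ndarray, x: int, y: int, omega) -> float:
    d = np.array([np.hypot(x - xi, y - yi) for xi, yi in omega])
    alpha = (1.0 / d) / np.sum(1.0 / d)        # inverse-distance weights
    vals = np.array([float(img[xi, yi]) for xi, yi in omega])
    return float(alpha @ vals)                 # I_p(x, y)

# prediction error: sigma = int(img[x, y]) - predict(img, x, y, omega)
```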

Inspired by the scheme in [6], in order to apply different embedding conditions to regions with different distributions, we estimate the distribution characteristic of each possibly embeddable pixel to judge whether it is located in a smooth region or a complex region. If the variance of the neighboring region Ω_{x,y} of the current pixel I(x, y) is smaller than a pre-determined threshold T, I(x, y) is judged to be a smooth pixel; otherwise, it is judged to be a complex pixel. For each smooth pixel I(x, y), the embedding level L_{x,y} is calculated by:

$$ {L}_{x,y}= \min \left\{\left\lfloor { \log}_2\left[{\mathrm{JND}}^{\ast}\left(x,y\right)\right]\right\rfloor, \lambda \right\}. $$
(7)

For each complex pixel I(x, y), the embedding level L_{x,y} is calculated by:

$$ {L}_{x,y}= \min \left\{\left\lceil { \log}_2\left[{\mathrm{JND}}^{\ast}\left(x,y\right)\right]\right\rceil, \lambda \right\}, $$
(8)

where ⌊⋅⌋ and ⌈⋅⌉ denote the largest integer no greater than and the smallest integer no smaller than the input, respectively. Using the ceiling for complex pixels reflects that complex regions can tolerate more embedding distortion than smooth regions, owing to the relative insensitivity of the human eyes there.
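Eqs. (7) and (8) translate directly into code; the sketch below assumes the neighborhood variance of Ω_{x,y} has already been computed, and the function name is ours.

```python
import math

# A sketch of Eqs. (7) and (8): the embedding level is the floor (smooth
# pixel) or the ceiling (complex pixel) of log2 of the adjusted JND value,
# capped at lambda. `var_omega` is the variance of Omega_{x,y}.

def embedding_level(jnd_star: float, var_omega: float, T: float, lam: int) -> int:
    if var_omega < T:                                      # smooth, Eq. (7)
        return min(math.floor(math.log2(jnd_star)), lam)
    return min(math.ceil(math.log2(jnd_star)), lam)        # complex, Eq. (8)
```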

In order to alleviate degradation, secret data are embedded only into the pixels I(x, y) whose prediction errors satisfy |σ(x, y)| < 2^{L_{x,y}}; see Eq. (9):

$$ {I}_s\left(x,y\right)=\left\{\begin{array}{ll}I\left(x,y\right)+\sigma \left(x,y\right)+\mathrm{sign}\left[\sigma \left(x,y\right)\right]\cdot s,\hfill & \mathrm{if}\kern0.5em \left|\sigma \left(x,y\right)\right|<{2}^{L_{x,y}},\hfill \\ {}I\left(x,y\right)+\mathrm{sign}\left[\sigma \left(x,y\right)\right]\cdot {2}^{L_{x,y}},\hfill & \mathrm{if}\kern0.5em \left|\sigma \left(x,y\right)\right|\ge {2}^{L_{x,y}},\hfill \end{array}\right. $$
(9)

where s ∈ {0, 1} is the current secret bit to be embedded, and I_s(x, y) is the stego pixel. After all the secret bits and the auxiliary information have been embedded, the embedding procedure is finished, and the resulting stego image I_s can be transmitted to the receiver side.
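For a single pixel, the embedding rule of Eq. (9) can be sketched as follows (names ours):

```python
# A minimal sketch of Eq. (9) for one pixel. `pixel` is the (pre-processed)
# cover pixel, `sigma` the prediction error sigma(x, y), `level` the
# embedding level L_{x,y}, and `s` the secret bit. Pixels with
# |sigma| >= 2**level are only shifted and carry no data.

def embed_pixel(pixel: int, sigma: int, level: int, s: int) -> int:
    sgn = (sigma > 0) - (sigma < 0)         # sign function, sign(0) = 0
    if abs(sigma) < 2 ** level:
        return pixel + sigma + sgn * s      # expandable error: carries s
    return pixel + sgn * 2 ** level         # non-expandable: shifted only
```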

3.3 Data extraction and image recovery

To guarantee successful extraction and recovery, the parameters λ and T must be shared with the receiver. The data extraction and image recovery procedure is also conducted on the stego image progressively in raster-scanning order, except for the 8M unchanged pixels in the first eight columns. Thus, the same JND values, prediction values, and embedding levels as in the embedding procedure can be obtained for the stego pixels.

Denote the current stego pixel by I_s(x, y). As stated above, the prediction value of I_s(x, y) is still I_p(x, y), and the prediction error of the stego pixel is defined as σ_s(x, y) = I_s(x, y) − I_p(x, y). If |σ_s(x, y)| < 2^{L_{x,y}+1}, the current stego pixel I_s(x, y) carries one secret bit; otherwise, the pixel was only shifted and carries no secret bit. Therefore, secret bit extraction and image recovery can be achieved by Eqs. (10) and (11), respectively.

$$ \begin{array}{ll}s= \mod \left(\left|{\sigma}_s\left(x,y\right)\right|,2\right),\hfill & \mathrm{subject}\kern0.5em \mathrm{to}\kern0.5em \left|{\sigma}_s\left(x,y\right)\right|<{2}^{L_{x,y}+1},\hfill \end{array} $$
(10)
$$ I\left(x,y\right)=\left\{\begin{array}{ll}{I}_s\left(x,y\right)-\mathrm{sign}\left[{\sigma}_s\left(x,y\right)\right]\cdot \left\lceil \frac{\left|{\sigma}_s\left(x,y\right)\right|}{2}\right\rceil, \hfill & \mathrm{if}\kern0.5em \left|{\sigma}_s\left(x,y\right)\right|<{2}^{L_{x,y}+1},\hfill \\ {}{I}_s\left(x,y\right)-\mathrm{sign}\left[{\sigma}_s\left(x,y\right)\right]\cdot {2}^{L_{x,y}},\hfill & \mathrm{if}\kern0.5em \left|{\sigma}_s\left(x,y\right)\right|\ge {2}^{L_{x,y}+1},\hfill \end{array}\right. $$
(11)

where s is the secret bit extracted from the current stego pixel I_s(x, y), and I(x, y) is the recovered pixel value. After all embedded secret bits have been extracted, the auxiliary information bits can be retrieved to restore the pixels modified in the pre-processing step, and the original image is thereby recovered exactly.
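The receiver-side counterpart, Eqs. (10) and (11), can be sketched for one stego pixel as follows (names ours):

```python
import math

# A sketch of Eqs. (10)-(11). `sigma_s` is the stego prediction error
# I_s(x, y) - I_p(x, y); the level L_{x,y} is re-derived at the receiver
# from the already recovered pixels. Returns the extracted bit (or None
# for a shifted, data-free pixel) and the recovered pixel value.

def extract_pixel(stego: int, sigma_s: int, level: int):
    sgn = (sigma_s > 0) - (sigma_s < 0)                      # sign function
    if abs(sigma_s) < 2 ** (level + 1):                      # carries one bit
        s = abs(sigma_s) % 2                                 # Eq. (10)
        return s, stego - sgn * math.ceil(abs(sigma_s) / 2)  # Eq. (11), top
    return None, stego - sgn * 2 ** level                    # Eq. (11), bottom
```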

Unlike previously reported reversible steganographic schemes, the proposed scheme adopts anisotropic interpolation in the prediction of cover pixels, which achieves better prediction accuracy and smaller prediction errors than traditional prediction techniques based on isotropic interpolation. Furthermore, the proposed scheme does not use a fixed embedding level to decide whether a cover pixel is embeddable. To achieve satisfactory hiding capacity and stego image quality simultaneously, the embedding level of each cover pixel is adaptively calculated from its JND value and distribution characteristic. In our scheme, the JND value of each pixel is derived from the frequency domain based HVS model, which keeps the visual quality of the stego image acceptable. Each cover pixel is also classified as smooth or complex according to the variance of its neighborhood. Since complex regions can tolerate more distortion than smooth regions, we set the embedding levels of complex pixels higher than those of smooth pixels to embed more bits without causing severe distortion. With these strategies, the proposed scheme achieves a greater hiding capacity than the reported schemes at a similar stego image quality.

4 Experimental results and comparisons

Experiments were conducted on a group of gray-level images to evaluate the hiding capacity and the stego image visual quality of the proposed scheme. We use the embedding rate R to represent the pure hiding capacity, as defined in Eq. (12):

$$ R=\frac{L-{L}_a}{M\times N}\left(\mathrm{bpp}\right), $$
(12)

where L is the total number of embedded bits, L_a is the number of compressed auxiliary information bits among them, and bpp denotes bits per pixel. Two typical measurement indices, i.e., peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), were used to assess the visual quality of the stego images. The SSIM measure was developed based on the characteristics of the HVS and synthetically integrates structure, luminance, and contrast information for image quality assessment [14].
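For reference, the embedding rate of Eq. (12) and the standard PSNR for 8-bit images can be computed as follows; SSIM would be taken from an existing implementation such as scikit-image (our assumption, the paper does not name one).

```python
import numpy as np

# A sketch of the evaluation measures, with names of our choosing: the
# pure embedding rate of Eq. (12) and PSNR for 8-bit grayscale images.

def embedding_rate(L: int, L_a: int, M: int, N: int) -> float:
    return (L - L_a) / (M * N)                        # Eq. (12), in bpp

def psnr(cover: np.ndarray, stego: np.ndarray) -> float:
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)          # peak value 255
```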

The six standard 512 × 512 test images, i.e., Airplane, Barbara, Lake, Mandrill, Sailboat, and Woman, are shown in Fig. 1. The threshold T used for the estimation of the pixel distribution was set to 200 in the experiments. Figure 2 shows the stego versions of the six images in Fig. 1, with embedding rates R of 0.8249 bpp, 0.6229 bpp, 0.6430 bpp, 0.4123 bpp, 0.7382 bpp, and 0.9134 bpp, respectively. We can observe from Fig. 2 that the stego images have good visual quality and that the distortions caused by embedding are imperceptible to the human eyes.

Fig. 1 Six standard test images

Fig. 2 Stego images of the six test images in Fig. 1 (λ = 3). a PSNR = 36.0689 dB, SSIM = 0.9832; b PSNR = 33.4088 dB, SSIM = 0.9820; c PSNR = 33.3851 dB, SSIM = 0.9842; d PSNR = 31.9036 dB, SSIM = 0.9807; e PSNR = 34.5814 dB, SSIM = 0.9824; f PSNR = 37.6211 dB, SSIM = 0.9844

Equations (7)–(9) show that the parameter λ is closely related to both the hiding capacity and the degradation of visual quality. Table 1 presents the embedding rate R and the visual quality of the proposed scheme after embedding with different values of λ. The embedding rate R grows as λ increases, but the visual quality of the stego images becomes worse. We also evaluated the relationship between the embedding rate R and the threshold T used for estimating the pixel distribution. As described in Subsection 3.2, if the variance of the neighboring region of the current pixel I(x, y) is smaller than the threshold T, I(x, y) is judged to be a smooth pixel; otherwise, it is judged to be a complex pixel. Obviously, a smaller threshold T causes more pixels to be judged as complex. Equations (7) and (8) give the embedding levels of smooth and complex pixels, respectively; for a given pixel I(x, y), the embedding level L_{x,y} calculated by Eq. (8) is never smaller than that calculated by Eq. (7), and according to Eq. (9), larger embedding levels make more pixels embeddable. Therefore, a smaller threshold T leads to a greater embedding rate and, consequently, a lower PSNR value. Figure 3(a) and (b) show the curves of the embedding rate R and the PSNR value versus the threshold T for Airplane (λ = 3), which are consistent with the above analysis.

Table 1 Performance of the proposed scheme with different values of λ
Fig. 3 The relationships of the embedding rate R (a) and the PSNR value (b) with the threshold T for Airplane (λ = 3)

We compared our scheme with two recently reported, typical schemes, i.e., Tai et al.'s scheme [11] and Jung et al.'s scheme [6]. Figures 4, 5 and 6 show the comparison results for the three schemes; subfigures (a) and (b) give the results of embedding rate versus PSNR and SSIM, respectively. Note that the neighboring region used for calculating the prediction value and the embedding level in our scheme was selected in the same way as the causal window of size 3 in [6]. As shown in Figs. 4, 5 and 6, at the same embedding rate, Jung et al.'s scheme [6], which also considers the human visual system, outperforms Tai et al.'s scheme [11] with respect to SSIM, but it is not obviously superior with respect to PSNR. Owing to the appropriate control by the JND values, the comparison results show that the proposed scheme achieves better PSNR and SSIM versus embedding rate than the schemes in [6, 11].

Fig. 4 Performance comparisons of the proposed scheme and the schemes in [6, 11] for Airplane. a Embedding rate R versus PSNR, b embedding rate R versus SSIM

Fig. 5 Performance comparisons of the proposed scheme and the schemes in [6, 11] for Barbara. a Embedding rate R versus PSNR, b embedding rate R versus SSIM

Fig. 6 Performance comparisons of the proposed scheme and the schemes in [6, 11] for Lake. a Embedding rate R versus PSNR, b embedding rate R versus SSIM

5 Conclusions

In this work, an adaptive reversible steganographic scheme for high-quality images based on the HVS is proposed. The JND value of each embeddable cover pixel is calculated using the Watson matrix of the IIP mode in the DCT frequency domain. By analyzing the characteristics of the pixel distribution, the embedding strategy utilizes the relationship between JND values and prediction errors to adaptively decide whether a cover pixel can be embedded with a secret bit. Because the modification of each pixel caused by embedding is controlled by its JND value and the pre-determined threshold, the degradation of visual quality is imperceptible to the human eyes. Compared with other recently reported schemes, the proposed scheme achieves a greater embedding rate and a higher stego image visual quality. A future improvement is to integrate the IDP mode into the current scheme for better performance.