1 Introduction

Image steganography is the art of covert communication in which a secret message is hidden inside a cover image [5, 12], so that the message can be transmitted through an innocuous-looking stego image. Because JPEG images are widely used on the Internet, JPEG image steganography has received extensive attention. In recent years, many JPEG steganography algorithms have been proposed, such as JPEG UNIversal WAvelet Relative Distortion (J-UNIWARD) [9] and Uniform Embedding Distortion (UED) [8]. These algorithms are content-adaptive: the embedding changes are constrained to complex texture regions that are difficult to model. They therefore have strong anti-detection capability, but they are not robust to lossy image processing such as image compression and image resizing [24]. As a result, they are not suitable for public transmission channels such as WeChat, which often apply lossy compression to the stego images. To conduct covert communication using the abundant images on social networks as covers, a JPEG steganography algorithm must be robust to lossy image compression. In other words, robust JPEG steganography should not only resist lossy image operations but also have good anti-detection capability. Compared with content-adaptive JPEG steganography, robust JPEG steganography attracted relatively little attention in past decades. In recent years, however, researchers have proposed a series of robust JPEG steganography algorithms [16, 17, 23, 24, 25, 26, 27], and robust JPEG steganography against lossy operations is becoming a research hotspot in the field of information hiding.

Several robust JPEG steganographic schemes have been designed. Zhang et al. [24] constructed a robust embedding domain based on the relative relationship of inter-block DCT coefficients and proposed a robust and adaptive JPEG steganography algorithm against JPEG compression. Qian et al. [16] proposed a robust steganography algorithm using texture synthesis; however, the extraction error rate of the secret message is relatively high. Zhang et al. [25] proposed a JPEG compression and detection resistant steganography algorithm based on dither modulation, which applies when the quality factor of the cover JPEG image is no larger than the quality factor of the JPEG compression channel. Tao et al. [17] proposed a robust JPEG steganography that generates an “intermediate image” which becomes exactly the stego image after JPEG compression with a specific quality factor; however, the quality factor of the JPEG compression must be known in advance. Zhao et al. [27] proposed a robust JPEG image steganography algorithm based on transmission channel matching; however, repeatedly uploading images for recompression is very suspicious behavior. Yu et al. [23] proposed a robust image steganography algorithm based on generalized dither modulation and embedding domain expansion, which achieves better robustness and anti-detection capability than the method in [25]; however, it also requires that the quality factor of the cover JPEG image be no larger than the quality factor of the JPEG compression channel. Zhang et al. [26] proposed a robust steganography algorithm with multiple robustness enhancements; however, its anti-detection capability is still relatively weak.

In this paper, a robust JPEG steganography algorithm is proposed based on DCT and SVD in the nonsubsampled shearlet domain [6, 19]. The low frequency information of an image is more robust to lossy image compression. Therefore, the cover image is first decomposed by a single scale nonsubsampled shearlet transform (NSST) and the low frequency band is used for message embedding. Furthermore, to take advantage of DCT and SVD for robust message embedding [1, 11, 18], the low frequency band is divided into 8 × 8 blocks, DCT is performed on each block, and SVD is then performed on a special matrix constructed from the low and middle frequency DCT coefficients of each 8 × 8 block. The maximum singular values of all the blocks are used to construct the robust embedding domain, and quantization index modulation (QIM) is used for message embedding and extraction [3]. The binary cover elements are extracted by applying QIM to the maximum singular values. The embedding distortion of each cover element is defined according to the texture complexity of the corresponding block and the embedding change caused by modulating the maximum singular value. To reduce extraction errors, the secret message is encoded by a Reed-Solomon (RS) error correcting code. Next, the encoded message is embedded using syndrome-trellis codes (STCs) [7], which are widely used for minimal distortion steganography, to obtain the stego elements. Finally, the stego elements are embedded using QIM on the maximum singular values of the corresponding blocks, and the stego image is generated from the modulated maximum singular values by inverse SVD, inverse DCT and inverse NSST.

The rest of this paper is organized as follows: Section 2 introduces the construction of the robust embedding domain; Section 3 proposes a robust JPEG steganography algorithm and describes its implementation in detail; Section 4 verifies the effectiveness of the proposed steganography algorithm by comparing it with state-of-the-art robust image steganography algorithms; Section 5 concludes the paper.

2 Robust embedding domain construction

In this section, we first introduce the characteristics of image decomposition using NSST, then describe the DCT and SVD procedure for image blocks, and finally discuss the method for robust element extraction.

2.1 Image decomposition using NSST

The shearlet transform is an effective tool for multiscale geometric analysis of images [6, 22]. It not only provides a more flexible theoretical tool for the geometric representation of multidimensional data, but is also more natural to implement. Shearlets are highly sensitive to direction, spatially localized and optimally sparse. The NSST is a shift-invariant version of the shearlet transform that eliminates the down-sampling and up-sampling operations. It combines the nonsubsampled Laplacian pyramid transform with different combinations of shearing filters. The high directional sensitivity of NSST and its optimal approximation properties lead to improvements in many image processing applications. In [19], a blind robust image watermarking approach based on NSST is proposed.

Fig. 1 shows the original Barbara image and its subband images, which are generated by performing NSST on the Barbara image with a single scale and four directions. The Barbara image is decomposed into a low frequency approximation subband and four high frequency directional subbands. JPEG compression significantly affects high frequency image characteristics such as edges and textures. At the same time, the anti-detection capability can be improved by preserving the image texture characteristics, which are difficult to model. Therefore, for robust image steganography, the image can first be decomposed by NSST and the low frequency approximation subband can then be used for message embedding. In contrast to spatial domain steganography, which directly modifies the image pixels, steganography in the NSST domain has stronger robustness and anti-detection capability.

Fig. 1 The NSST on Barbara image: (a) original Barbara, (b) low frequency approximate subband, (c) high frequency directional subbands

2.2 Apply DCT and SVD to image block

JPEG is one of the most popular image formats on the Internet because it achieves a good tradeoff between storage size and image quality. JPEG is a lossy compression scheme built on the DCT. In JPEG compression, a two-dimensional (2D) DCT is applied to each 8 × 8 image block, the DCT coefficients are quantized according to the quality factor or JPEG quantization table, and the quantized coefficients are finally entropy coded using Huffman coding.

The 2D DCT of an M × N block is defined by (1) and (2) as follows:

$$ F\left(u,v\right)=\frac{2}{\sqrt{M\times N}}c(u)c(v){\sum}_{x=0}^{M-1}{\sum}_{y=0}^{N-1}f\left(x,y\right)\cos \frac{\left(2x+1\right) u\pi}{2M}\cos \frac{\left(2y+1\right) v\pi}{2N}, $$
(1)
$$ c(u)=\left\{\begin{array}{ll}1/\sqrt{2},& u=0\\ {}1,& u>0\end{array}\right., $$
(2)

where x, u = 0, 1, 2, ⋯, M − 1 and y, v = 0, 1, 2, ⋯, N − 1, and c(v) is defined in the same way as c(u).

The corresponding inverse DCT is defined by (3) as follows:

$$ f\left(x,y\right)=\frac{2}{\sqrt{M\times N}}{\sum}_{u=0}^{M-1}{\sum}_{v=0}^{N-1}c(u)c(v)F\left(u,v\right)\cos \frac{\left(2x+1\right) u\pi}{2M}\cos \frac{\left(2y+1\right) v\pi}{2N}, $$
(3)

where f(x, y) is the pixel value of the image and F(u, v) is the DCT coefficient.
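As a quick numerical check of the transform pair, the following minimal sketch writes the 2D DCT and its inverse in matrix form with NumPy. It assumes the orthonormal normalization 2/√(MN) in both directions, which is what makes the forward and inverse transforms an exact pair; the function names `dct_matrix`, `dct2` and `idct2` are illustrative and not taken from the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis: entry (u, x) = sqrt(2/n) * c(u) * cos((2x+1)u*pi/(2n))."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    c = np.where(u == 0, 1.0 / np.sqrt(2.0), 1.0)  # c(u) as in (2)
    return np.sqrt(2.0 / n) * c * np.cos((2 * x + 1) * u * np.pi / (2 * n))

def dct2(block):
    """Forward 2D DCT of an M x N block (separable, matrix form)."""
    m, n = block.shape
    return dct_matrix(m) @ block @ dct_matrix(n).T

def idct2(coeffs):
    """Inverse 2D DCT; the basis matrices are orthogonal, so the inverse is the transpose."""
    m, n = coeffs.shape
    return dct_matrix(m).T @ coeffs @ dct_matrix(n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.random((8, 8))
    # Round trip should reproduce the block up to floating-point error.
    print(np.allclose(idct2(dct2(block)), block))
```

With this normalization, a constant 8 × 8 block of ones has a single nonzero coefficient, the DC term F(0, 0) = 8.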

A DCT block of a JPEG image can be separated into low, middle and high frequency bands, as shown in Fig. 2. The DCT coefficients in the low and middle frequency bands concentrate most of the energy of the image and are therefore relatively stable under lossy JPEG compression. In contrast, the DCT coefficients in the high frequency band are easily removed by lossy compression. Therefore, to obtain a robust stego image, some DCT coefficients in the low and middle frequency bands are used for robust element extraction.

Fig. 2 Zig-zag ordering and frequency bands for DCT coefficients

SVD is an orthogonal transform used for matrix diagonalization. Let \( \mathbf{A}\in {\mathbb{R}}^{M\times N} \) be an M × N matrix. Then, the matrix A can be represented by its SVD as (4),

$$ \mathbf{A}=\mathbf{US}{\mathbf{V}}^T=\left({\mathbf{u}}_1,{\mathbf{u}}_2,\cdots, {\mathbf{u}}_M\right)\left(\begin{array}{cccc}{\lambda}_1& & & \\ {}& \ddots & & \\ {}& & {\lambda}_r& \\ {}& & & 0\end{array}\right)\left(\begin{array}{c}{\mathbf{v}}_1^T\\ {}{\mathbf{v}}_2^T\\ {}\vdots \\ {}{\mathbf{v}}_N^T\end{array}\right)={\sum}_{i=1}^r{\lambda}_i{\mathbf{u}}_i{\mathbf{v}}_i^T, $$
(4)

where U and V are orthogonal M × M and N × N matrices, respectively, and S is an M × N diagonal matrix with nonnegative elements. The diagonal terms λ1, λ2, ⋯, λr of S are the nonzero singular values of A in descending order and r is the rank of A.

SVD has many attractive mathematical properties; for example, the singular values λ1, λ2, ⋯, λr are unique and stable: when a small perturbation is added to a matrix, the singular values change very little. Therefore, SVD has been widely used in robust image watermarking techniques [1, 11, 18], where the singular values or singular vectors are often used for watermark embedding. In this paper, we employ the maximum singular values of the DCT coefficient matrices constructed from the 8 × 8 image blocks to generate the robust elements against JPEG compression.

2.3 Robust elements extraction

Let Ic denote the cover JPEG image, which is decompressed to the spatial domain image Isp. Isp is decomposed by NSST; A0 denotes the low frequency subband and Dk, l denotes the high frequency directional subband at scale k and direction l, k = 1, 2, ⋯, K, l = 1, 2, ⋯, L. Let Bi denote the i-th 8 × 8 block obtained by dividing the subband A0 into nonoverlapping blocks, and let N be the number of blocks. Then, 2D DCT is performed on each 8 × 8 block to obtain the corresponding DCT coefficient matrix, as shown in Fig. 2. Because the low and middle frequency DCT coefficients are more stable, we construct a 6 × 6 matrix Mi from the DCT coefficients numbered 1 to 36 in Fig. 2.

Then, SVD is performed on each matrix Mi by (5) as follows:

$$ {\mathbf{M}}_i={\sum}_{j=1}^r{\lambda}_{i,j}{\mathbf{U}}_{i,j}{\mathbf{V}}_{i,j}^T, $$
(5)

where λi, j denotes the j-th singular value in descending order, and Ui, j and Vi, j denote the corresponding left and right singular vectors.

The maximum singular value λi, 1 of Mi has good robustness against lossy JPEG compression. Therefore, we extract the maximum singular value from each 6 × 6 matrix; the sequence λ1, 1, λ2, 1, …, λN, 1 is the robust element set, which also forms the robust embedding domain.
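The extraction of the robust elements from a low frequency subband can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the NSST step is assumed to have been done already (the input is the subband A0 as a 2D array), the zig-zag scan is the standard JPEG one, and filling the 6 × 6 matrix row by row with the first 36 zig-zag coefficients is an assumption, since the text does not specify the arrangement. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    c = np.where(u == 0, 1.0 / np.sqrt(2.0), 1.0)
    return np.sqrt(2.0 / n) * c * np.cos((2 * x + 1) * u * np.pi / (2 * n))

def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in JPEG zig-zag order."""
    def key(rc):
        s = rc[0] + rc[1]
        # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
        return (s, rc[0] if s % 2 else rc[1])
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def max_singular_values(subband):
    """Largest singular value of the 6 x 6 coefficient matrix of every 8 x 8 block."""
    h, w = subband.shape
    C = dct_matrix(8)
    rows, cols = zip(*zigzag_indices(8)[:36])  # coefficients numbered 1..36
    values = []
    for r in range(0, h - h % 8, 8):
        for c in range(0, w - w % 8, 8):
            F = C @ subband[r:r + 8, c:c + 8] @ C.T          # 2D DCT of block B_i
            M = F[list(rows), list(cols)].reshape(6, 6)      # assumed row-wise fill
            values.append(np.linalg.svd(M, compute_uv=False)[0])
    return np.array(values)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam = max_singular_values(rng.random((512, 512)))
    print(lam.shape)  # one robust element per 8 x 8 block
```

For a 512 × 512 subband this yields 64 × 64 = 4096 robust elements, one per block.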

After the robust elements are extracted, the message bit b can be embedded based on QIM by (6) as follows:

$$ {\lambda}_{i,1}^{\prime }=\left(\left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor +\mathit{\operatorname{mod}}\left(\left\lfloor \frac{\lambda_{i,1}}{q}+b\right\rfloor, 2\right)\right)\times q, $$
(6)

where λi, 1 denotes the maximum singular value and q denotes the quantization step.

Correspondingly, the embedded message bit b can be extracted by (7) as follows:

$$ b=\operatorname{mod}\left( round\left(\frac{\lambda_{i,1}^{\prime }}{q}\right),2\right). $$
(7)
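Equations (6) and (7) can be sketched directly in Python. The function names `qim_embed` and `qim_extract` are illustrative; the sketch only restates the two formulas and shows that the bit survives any perturbation of the modulated value that is smaller than q/2.

```python
import math

def qim_embed(lam, bit, q):
    """Eq. (6): move lam to a multiple of q whose index has the parity of `bit`."""
    k = math.floor(lam / q)
    return (k + (k + bit) % 2) * q

def qim_extract(lam_mod, q):
    """Eq. (7): parity of the nearest quantization index."""
    return int(round(lam_mod / q)) % 2

if __name__ == "__main__":
    q = 10
    for bit in (0, 1):
        lam_mod = qim_embed(37.3, bit, q)
        # A perturbation below q/2 (here 4.0) does not change the extracted bit.
        print(bit, lam_mod, qim_extract(lam_mod + 4.0, q), qim_extract(lam_mod - 4.0, q))
```

For example, with λ = 37.3 and q = 10, bit 0 maps to 40 and bit 1 maps to 30. A larger q widens the tolerated perturbation but also increases the change applied to λ.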

To verify the effectiveness of the message embedding method using the robust element set, the 512 × 512 Barbara image with quality factor (QF) 85 is used for message embedding. With the above extraction method, the number of robust elements is 4096, one for each 8 × 8 block. First, 4096 bits are embedded according to (6) and the stego image is generated. Then, JPEG compression is applied to the stego image with QF 65, 75, 85 and 95, respectively. Finally, the message bits are extracted according to (7). The message extraction error rates are shown in Fig. 3.

Fig. 3 Message extraction error rates when JPEG compression is performed on the stego Barbara image with QF 65, 75, 85 and 95, respectively

Fig. 3 shows that the extraction error rates decrease rapidly as the quantization step q increases. Moreover, when the QF of JPEG compression is low, such as QF 65 and 75, a larger quantization step q is needed to extract the message bits correctly.

3 Proposed robust JPEG steganography algorithm

In this section, the framework of the proposed robust JPEG steganography algorithm is introduced first, and then the implementation details of the message embedding and extraction procedures are described.

3.1 Framework of proposed steganography algorithm

Fig. 4 shows the whole framework of the proposed robust JPEG steganography algorithm, which includes the message embedding procedure and the message extraction procedure. This framework employs NSST, DCT and SVD to extract robust elements and then uses the minimal distortion steganography principle to embed the secret message. Moreover, both the cover element extraction and the stego element embedding are based on QIM.

Fig. 4 Framework of proposed robust JPEG steganography algorithm

For message embedding, the maximum singular value of each constructed DCT coefficient matrix is first obtained based on NSST, DCT and SVD; the quantization step q is then determined and the corresponding binary cover element is extracted according to QIM. Next, the embedding cost of each cover element is computed from the embedding change and the texture complexity measure, and the message is encoded by the RS code to reduce extraction errors. Then, the encoded message is embedded by STCs to obtain the stego elements, and QIM is applied to the maximum singular values of all the constructed DCT coefficient matrices to embed the stego elements. Finally, all the 8 × 8 blocks are reconstructed from the modulated maximum singular values by inverse SVD and inverse DCT, and inverse NSST is performed to obtain the stego image.

For message extraction, the stego image is firstly decompressed to the spatial domain, and then the decompressed JPEG image is decomposed by single scale NSST to obtain a low frequency subband and several high frequency directional subbands; next, the low frequency subband is divided into nonoverlapping 8 × 8 blocks and DCT is performed for each block; then, the 6 × 6 DCT coefficient matrices are constructed from the 8 × 8 DCT blocks and the corresponding maximum singular values are generated to get the stego elements using QIM; finally, the embedded messages are extracted by STCs and RS decoder is used to get the original secret message.

3.2 Message embedding procedure

3.2.1 Extract cover elements

As described in Section 2.3, the cover JPEG image is decomposed by NSST, the low frequency subband is divided into nonoverlapping 8 × 8 blocks, and the robust element λi, 1 is extracted from each block Bi. To apply the minimal distortion steganography principle when embedding the secret message, the binary cover element xi is generated by (8) as follows:

$$ {x}_i=\operatorname{mod}\left( round\left(\frac{\lambda_{i,1}}{q}\right),2\right). $$
(8)

According to (8), a cover element is generated in the same way as a message bit is extracted in (7). One cover element is generated for each block, so N cover elements x = (x1, x2, ⋯, xN) are generated in total. These cover elements form the cover object, which is a binary sequence.

3.2.2 Define embedding cost

The embedding cost of each cover element xi must be defined when message embedding is performed by STCs, which are widely used for minimal distortion steganography. Here, both the texture complexity of the 8 × 8 block Bi and the embedding change of the robust element λi, 1 are considered.

The cover element xi is extracted from the 8 × 8 block Bi, and the corresponding shearlet coefficients of the high-pass subbands Dk, l reflect the texture complexity of this block. Therefore, the texture complexity measure can be defined by (9) as follows:

$$ {t}_i=\frac{1}{\sum_{l=1}^D{\sum}_{m=1}^8{\sum}_{n=1}^8\left|{f}_l\left(i,m,n\right)\right|}, $$
(9)

where fl(i, m, n) denotes the (m, n)-th shearlet coefficient of the corresponding block in the high-pass subband at direction l, and D denotes the number of high-pass subbands. In (9), ti is the reciprocal of the energy of the shearlet coefficients corresponding to the block Bi, where the energy is measured as the sum of absolute values. A large energy, and hence a small ti, indicates a more complex texture structure that is difficult to model.

Furthermore, the robust element λi, 1 must be modulated when the cover element xi needs to be changed for message embedding. The embedding change of the robust element λi, 1 is defined by (10) as follows:

$$ {e}_i=\left|\left(\left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor +\mathit{\operatorname{mod}}\left(\left\lfloor \frac{\lambda_{i,1}}{q}+\left(1-{x}_i\right)\right\rfloor, 2\right)\right)\times q-{\lambda}_{i,1}\right|. $$
(10)

Finally, the embedding cost di of the cover element xi is defined by (11) as follow:

$$ {d}_i=\frac{\left|{t}_i-{t}_{min}\right|}{\left|{t}_{max}-{t}_{min}\right|}+\frac{\left|{e}_i-{e}_{min}\right|}{\left|{e}_{max}-{e}_{min}\right|}, $$
(11)

where tmax and tmin denote respectively the maximum value and minimal value of the texture complexity measure of all the blocks, emax and emin denote the maximum value and minimal value of embedding changes.
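The cost definition in (9) to (11) can be sketched with NumPy as below. The function name `embedding_costs` and the input `highpass_abs_sum` (the triple sum of absolute high-pass shearlet coefficients per block, assumed to be precomputed and nonzero) are hypothetical; the guard for the degenerate case in which all values are equal is an addition of this sketch and is not discussed in the text.

```python
import numpy as np

def embedding_costs(lam, q, highpass_abs_sum):
    """Embedding cost d_i of eq. (11) for every block.

    lam              : maximum singular values lambda_{i,1}
    q                : quantization step
    highpass_abs_sum : sum of |f_l(i, m, n)| over all high-pass subbands
                       and the 8 x 8 block (assumed nonzero)
    """
    lam = np.asarray(lam, dtype=float)
    energy = np.asarray(highpass_abs_sum, dtype=float)

    t = 1.0 / energy                              # eq. (9): small t = complex texture
    k = np.floor(lam / q)
    x = np.mod(np.round(lam / q), 2)              # eq. (8): current cover bit
    flipped = (k + np.mod(k + (1 - x), 2)) * q    # value after embedding 1 - x_i
    e = np.abs(flipped - lam)                     # eq. (10): embedding change

    def normalize(v):
        span = v.max() - v.min()
        # Guard (assumption of this sketch): constant input gives zero cost term.
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    return normalize(t) + normalize(e)            # eq. (11)

if __name__ == "__main__":
    d = embedding_costs([37.3, 32.0, 51.0], q=10, highpass_abs_sum=[10.0, 20.0, 40.0])
    print(d)
```

Each of the two normalized terms lies in [0, 1], so every cost di lies in [0, 2]; blocks with rich texture and a small required change of λi, 1 receive the lowest cost.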

3.2.3 Embed secret message

As shown in Fig. 4, to reduce the extraction error rate, the secret message m is encoded by the RS code and then the encoded message is embedded using STCs. Because the cover elements are binary, the binary embedding operation [7] is performed by (12) and (13) as follows:

$$ \mathbf{y}=\mathrm{Emb}\left(\mathbf{x},\mathbf{m}\right)=\arg \underset{y\in C\left(\boldsymbol{m}\right)}{\min }D\left(\mathbf{x},\mathbf{y}\right) $$
(12)
$$ \mathbf{m}=\mathrm{Ext}\left(\mathbf{y}\right)=\mathbf{Hy} $$
(13)

where \( \mathbf{H}\in {\left\{0,1\right\}}^{m\times n} \) is a parity-check matrix of the code C, \( C\left(\mathbf{m}\right)=\left\{\mathbf{z}\in {\left\{0,1\right\}}^n|\mathbf{Hz}=\mathbf{m}\right\} \) is the coset corresponding to syndrome m, \( D\left(\mathbf{x},\mathbf{y}\right)={\sum}_{i=1}^n{\rho}_i\left|{x}_i-{y}_i\right| \) is the embedding distortion function in which ρi is the embedding cost of the i-th cover element, i.e., di defined in (11), and all operations are in binary arithmetic.

In other words, binary STCs [7] are used for secret message embedding. After the message is embedded by binary STCs, we obtain the binary stego element set y = (y1, y2, …, yn).
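To make (12) and (13) concrete, the following toy sketch performs syndrome coding by exhaustive search over the coset C(m). It is not an STC implementation: practical STCs use a structured parity-check matrix and the Viterbi algorithm to find the minimizer efficiently, whereas this brute-force search is only feasible for a handful of cover elements. The matrix `H`, the costs `rho` and the function names are made up for illustration.

```python
import itertools
import numpy as np

def syndrome_embed(x, msg, H, rho):
    """Eq. (12) by exhaustive search: the y with Hy = msg (mod 2) of minimal cost."""
    n = H.shape[1]
    best, best_cost = None, np.inf
    for cand in itertools.product((0, 1), repeat=n):
        y = np.array(cand)
        if np.array_equal(H @ y % 2, msg):
            cost = float(np.sum(rho * np.abs(x - y)))  # D(x, y)
            if cost < best_cost:
                best, best_cost = y, cost
    return best

def syndrome_extract(y, H):
    """Eq. (13): the message is the syndrome of the stego sequence."""
    return H @ y % 2

if __name__ == "__main__":
    H = np.array([[1, 1, 0, 1],
                  [0, 1, 1, 1]])
    x = np.array([1, 0, 0, 1])           # cover elements
    rho = np.array([1.0, 0.2, 1.0, 3.0])  # embedding costs
    msg = np.array([1, 0])
    y = syndrome_embed(x, msg, H, rho)
    print(y, syndrome_extract(y, H))
```

In this example only the cheapest element (cost 0.2) is flipped, giving y = (1, 1, 0, 1), and the receiver recovers the two message bits from Hy without knowing x or the costs.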

3.2.4 Embed stego element based on QIM

As shown in Fig. 4, the binary stego element yi should be embedded by modulating the robust element λi, 1 according to (14) as follow:

$$ {\lambda}_{i,1}^{\prime }=\left(\left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor +\mathit{\operatorname{mod}}\left(\left\lfloor \frac{\lambda_{i,1}}{q}+{y}_i\right\rfloor, 2\right)\right)\times q, $$
(14)

where the \( {\lambda}_{i,1}^{\prime } \) denotes the maximum singular value after modulation.

3.2.5 Generate stego image

Based on the modulated maximum singular value \( {\lambda}_{i,1}^{\prime } \), we first reconstruct Mi according to (5), and then inverse DCT is performed on each 8 × 8 block. Finally, inverse NSST is performed and the stego image is generated.

That is to say, the robust element \( {\lambda}_{i,1}^{\prime } \) carries the stego element. Because the robust elements can resist lossy JPEG compression, the correct maximum singular values are likely to be recovered after the stego image is compressed. The corresponding stego elements can then be obtained correctly, and the correct secret message is extracted using STCs.
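The reconstruction of Mi from a modulated maximum singular value can be sketched with NumPy as follows. The function name `replace_max_singular_value` is hypothetical, and the sketch covers only the inverse SVD step, not the inverse DCT or inverse NSST.

```python
import numpy as np

def replace_max_singular_value(M, new_value):
    """Rebuild M with its largest singular value replaced, keeping all singular vectors."""
    U, s, Vt = np.linalg.svd(M)
    s = s.copy()
    s[0] = new_value
    return U @ np.diag(s) @ Vt

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.random((6, 6))
    s = np.linalg.svd(M, compute_uv=False)
    M_mod = replace_max_singular_value(M, s[0] + 3.0)
    # The largest singular value of the rebuilt matrix is the modulated value.
    print(np.linalg.svd(M_mod, compute_uv=False)[0], s[0] + 3.0)
```

One caveat worth keeping in mind: the modulated value is only read back as the maximum singular value if it stays at least as large as the second singular value, so very small matrices of near-equal singular values would need separate handling.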

3.2.6 Determine quantization step q

The quantization step q is an important parameter of the above message embedding procedure. In (14), the stego element yi is embedded by modulating the maximum singular value λi, 1, and the quantization step q determines the robustness of the steganography. A large quantization step q gives good robustness; however, the corresponding embedding change is also large, which weakens the anti-detection capability. Therefore, it is necessary to determine an appropriate quantization step that balances steganography robustness and detection resistance.

For a cover image, the determination of the quantization step q is shown in Fig. 5. First, the error rate threshold TER is given according to the robustness requirements of the steganography. Then, an initial value of the quantization step q is set and the stego image is generated. Finally, the embedded messages are extracted from the compressed stego image, and the error rate ER is computed. If ER ≤ TER, the current q is taken as the final quantization step; otherwise q is increased and the above process is repeated until ER ≤ TER is satisfied. As in Fig. 3, the message extraction error rate decreases rapidly as the quantization step q increases.

Fig. 5 Procedure for determining quantization step q

As Fig. 5 shows, the quantization step q is determined through an iterative process. In other words, for a cover image, the appropriate quantization step q is found by repeatedly embedding and extracting the secret message while increasing q by a fixed step size, until the error rate of the extracted secret message is zero or acceptably low. This process is therefore relatively time-consuming. In addition, it should be noted that the appropriate quantization step q varies from one cover image to another.
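The iterative search can be sketched as a simple loop. In this sketch, `error_rate` is a hypothetical callback that embeds the message with step q, passes the stego image through the assumed lossy channel, extracts the message and returns the error rate; `q_init`, `q_step` and `q_max` are illustrative parameter names (Section 4.2 uses qmax = 80). The stopping test is written as ER ≤ TER so that a threshold of TER = 0 can be met.

```python
def find_quantization_step(error_rate, t_er=0.0, q_init=1, q_step=1, q_max=80):
    """Return the smallest tested q with error_rate(q) <= t_er, or None if none up to q_max."""
    q = q_init
    while q <= q_max:
        if error_rate(q) <= t_er:
            return q
        q += q_step  # fixed increment, as in the iterative procedure
    return None

if __name__ == "__main__":
    # Synthetic stand-in for embed -> compress -> extract: errors vanish at q = 25.
    simulated = lambda q: max(0, 50 - 2 * q) / 100
    print(find_quantization_step(simulated))            # 25
    print(find_quantization_step(simulated, q_max=10))  # None
```

Because every call to `error_rate` involves a full embedding, compression and extraction pass, the cost of the search grows linearly with the number of tested steps.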

In summary, the embedding process of the secret message is described in detail in Algorithm 1.

Algorithm 1 (embedding process of the secret message)

From the above embedding process, the time consumption of Algorithm 1 mainly comes from the cover image decomposition using NSST, the DCT on the blocks of the low frequency subband, the SVD on the 6 × 6 matrices, the message embedding by STCs, and the image reconstruction operations that produce the stego image. Therefore, the time consumption is relatively larger than that of the algorithm in [26].

3.3 Message extracting procedure

As shown in Fig. 4, for message extraction, the stego object y = (y1, y2, ⋯, yN) is extracted first, and then the embedded messages are recovered by STCs and the RS decoder. According to (14), the following rules hold:

if \( \left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor \) is odd and yi = 0, then \( \left(\left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor +\operatorname{mod}\left(\left\lfloor \frac{\lambda_{i,1}}{q}+0\right\rfloor, 2\right)\right) \) is even;

if \( \left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor \) is odd and yi = 1, then \( \left(\left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor +\operatorname{mod}\left(\left\lfloor \frac{\lambda_{i,1}}{q}+1\right\rfloor, 2\right)\right) \) is odd;

if \( \left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor \) is even and yi = 0, then \( \left(\left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor +\operatorname{mod}\left(\left\lfloor \frac{\lambda_{i,1}}{q}+0\right\rfloor, 2\right)\right) \) is even;

if \( \left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor \) is even and yi = 1, then \( \left(\left\lfloor \frac{\lambda_{i,1}}{q}\right\rfloor +\operatorname{mod}\left(\left\lfloor \frac{\lambda_{i,1}}{q}+1\right\rfloor, 2\right)\right) \) is odd.

In other words, the quantized value \( {\lambda}_{i,1}^{\prime }/q \) is an even integer when the embedded bit is 0 and an odd integer when the embedded bit is 1. Therefore, the stego elements can be extracted by (15) as follows:

$$ {y}_i=\mathit{\operatorname{mod}}\left( round\left({\lambda}_{i,1}^{\prime \prime }/q\right),2\right). $$
(15)

where \( {\lambda}_{i,1}^{\prime \prime } \) is the maximum singular value extracted from the compressed JPEG stego image and the quantization step q is the same as in (14).

The detailed extraction process of secret message is described in Algorithm 2.

Algorithm 2 (extraction process of the secret message)

4 Experimental results and analysis

4.1 Experimental settings

In the experiments, 10,000 grayscale images from BOSSbase1.01 [2] were used to evaluate the robustness and anti-detection capability of the proposed JPEG steganography algorithm. The image size is 512 × 512 and the image format is PGM. First, the robustness of the proposed steganography algorithm against JPEG compression attacks is compared with that of the robust image steganography algorithms in [26]. Second, the anti-detection capability of the proposed steganography algorithm is evaluated using two typical steganalysis feature sets [10, 15] and the ensemble classifier [13]. Finally, the robustness of the proposed steganography algorithm against WeChat compression is presented. The parameters of the RS code are (31, 19).

4.2 Robustness against JPEG compression

In this experiment, the 10,000 grayscale PGM images are JPEG compressed with QF 85 to generate the cover images. Then, the corresponding stego images are generated by the proposed steganography algorithm, MREAS-PS and MREAS-PJ [26], respectively. Single scale NSST with 16 directions is performed on each cover image. The quantization step q of the proposed steganography algorithm is determined according to Fig. 5, where the extraction error rate threshold TER is set to 0, qmax = 80 and the lossy channel is JPEG compression with QF 65. Because the cover image size is 512 × 512, there are 4096 robust elements, so the maximum embedding payload is 4096 bits. Moreover, for robust steganography, the anti-detection capability is more important than the embedding payload. Therefore, the payload is set to 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009 and 0.01 bpnzAC (bits per non-zero AC DCT coefficient), respectively. Thus, for each robust steganography algorithm, we have one group of cover images and ten groups of stego images. Next, the stego images with different payloads are JPEG compressed with QF 65, 75, 85 and 95, respectively. The embedded message is extracted from each compressed JPEG stego image according to Algorithm 2. The average extraction error rates of the different robust steganography algorithms are shown in Table 1.

Table 1 Average extraction error rates of three robust steganography algorithms for all images in BOSSbase1.01. (×10−3)

According to the average extraction error rates in Table 1, the proposed steganography algorithm achieves competitive robustness. For MREAS-PS and MREAS-PJ, the average extraction error rates are low when the QF of JPEG recompression is 85 or 95, but they become high when the QF is 65, which is the strongest compression attack among the tested QFs (65, 75, 85 and 95). For the proposed algorithm, the quantization step q of each image is determined under the assumption that the QF of the compression attack is 65; in other words, q is not re-selected for QFs 75, 85 and 95. This lets us observe how the extraction error rate changes when the real QF of the JPEG compression attack is not consistent with the assumed value. From the experimental results in Table 1, the extraction error rate is lowest when the QF is 65, because the real QF is then consistent with the assumed value. For the other QFs, the extraction error rates decrease as the QF increases, because a higher QF means a weaker compression attack.

Complex images often have strong resistance to detection, so images with complex texture structure should be selected as cover images. Here, we evaluate the steganography robustness on complex images. First, the images are decomposed by NSST with a single scale and 16 directions, and the 2000 most complex images are selected according to the energy of the shearlet coefficients from all the high frequency subbands. Then, these 2000 images are used as cover images and the corresponding stego images are generated. Finally, the stego images are attacked by JPEG recompression; the average extraction error rates are shown in Table 2.

Table 2 Average extraction error rates of three robust steganography algorithms for complex images in BOSSbase1.01. (×10−3)

According to the extraction error rates in Table 2, the proposed steganography algorithm also achieves competitive robustness for complex images. Moreover, compared with the results in Table 1, the extraction error rates increase slightly, because JPEG recompression has a larger impact on complex images.

4.3 Detection resistance against steganalysis

In this experiment, the anti-detection capability of the proposed robust steganography algorithm is compared with that of MREAS-PS and MREAS-PJ [26]. The steganalysis features are CC-PEV [15] with 548 dimensions and DCTR [10] with 8000 dimensions. The ensemble classifier [13] is trained on the steganalysis features and used as the detector. The ratio of training images to test images is 0.5:0.5. The detection accuracy is quantified using the minimal total error probability under equal priors \( {P}_E={\min}_{P_{\mathrm{FA}}}\left({P}_{\mathrm{FA}}+{P}_{\mathrm{MD}}\right)/2 \), where PFA denotes the false-alarm probability and PMD denotes the missed-detection probability. The reported value \( {\bar{P}}_E \) is the average of PE over ten random splits of the image database.
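The error measure PE can be sketched as a sweep over decision thresholds. This is only an illustration of the formula: it assumes the detector outputs a real-valued score per test image, with larger scores meaning "more likely stego", and the function name `min_total_error` is hypothetical. It is not the training or testing procedure of the ensemble classifier [13].

```python
import numpy as np

def min_total_error(cover_scores, stego_scores):
    """P_E = min over thresholds of (P_FA + P_MD) / 2 under equal priors."""
    cover = np.asarray(cover_scores, dtype=float)
    stego = np.asarray(stego_scores, dtype=float)
    # Candidate thresholds: every observed score, plus -inf (flag everything as stego).
    thresholds = np.concatenate(([-np.inf], np.unique(np.concatenate((cover, stego)))))
    best = 1.0
    for t in thresholds:
        p_fa = np.mean(cover > t)    # cover images flagged as stego
        p_md = np.mean(stego <= t)   # stego images missed
        best = min(best, (p_fa + p_md) / 2)
    return float(best)

if __name__ == "__main__":
    cover = [0.1, 0.2, 0.3, 0.6]
    stego = [0.4, 0.7, 0.8, 0.9]
    print(min_total_error(cover, stego))  # 0.125
```

A value of PE close to 0.5 means the detector does no better than random guessing, which is the desirable outcome from the steganographer's point of view.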

Fig. 6 gives the detection error rates \( {\bar{P}}_E \) for all the cover images and the corresponding stego images. The detection performance shows that the proposed steganography has stronger detection resistance than MREAS-PS and MREAS-PJ. On the one hand, this is because the robust embedding domain is constructed in the low frequency subband, so high frequency image features such as textures and edges are preserved. On the other hand, the quantization step q of each cover image is determined according to the extraction error rate threshold, so the embedding changes caused by the quantization operation are smaller than those caused by a large quantization step q fixed for all the cover images.

Fig. 6 Comparisons of detection error rates \( {\bar{P}}_E \) of two steganalysis features for stego images generated from all images in BOSSbase1.01. a CC-PEV feature, b DCTR feature

In Fig. 7, the detection error rates are presented for the 2000 most complex images in BOSSbase1.01. Image complexity is again measured using the energy of the shearlet coefficients in all the high-frequency subbands. The experimental results in Figs. 6 and 7 show that the detection resistance obtained with the 2000 complex images is clearly stronger than that obtained with all the images in BOSSbase1.01. Moreover, for complex images, the proposed steganography also has stronger detection resistance than MREAS-PS and MREAS-PJ.

Fig. 7 Comparisons of detection error rates \( {\bar{P}}_E \) of two steganalysis features for stego images generated from 2000 complex images in BOSSbase1.01. a CC-PEV feature, b DCTR feature

In Fig. 8, the detection error rates are presented for the 2000 simplest images in BOSSbase1.01. The detection resistance for the simple images is clearly weaker than for the complex images. Therefore, images with complex texture structure should be selected as cover images for covert communication. Moreover, for simple images, the proposed algorithm also has stronger detection resistance than MREAS-PS and MREAS-PJ.

Fig. 8 Comparisons of detection error rates \( {\bar{P}}_E \) of two steganalysis features for stego images generated from 2000 simple images in BOSSbase1.01. a CC-PEV feature, b DCTR feature

4.4 Application in WeChat platform

In this experiment, the robustness of the proposed steganography algorithm against WeChat compression is tested. WeChat is a widely used instant messaging tool that allows users to share their images. However, to reduce transmission and storage costs, uploaded images are compressed, and the compression algorithm used by WeChat is unknown to users. We select two cover images from BOSSbase1.01, which are converted to JPEG images with QF85 and are shown in Fig. 9.

Fig. 9 Cover images for covert communication via WeChat. a ‘8.jpg’ and b ‘4226.jpg’ in BOSSbase1.01

First, the stego images are generated with different quantization steps q at a payload of 0.01 bpnzAC. Then, the stego images are posted to WeChat Moments. Finally, the stego images are downloaded from WeChat Moments and the embedded messages are extracted from the downloaded stego images. The extraction error rates are shown in Fig. 10.
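To make the two quantities in this procedure concrete, the following Python sketch computes the message length implied by a payload given in bpnzAC (bits per nonzero AC DCT coefficient) and the extraction error rate between the embedded and the extracted bits. The array layout (quantized DCT blocks of shape (n, 8, 8) with the DC term at position [0, 0]) and all function names are assumptions for illustration, not part of the proposed algorithm.

```python
import numpy as np

def count_nonzero_ac(dct_blocks):
    """Count nonzero AC coefficients in an array of shape (n_blocks, 8, 8)."""
    blocks = np.asarray(dct_blocks)
    total_nonzero = np.count_nonzero(blocks)
    dc_nonzero = np.count_nonzero(blocks[:, 0, 0])  # DC terms are excluded
    return int(total_nonzero - dc_nonzero)

def message_length_bits(dct_blocks, payload_bpnzac):
    """Number of message bits for a payload in bits per nonzero AC coefficient."""
    return int(np.floor(payload_bpnzac * count_nonzero_ac(dct_blocks)))

def extraction_error_rate(embedded_bits, extracted_bits):
    """Fraction of message bits that differ after extraction."""
    embedded = np.asarray(embedded_bits)
    extracted = np.asarray(extracted_bits)
    if embedded.shape != extracted.shape:
        raise ValueError("bit sequences must have the same length")
    return float(np.mean(embedded != extracted))
```

For example, an image with 6300 nonzero AC coefficients carries 63 message bits at 0.01 bpnzAC, so a single wrong bit already corresponds to an error rate of about 1.6%.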

Fig. 10 Extraction error rate of the stego images after WeChat channel compression

Fig. 10 shows that the stego images can resist the WeChat compression channel when an appropriate quantization step q is selected. Note that a large quantization step gives good robustness, but the corresponding quantization operations introduce large embedding changes in the cover image. Moreover, the quantization step needed for correct message extraction differs from one cover image to another. Therefore, the quantization step q must be selected appropriately for each cover image.
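The trade-off described above can be sketched with scalar quantization index modulation (QIM). The code below is an illustrative simplification, not the proposed embedding: it applies QIM to arbitrary real values, models the lossy channel as a caller-supplied function `channel`, and picks the smallest candidate step whose extraction error rate does not exceed a threshold. All names here are hypothetical.

```python
import numpy as np

def qim_embed(values, bits, q):
    """Move each value to the nearest point of the lattice for its bit.

    Bit 0 uses multiples of q; bit 1 uses the same lattice shifted by q/2.
    """
    values = np.asarray(values, dtype=float)
    offset = np.asarray(bits) * (q / 2.0)
    return np.round((values - offset) / q) * q + offset

def qim_extract(values, q):
    """Decode each value as the bit whose lattice point is nearest."""
    values = np.asarray(values, dtype=float)
    d0 = np.abs(values - np.round(values / q) * q)
    nearest1 = np.round((values - q / 2.0) / q) * q + q / 2.0
    d1 = np.abs(values - nearest1)
    return (d1 < d0).astype(int)

def select_step(values, bits, channel, candidate_steps, max_error_rate):
    """Return (q, error_rate) for the smallest step that meets the threshold.

    `channel` stands in for the lossy channel (an assumption of this sketch).
    Returns (None, None) when no candidate step is robust enough.
    """
    bits = np.asarray(bits)
    for q in sorted(candidate_steps):
        received = channel(qim_embed(values, bits, q))
        error = float(np.mean(qim_extract(received, q) != bits))
        if error <= max_error_rate:
            return q, error
    return None, None
```

In this simplified model, a perturbation smaller than q/4 is always decoded correctly, while the embedding change per value can be as large as q/2, which is why choosing the smallest sufficient step per image keeps the embedding changes small.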

5 Conclusions and future work

Robust image steganography is an important technique for covert communication over lossy public channels. In this paper, a robust JPEG steganography algorithm based on DCT and SVD in the NSST domain is proposed. The experimental results show that the proposed algorithm achieves competitive robustness and detection resistance compared with state-of-the-art techniques. The robustness arises because the maximum singular value is highly stable and QIM can further eliminate some errors. Moreover, embedding changes in the low-frequency subband achieve stronger detection resistance because the texture and edge features of the stego image are better preserved. Furthermore, the quantization step is determined for each cover image according to the threshold of the extraction error rate, which yields stronger detection resistance than using a fixed quantization step for all the cover images.

In addition, it should be noted that the anti-detection capability of robust image steganography is relatively weak when detection is performed by a classifier trained on the original cover images and the corresponding stego images. This is because the embedding changes of robust steganography are larger than those of non-robust steganography such as J-UNIWARD and UED. In other words, robustness is achieved at the cost of large embedding changes. In the future, we will study the construction of robust embedding domains that can lead to stronger robustness and anti-detection capability. Furthermore, the robustness of other steganography methods, such as linguistic steganography [20, 21] and coverless image steganography [4, 14], also needs to be studied.