1 Introduction

Steganography is a technique for invisible communication. Its purpose is to embed secret messages into digital covers, such as digital images, for covert communication through public communication channels [1]. Conversely, steganalysis is a technique for detecting the presence of hidden messages in cover objects.

With the widespread use of JPEG images in recent years, a series of JPEG image steganographic algorithms have been proposed, e.g., YASS [2, 3], NPQ [4], DF-US [5], UED [6], UERD [7], and J-UNIWARD [8]. How to effectively detect these JPEG steganographic algorithms is therefore one of the most urgent practical problems. Current research on steganalysis can be divided into two classes: special steganalysis and universal steganalysis. Special steganalysis [9,10,11] is designed for a specific hiding technique, while universal steganalysis [12,13,14,15,16,17,18,19,20,21] is designed to handle a range of steganographic methods simultaneously. Given the diversity of current steganographic techniques, universal steganalysis is more adaptable in practical applications and has accordingly attracted extensive attention.

Universal steganalysis is based on machine learning, and its key issue is therefore finding distinguishing features that can separate cover images from stego images. This process has two important aspects. The first is the design of feature extraction: the selected features should react sensitively to the embedding changes but remain insensitive to the image content. The second is the design of an effective classifier with low computational complexity. This paper focuses on the first aspect, namely the design of feature extraction. Regarding feature extraction, it is argued in [13] that the best (most sensitive) features for steganalysis are obtained when they are calculated directly in the embedding domain. Thus, for JPEG images, early studies generally chose features from the quantized discrete cosine transform (DCT) domain for classification. For example, an effective Markov process (MP) based JPEG steganalysis scheme proposed in [14] utilized both the intrablock and interblock correlations among DCT coefficients; Fridrich et al. extended the 23-dimensional DCT feature vector [15] to a 274-dimensional feature vector [16] by merging Markov and DCT features, and this 274-dimensional vector was later doubled in size by Cartesian calibration [17]; Kodovský et al. extracted a 7850-dimensional feature vector [18] and used a rich model of DCT coefficients to form a 22510-dimensional feature vector [19]. More recently, in addition to extracting features directly from the DCT domain, steganalytic methods that extract features from other domains have also been studied. For example, Fridrich extracted a 34671-dimensional feature vector [20] from the spatial domain to attack JPEG steganographic algorithms. Features can also be extracted from the undecimated DCT domain: in [21], Holub et al. introduced a feature vector whose features were engineered as first-order statistics of quantized noise residuals obtained from the decompressed JPEG image using 64 kernels of the DCT coefficient matrix (the so-called undecimated DCT). In all of these universal steganalyzers, the features are selected from a single domain, such as the DCT domain, the spatial domain, or the undecimated DCT domain.

Based on these existing steganalytic algorithms, a new feature merging method is proposed in this paper. Although a series of new feature extraction methods have been introduced in the field of steganalysis in recent years, each typically improves the detection accuracy rate by only 1–2 percentage points or even less over previously proposed methods. In this paper, we first propose merging features extracted in different domains to form a more powerful steganalyzer; our experimental results demonstrate that the detection accuracy rate can be improved by 3 percentage points or more. However, since the feature dimension after merging is very high, which complicates feature extraction, training, and classification, a new feature selection method is also proposed based on properties introduced in [22]. Our experimental results demonstrate that this feature selection strategy not only reduces the dimensionality of the feature vector but also maintains a high detection accuracy rate.

This paper is organized as follows. In Sect. 2, we present how to merge features extracted from different domains, such as the DCT domain, the spatial domain and the undecimated DCT domain. The new feature selection method is also proposed in Sect. 2. Experiments and results are then given in Sect. 3. Finally, we summarize this paper in Sect. 4.

2 Feature Merging and Feature Selection

2.1 Characteristics of Difference Images in Different Domains

Due to the intrusive nature of steganography, some distortion is inevitably introduced into the cover image. Here, an image randomly selected from BOSSbase ver. 1.01 [23] is used to illustrate the influence of message embedding on the statistical distribution of a JPEG image. First, the image from BOSSbase is compressed with JPEG quality factor (QF) 75 and used as the cover, as shown in Fig. 1(a). The stego image is generated using the representative J-UNIWARD JPEG steganographic algorithm [8]. The embedding rate is 0.4 bpnc (bits per non-zero DCT coefficient), and the stego image is shown in Fig. 1(b).

Fig. 1.
figure 1

The cover image and the stego image corresponding to the J-UNIWARD algorithm. (a) The cover image. (b) The stego image with the embedding rate of 0.4 bpnc.

Figure 2(a)–(c) illustrate the difference images between the stego image and the cover image in the spatial domain, the DCT domain, and the undecimated DCT domain, respectively. White points indicate positions where the elements (pixels/coefficients) have been modified, whereas black points indicate positions where the elements remain untouched by the embedding process. It can be observed from Fig. 2 that even though the same steganographic algorithm is applied, the resulting difference images have different statistical distribution characteristics. Steganalytic features are extracted to discriminate the difference between cover and stego images, and in general, features extracted from different domains may complement and reinforce each other. Thus, the detection accuracy rate can be improved by merging features extracted in different domains, such as the DCT domain, the spatial domain, and the undecimated DCT domain.

Fig. 2.
figure 2

The difference images between the cover and stego image obtained in different domains with the J-UNIWARD algorithm. (a) The spatial domain. (b) The DCT domain. (c) The undecimated DCT domain.
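To make the notion of a difference image concrete, the following sketch (Python/NumPy; the arrays are hypothetical toy data, not the actual BOSSbase image) computes a binary map of modified positions in a single domain:

```python
import numpy as np

def difference_map(cover, stego):
    """Binary difference image: 1 (white) where an element
    (pixel/coefficient) differs between cover and stego,
    0 (black) where it is untouched."""
    return (cover != stego).astype(np.uint8)

# Toy 4x4 arrays standing in for pixels or DCT coefficients.
cover = np.zeros((4, 4), dtype=np.int64)
stego = cover.copy()
stego[1, 2] += 1  # one simulated embedding change
d = difference_map(cover, stego)
print(d.sum())  # 1 modified position
```

The same function applies unchanged whether the inputs are spatial pixels, quantized DCT coefficients, or undecimated DCT residuals; only the arrays passed in differ.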

2.2 Characteristics of Feature Vector

As introduced in our previous work [22], the difference between cover and stego images should consistently increase as the embedding rate increases. Some experimental results for the J-UNIWARD steganographic scheme are illustrated in Fig. 3. The cover image is shown in Fig. 1(a), and Fig. 3(a)–(d) show the modifications made by the J-UNIWARD algorithm at different embedding rates.

Fig. 3.
figure 3

Difference images between the cover and stego images at different embedding rates. (a) The difference image with the embedding rate of 0.1 bpnc. (b) The difference image with the embedding rate of 0.2 bpnc. (c) The difference image with the embedding rate of 0.3 bpnc. (d) The difference image with the embedding rate of 0.4 bpnc.

As seen in Fig. 3, even at different embedding rates, most of the modifications are made in the same edge areas or complex texture regions, and the difference between cover and stego images grows as the embedding rate increases. The basic principle of steganalytic features is to capture the difference between cover and stego images; by extracting appropriate features, these two types of images can be classified. In our view, if an extracted feature value changes in one direction (consistently decreases or increases) as the embedding rate increases, that feature should be selected for classification. Conversely, if the extracted feature varies randomly with the embedding rate, it may confuse the classifier and should be excluded from the original feature vector in the steganalytic process. The specific selection method for effective features is detailed in Sect. 2.3.

2.3 Feature Merging and Feature Selection

Based on the characteristics described in Sects. 2.1 and 2.2, it is clear that the modifications introduced by message embedding exhibit different characteristics in different domains, and thus steganalysis features in different domains may have different detection abilities. The detailed realization of the proposed feature merging and feature selection methods is given below.

2.3.1 Merging Features Extracted in Different Domains

Suppose that features are extracted from domains \( A_{t} \left( {t = 1,2, \ldots } \right) \). According to our previous analysis, modern steganalytic algorithms generally extract features from only one of these domains. Let \( F_{t,j} \left( {t = 1,2, \ldots } \right) \) denote the value of the jth dimensional feature extracted from an image in domain \( A_{t} \). The feature vector \( F_{t} \) extracted from the image in domain \( A_{t} \) is defined as

$$ F_{t} = \left\{ {F_{t,j} |1 \le j \le N_{t} } \right\}, $$
(1)

where the parameter \( N_{t} \) denotes the total number of features extracted from an image in domain \( A_{t} \).

The new feature vector \( F \), obtained by merging the features extracted in different domains, is represented as

$$ F = \left[ {F_{1} \,F_{2} \ldots } \right] . $$
(2)
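The merging in Eqs. (1) and (2) amounts to concatenating the per-domain feature vectors. A minimal Python/NumPy sketch (the zero vectors are hypothetical placeholders with the dimensions used later in Sect. 3.2):

```python
import numpy as np

def merge_features(feature_vectors):
    """Merge per-domain feature vectors F_1, F_2, ... into a
    single vector F by concatenation, as in Eq. (2)."""
    return np.concatenate(feature_vectors)

# Placeholder vectors with the dimensions of SRM, JRM, and DCTR.
F1 = np.zeros(34671)  # spatial-domain (SRM) features
F2 = np.zeros(22510)  # DCT-domain (JRM) features
F3 = np.zeros(8000)   # undecimated-DCT-domain (DCTR) features
F = merge_features([F1, F2, F3])
print(F.shape[0])  # 65181
```

In practice the entries of \( F_{1}, F_{2}, F_{3} \) would be the actual extracted feature values; only the concatenation step is shown here.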

2.3.2 New Feature Selection Method

Without loss of generality, the merged feature set C extracted from the cover image set is defined as

$$ C = \left\{ {C_{i,j} \left| {1 \le i \le M,1 \le j \le N} \right.} \right\}. $$
(3)

The merged feature set \( S^{\alpha } \) extracted from the stego image set is defined as

$$ S^{\alpha } = \left\{ {S_{i,j}^{\alpha } \left| {1 \le i \le M,1 \le j \le N} \right.} \right\}. $$
(4)

In Eqs. (3) and (4), M denotes the number of images in the image set, and \( N \) denotes the total number of features after merging the features extracted from an image in different domains. The parameter \( \alpha \) represents the embedding rate.

Then \( P_{j} \), the mean value of the jth dimensional feature over all images in the cover image set, is obtained as

$$ P_{j} = \left( {\sum\nolimits_{i = 1}^{i = M} {C_{i,j} } } \right)/M,\quad (1 \le j \le N). $$
(5)

A new variable is then defined as

$$ T_{j}^{\alpha } = \sum\nolimits_{i = 1}^{M} f \left( {S_{i,j}^{\alpha } - P_{j} } \right), $$
(6)

where

$$ f\left( x \right) = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {x \le 0} \hfill \\ {1,} \hfill & {x > 0} \hfill \\ \end{array} } \right.. $$
(7)

According to our previous analysis in Sect. 2.2, if the value of \( T_{j}^{\alpha } \) in Eq. (6) consistently decreases or increases as the embedding rate \( \alpha \) increases, the jth dimensional feature will be selected as an effective feature.
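Equations (5)–(7) can be computed directly: \( T_{j}^{\alpha } \) is simply the count of stego images whose jth feature exceeds the cover mean \( P_{j} \). A minimal sketch with toy data (the arrays below are illustrative, not real feature values):

```python
import numpy as np

def mean_cover_features(C):
    """P_j = (sum_i C[i, j]) / M  (Eq. (5)); C has shape (M, N)."""
    return C.mean(axis=0)

def t_statistic(S_alpha, P):
    """T_j^alpha = sum_i f(S[i, j] - P_j)  (Eqs. (6)-(7)).
    Since f(x) = 1 for x > 0 and 0 otherwise, this counts the
    stego images whose j-th feature exceeds the cover mean."""
    return (S_alpha > P).sum(axis=0)

# Toy example with M = 4 images and N = 2 features.
C = np.array([[0.0, 1.0],
              [2.0, 1.0],
              [4.0, 1.0],
              [2.0, 1.0]])
P = mean_cover_features(C)   # [2.0, 1.0]
S = np.array([[3.0, 0.5],
              [1.0, 2.0],
              [5.0, 0.5],
              [4.0, 0.5]])
T = t_statistic(S, P)        # [3, 1]
```

Note that the strict inequality in `S_alpha > P` matches Eq. (7), where \( f(x) = 0 \) for \( x \le 0 \).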

Some experimental results for the JRM steganalysis scheme are shown in Table 1. We randomly select 5000 images from BOSSbase ver. 1.01 and compress them as cover images with QF = 75. Then 5000 stego images are created using the representative steganographic algorithm J-UNIWARD at different embedding rates. The cover and stego feature sets are extracted from the cover and stego images using the JRM steganalytic algorithm. We calculate \( T_{j}^{\alpha } \) using Eqs. (5) and (6), where the parameters M and N are 5000 and 22510, respectively. The three values \( T_{j}^{\alpha } \left( {j = 6,11,19} \right) \) (i.e., for the 6th, 11th, and 19th dimensional features) extracted by JRM from the 5000 stego images at different embedding rates are shown in Table 1.

Table 1. Characteristic of the same feature under different embedding rates

It can be observed from Table 1 that \( T_{6}^{\alpha } \), corresponding to the 6th dimensional feature, consistently decreases (and \( T_{19}^{\alpha } \), corresponding to the 19th dimensional feature, consistently increases) as the embedding rate \( \alpha \) increases. In contrast, \( T_{11}^{\alpha } \), corresponding to the 11th dimensional feature, decreases or increases randomly as \( \alpha \) increases. According to our previous analysis, features of the first kind (e.g., the 6th and 19th dimensional features) are likely effective and should be selected in the steganalytic process, whereas features of the second kind (e.g., the 11th dimensional feature) may confuse the classifier and can be excluded from the original feature set.

In summary, if the value of \( T_{j}^{\alpha } \) consistently decreases or increases as the embedding rate \( \alpha \) increases, the jth dimensional feature may be effective and should be selected. Accordingly, in our proposed method, a feature is selected as effective in the two cases described below. A parameter \( \delta \) is introduced to control the number of selected features.

Case 1: \( T_{j}^{\alpha } \) values consistently decrease as the embedding rate \( \alpha \) increases. A feature from the original high-dimensional feature set must satisfy the following two conditions to be selected.

  1. (1)

    For any given image set to be tested, stego images are obtained at different embedding rates \( \alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{n} \), where \( 0 < \alpha_{1} < \alpha_{2} < \cdots < \alpha_{n} \left( {n = 1,2,3, \ldots } \right) \). If the jth dimensional feature is to be considered effective and selected for classification, the \( T_{j}^{\alpha } \) values should consistently decrease as the embedding rate \( \alpha \) increases; that is, inequality (8) must be satisfied:

$$ T_{j}^{{\alpha_{1} }} > T_{j}^{{\alpha_{2} }} > \cdots > T_{j}^{{\alpha_{n} }} $$
(8)
  1. (2)

    For any given embedding rate, if the jth dimensional feature \( \left( {1 \le j \le N} \right) \) is effective, the following inequalities must be satisfied to control the number of selected features:

$$ \begin{array}{*{20}c} {0 \le T_{j}^{{\alpha_{1} }} \le M \times \delta } \\ {0 \le T_{j}^{{\alpha_{2} }} \le M \times \delta } \\ \vdots \\ {0 \le T_{j}^{{\alpha_{n} }} \le M \times \delta } \\ \end{array} $$

The parameter \( \delta \) \( (0 < \delta < 1) \) is used to control the number of selected valid classification features. In this paper, we select \( \delta \) in the range 0.45–0.50. Generally, the number of effective features increases with \( \delta \).

Case 2: \( T_{j}^{\alpha } \) values consistently increase as the embedding rate \( \alpha \) increases. A feature from the original high-dimensional feature set must satisfy the following two conditions to be selected.

  1. (1)

    For any given image set to be tested, stego images are obtained at different embedding rates \( \alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{n} \), where \( 0 < \alpha_{1} < \alpha_{2} < \cdots < \alpha_{n} \left( {n = 1,2,3, \ldots } \right) \). If the jth dimensional feature is to be considered effective and selected for classification, the \( T_{j}^{\alpha } \) values should consistently increase as the embedding rate \( \alpha \) increases; that is, inequality (9) must be satisfied:

$$ T_{j}^{{\alpha_{1} }} < T_{j}^{{\alpha_{2} }} < \cdots < T_{j}^{{\alpha_{n} }} $$
(9)
  1. (2)

    For any given embedding rate, if the jth dimensional feature \( \left( {1 \le j \le N} \right) \) is effective, the following inequalities must be satisfied to control the number of selected features:

$$ \begin{array}{*{20}c} {M \times \left( {1 - \delta } \right) \le T_{j}^{{\alpha_{1} }} \le M } \\ {M \times \left( {1 - \delta } \right) \le T_{j}^{{\alpha_{2} }} \le M } \\ \vdots \\ {M \times \left( {1 - \delta } \right) \le T_{j}^{{\alpha_{n} }} \le M } \\ \end{array} $$

Similarly, we can select \( \delta = \) 0.45–0.50.
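The two cases above can be combined into a single selection routine. The sketch below (Python/NumPy; the toy matrix is illustrative, not real data) assumes the \( T_{j}^{\alpha } \) values have already been computed for each embedding rate and stacked row-wise:

```python
import numpy as np

def select_effective_features(T, M, delta=0.45):
    """Select effective feature indices from T of shape (n, N),
    where T[k, j] = T_j^{alpha_k} for 0 < alpha_1 < ... < alpha_n.

    Case 1: T_j strictly decreases over alpha and every value
            lies in [0, M * delta]        (inequality (8)).
    Case 2: T_j strictly increases over alpha and every value
            lies in [M * (1 - delta), M]  (inequality (9))."""
    diffs = np.diff(T, axis=0)  # differences between consecutive rates
    case1 = np.all(diffs < 0, axis=0) & np.all(T <= M * delta, axis=0)
    case2 = np.all(diffs > 0, axis=0) & np.all(T >= M * (1 - delta), axis=0)
    return np.flatnonzero(case1 | case2)

# Toy example: M = 10 images, n = 3 embedding rates, N = 3 features.
# Feature 0 decreases within [0, 4.5]  -> Case 1, selected.
# Feature 1 varies randomly            -> excluded.
# Feature 2 increases within [5.5, 10] -> Case 2, selected.
T = np.array([[4, 5, 6],
              [3, 2, 8],
              [1, 6, 10]])
idx = select_effective_features(T, M=10, delta=0.45)
print(idx)  # [0 2]
```

The selected indices would then be used to restrict both the training and testing feature sets before running the ensemble classifier.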

3 Experimental Results

3.1 Experiment Setup

In this paper, we use the BOSSbase ver. 1.01 [23] image data set for all experiments. It consists of 10000 gray-scale images of size 512 × 512, which are compressed as cover images with QF = 75. The stego images are generated using the representative JPEG steganographic algorithm J-UNIWARD at four embedding rates: 0.1 bpnc, 0.2 bpnc, 0.3 bpnc, and 0.4 bpnc. The ensemble classifier [18] is used for classification. We randomly select 5000 images for training, and the remaining 5000 images are used for testing.

3.2 Experiment #1

In this experiment, the SRM algorithm [20] is applied to extract features from the JPEG images in domain \( A_{1} \) (the spatial domain); the dimension of the SRM feature vector is \( N_{1} = 34671 \). The JRM algorithm [19] is applied to extract features in domain \( A_{2} \) (the DCT domain); the dimension of the JRM feature vector is \( N_{2} = 22510 \). The DCTR algorithm [21] is applied to extract features in domain \( A_{3} \) (the undecimated DCT domain); the dimension of the DCTR feature vector is \( N_{3} = 8000 \). A new feature vector is obtained by merging the features extracted in two or three different domains, and the ensemble classifier [18] is used to classify JPEG cover and stego images. The efficiency of the proposed feature merging method is shown in Table 3; for comparison, the efficiency of features extracted in a single domain is shown in Table 2. Three steganalysis schemes (SRM, JRM, and DCTR) and four embedding rates (0.1, 0.2, 0.3, and 0.4 bpnc) are tested.

Table 2. Feature dimensions and testing errors for four different embedding rates in different single domains

From Tables 2 and 3, it is clear that the detection accuracy rate can be improved by merging features extracted from different domains. For example, when the embedding rate is 0.4 bpnc, the testing error of SRM is 0.1988 with a feature dimension of 34671, that of JRM is 0.2585 with a feature dimension of 22510, and that of DCTR is 0.1504 with a feature dimension of 8000. When the SRM and JRM features are combined, the testing error becomes 0.1667 with a feature dimension of 57181, indicating that the new feature merging method improves the detection accuracy rate by 3 percentage points or more compared with the SRM and JRM steganalytic algorithms alone. Furthermore, when the SRM, JRM, and DCTR features are combined simultaneously, the testing error decreases to 0.1352 with a feature dimension of 65181; that is, the detection accuracy rate can be improved by a further 2–3 percentage points or more.

Table 3. Feature dimensions and testing errors of merged features

3.3 Experiment #2

Building on Experiment #1 in Sect. 3.2, this experiment demonstrates the efficiency of the proposed method for dimensionality reduction; the results are shown in Table 4. Four embedding rates (0.1, 0.2, 0.3, and 0.4 bpnc) are tested. In the training process, the effective features are selected according to the control parameter \( \delta \) (selected as 0.45, 0.46, 0.47, 0.48, or 0.49 in our testing), and a series of classifiers is obtained. These classifiers are then used for testing.

Table 4. Feature dimensions and testing errors of effective features

As shown in Table 4, the proposed feature selection method not only reduces the dimensionality of the merged feature vector but also maintains a high detection accuracy rate. For example, when \( \delta = 0.49 \) and the embedding rate is 0.4 bpnc, the dimension of the merged feature set "SRM + JRM" can be reduced from 57181 to 13482. Although the testing error increases from 0.1667 to 0.1717, the detection accuracy rate is still better than that of SRM (testing error 0.1988) or JRM (testing error 0.2585) used separately. Likewise, when \( \delta = 0.49 \) and the embedding rate is 0.4 bpnc, the dimension of the merged feature set "SRM + JRM + DCTR" can be reduced from 65181 to 16518. Although the testing error increases from 0.1320 to 0.1340, the detection accuracy rate is still better than that of SRM (0.1988), JRM (0.2585), or DCTR (0.1504) used separately.

4 Conclusions

In this paper, we propose a new universal JPEG steganalyzer. The contributions of this paper are as follows.

  1. (1)

    A new feature merging method is proposed in this paper. By merging features extracted from different domains, the detection accuracy rate of existing JPEG steganalytic algorithms can be improved by 3 percentage points or more.

  2. (2)

    Considering that the merged feature dimension is very high, a new feature selection method is also proposed in this paper. Experimental results demonstrate that it not only reduces the dimensionality but also maintains a high detection accuracy rate.