1 Introduction

Steganography is a technique for covert communication that embeds secret messages into ordinary digital media without drawing suspicion, while steganalysis, its counterpart, mainly aims to judge whether an unknown digital medium carries secret messages. Over recent decades, researchers have proposed many effective steganalysis algorithms, and the framework based on a feature vector and a classifier has become the mainstream of steganalysis. The prevailing steganalysis algorithms analyze the effects of steganography on image statistics to construct steganalysis features, such as SPAM [27] and the SRM series [9,10,11, 32]. The features of labeled images are used to train classifiers, such as the support vector machine (SVM) [7] and the ensemble classifier [19], which are then applied to steganalysis in practice.

However, a common issue with most steganalysis tools is that the investigated images are assumed to come from a single image type: the covers are of the same type, and the stegos are generated directly by embedding secret messages into these covers. Moreover, the performance of these tools is usually evaluated on image databases containing only a few image types. In the real world, however, there are thousands of image types due to the rapid development of image processing technology, and it is very easy to process digital images for a certain application [12]. Images on web pages, received by communication tools, or shared on social networks may have undergone various manipulations. Since all of these images can serve as covers for steganography, handling heterogeneous images is crucial for applying steganalysis tools in practice. In fact, the statistical distributions of such covers may differ considerably from those of natural images. Although most existing steganalysis methods achieve excellent results under specific experimental settings, they may suffer from cover source mismatch when the training set does not contain images of the same type as the testing images. Moreover, it is fallacious to train the classifier on a large heterogeneous data set of mixed sources [17], so the detection results are not reliable in the real world, where an image may have undergone various manipulations before information embedding.

There are two ways in which image manipulations affect the detection performance of steganalysis tools in the real world. On the one hand, image manipulations may change the statistical distributions of cover images, making it hard to distinguish a stego from a normally processed image. Steganalysis tools may then judge processed covers as stegos, resulting in a high false alarm rate. Since stegos are relatively rare in the real world, a high false alarm rate could make the steganalysis system collapse under a large number of misjudged covers. Therefore, real-world steganalysis must have a very low false alarm rate. On the other hand, steganography applied to processed images may make the stego statistics similar to cover statistics, resulting in a high missed detection rate. Besides, to avoid cover source mismatch, the training set should cover the types of the testing images. However, enlarging the training set and increasing the number of training image types may reduce detection accuracy, and given the huge number of image types, the training set can hardly contain all of them. Hence, most steganalysis tools are hardly directly applicable in the real world.

To improve the reliability of steganalysis in the real world, several methods have been proposed to deal with heterogeneous images. In [13], He et al. selected the characteristic function moments of the image and its wavelet subbands as features to classify natural, stego, and sharpened images; the method aims to reduce false alarms by differentiating stego images from processed images. Considering that a cover may have undergone manipulations before information embedding, another line of work [1, 15, 21] performs steganalysis on images of different types separately. In [1], Barni et al. used forensics tools to aid the steganalysis of heterogeneous images: they first differentiated camera images from computer-generated images, and then used a steganalyzer explicitly trained on images of the correct class. Li et al. [21] applied image pre-classification to cluster images into different classes and trained a separate steganalyzer for each cluster. However, the image type in each cluster is unknown, whereas with knowledge of the image type, a higher steganalysis accuracy might be achieved. In [15], different tools were selected for steganalysis based on whether the investigated bitmap images had undergone JPEG compression; furthermore, with knowledge of the quality factors, much more reliable detection accuracy could be achieved.

Fragile detection of image manipulation is a kind of image forensics technology that can identify the last applied image operation, but fails to detect the targeted operation if it is followed by another operation, such as steganography. In this paper, we propose a steganalysis framework that combines image forensics and steganalysis tools to attenuate the problem of normally processed images being judged as stegos. First, the normally processed images are separated from the investigated images by fragile detection of image manipulations. Then steganalysis is conducted on the remaining unlabeled images. The identified normally processed images are judged as covers, which reduces the false alarm rate of steganalysis. Unlike existing methods based on image multi-classification, any effective fragile detector of image manipulations can be plugged into the proposed framework, ensuring its extensibility. As gamma transformation is one of the most frequently used operations for changing image contrast, it is considered a practical assistant to steganography [31]. The experiments use gamma transformation as the image manipulation and two steganographic schemes, LSB matching and S-UNIWARD [14], which validates the effectiveness of the proposed framework for improving the reliability of steganalysis in the real world. The experimental results show that the false alarm rates of the proposed framework are lower than those of steganalysis without image forensics, where the classifiers are trained on original covers and on heterogeneous images respectively.

2 Steganalysis error probability

Generally, the performance of a steganalyzer is assessed by the average detection error over covers and stegos. Let C and S denote the covers and stegos respectively, \(N_{C}\) and \(N_{S}\) their numbers, and N the total number of images, i.e., \(N = N_{C} + N_{S}\). Then the steganalysis error probability is

$$ {P_{E}} = \frac{{{N_{C}}}}{N} {P_{FA}} + \frac{{{N_{S}}}}{N} {P_{MD}} $$
(1)

where \(P_{FA}\) and \(P_{MD}\) are the false alarm rate and the missed detection rate respectively, defined as

$$ \left\{ \begin{array}{l} {P_{FA}} = \frac{{{N_{C}^{S}}}}{{{N_{C}}}}\\ {P_{MD}} = \frac{{{N_{S}^{C}}}}{{{N_{S}}}} \end{array} \right. $$
(2)

where \({N_{C}^{S}}\) and \({N_{S}^{C}}\) are the numbers of the misjudged covers and stegos respectively.
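
To make the error decomposition concrete, the following minimal Python sketch evaluates (1) and (2) from raw counts; the counts in the example are hypothetical.

    def steganalysis_error(n_c, n_s, n_c_misjudged, n_s_misjudged):
        """Return (P_FA, P_MD, P_E) for N_C covers and N_S stegos, per (1)-(2)."""
        p_fa = n_c_misjudged / n_c            # covers judged as stegos
        p_md = n_s_misjudged / n_s            # stegos judged as covers
        n = n_c + n_s
        p_e = (n_c / n) * p_fa + (n_s / n) * p_md
        return p_fa, p_md, p_e

    # Hypothetical example: 5,000 covers (200 misjudged), 5,000 stegos (400 misjudged)
    print(steganalysis_error(5000, 5000, 200, 400))   # (0.04, 0.08, 0.06)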

At present, most prevailing steganalysis schemes use machine learning methods in which steganalytic features are fed to a classifier. Figure 1 shows the steganalysis framework in which the classifier is trained on the features of covers and stegos and then used to detect unknown images. For clarity and comparison, this framework is referred to as the traditional steganalysis framework in this paper. The construction of steganalysis features is motivated by capturing the changes that steganography introduces to the image, and the steganalysis error comes from two parts: false alarms, where natural images are judged as stegos, and missed detections, where stegos are judged as covers.

Fig. 1
figure 1

Steganalysis framework in the traditional mode

When the testing image set consists of heterogeneous images, the false alarm rate can be represented as a sum of the detection errors over all image classes. Assume there are m kinds of images that have undergone various operations. Let \(C_{0}\) represent the original images and \(C_{i}\) the images processed by the i-th operation, where 1 ≤ i ≤ m. Then the false alarm rate can be rewritten as

$$ \begin{array}{@{}rcl@{}} {P_{FA}}& = & \sum\limits_{i=0}^{m}{\frac{{N_{{C_{i}}}^{S}}}{{{N_{C}}}} }\\ &= & \sum\limits_{i=0}^{m}{\frac{N_{C_{i}}}{N_{C}} \cdot \frac{{N_{{C_{i}}}^{S}}}{N_{C_{i}}}} \end{array} $$
(3)

For brevity and readability, we assume the ratio of the number of covers to the number of stegos is fixed, so that we can focus on the analysis of the false alarm rate. Assume that m = 1, namely, the covers consist of two kinds of images: the original images and the images processed by one operation. In this case, the false alarm rate is

$$ {\tilde P_{FA}} = \frac{N_{C_{0}}}{N_{C}} \cdot \frac{N_{C_{0}}^{S}}{N_{C_{0}}} + \frac{N_{C_{1}}}{N_{C}} \cdot \frac{N_{C_{1}}^{S}}{N_{C_{1}}} $$
(4)

Note that if the covers are all original images, then the false alarm rate is

$$ {\bar P_{FA}} = \frac{N_{C_{0}}^{S}}{N_{C_{0}}} $$
(5)

Then

$$ {\tilde P_{FA}} = {\bar P_{FA}} + \frac{N_{C_{1}}}{N_{C}}\left( \frac{N_{C_{1}}^{S}}{N_{C_{1}}} - \frac{N_{{C_{0}}}^{S}}{N_{C_{0}}} \right) $$
(6)

When \(\frac{N_{C_{1}}^{S}}{N_{C_{1}}} > \frac{N_{C_{0}}^{S}}{N_{C_{0}}}\), we have \({\tilde P_{FA}} > {\bar P_{FA}}\); that is, if the detection error probability of \(C_{1}\) is larger than that of the natural images, then the average false alarm rate of steganalysis on heterogeneous images increases. Moreover, the larger the proportion of \(C_{1}\) images among the heterogeneous images, the higher the false alarm rate. A similar conclusion holds for m > 1. Specifically, when \(\frac{N_{C_{i}}^{S}}{N_{C_{i}}} = 1\) for 1 ≤ i ≤ m, the false alarm rate reaches its maximum

$$ \begin{array}{@{}rcl@{}} P_{FA}^{\max } &= & \sum\limits_{i=0}^{m}{\frac{N_{C_{i}}}{N_{C}} \cdot \frac{{N_{{C_{i}}}^{S}}}{N_{C_{i}}}} = \frac{N_{C_{0}}}{N_{C}} \cdot \frac{N_{C_{0}}^{S}}{N_{C_{0}}} + \sum\limits_{i=1}^{m}{\frac{N_{C_{i}}}{N_{C}} \cdot \frac{{N_{{C_{i}}}^{S}}}{N_{C_{i}}}} \\ &= & \frac{N_{{C_{0}}}}{N_{C}} \cdot \frac{N_{C_{0}}^{S}}{N_{C_{0}}} + \sum\limits_{i=1}^{m}{\frac{N_{{C_{i}}}}{N_{C}}} = \frac{N_{{C_{0}}}}{N_{C}} \cdot \frac{N_{C_{0}}^{S}}{N_{C_{0}}} + 1 - \frac{N_{{C_{0}}}}{N_{C}}\\ &= & 1 - \frac{N_{C_{0}}}{N_{C}}\left( {1 - \frac{N_{C_{0}}^{S}}{N_{C_{0}}}} \right) \end{array} $$
(7)
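
A small numeric illustration of (3) and (6), with hypothetical per-class false alarm rates, shows how mixing in a processed class \(C_{1}\) with a higher per-class rate drives up the overall false alarm rate:

    def mixed_false_alarm(class_counts, class_fa_rates):
        """Overall P_FA as the count-weighted sum of per-class rates, per (3)."""
        n_c = sum(class_counts)
        return sum(n_i / n_c * fa_i
                   for n_i, fa_i in zip(class_counts, class_fa_rates))

    fa_c0, fa_c1 = 0.03, 0.60              # assumed per-class false alarm rates
    for share_c1 in (0.0, 0.25, 0.5):      # fraction of processed images C_1
        n1 = int(10000 * share_c1)
        print(share_c1, mixed_false_alarm([10000 - n1, n1], [fa_c0, fa_c1]))
    # 0.0 -> 0.03, 0.25 -> 0.1725, 0.5 -> 0.315; the last matches (6):
    # 0.03 + 0.5 * (0.60 - 0.03) = 0.315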

Steganography can be regarded as a special image operation that adds slight noise to an image while hardly altering its visual appearance. Normal image operations, such as contrast enhancement, usually change the visual appearance for a certain application and may introduce considerable noise. When normally processed images are fed to a steganalysis classifier trained to discriminate original images from stegos, they may be judged as stegos with high probability. As there are a great number of processed images in the real world, the false alarm rates of traditional steganalysis schemes would be very high in practical applications.

3 Proposed steganalysis framework

An intuitive way to remove the effects of image operations on real-world steganalysis performance is to separate the processed images from the investigated images before steganalysis. In this paper, we do so by taking advantage of image forensics.

Image manipulation identification is an image forensics technology that judges whether an image has undergone a specific operation. Research on image manipulation identification has produced many achievements, including the detection of median filtering [18, 26, 42], contrast changing [5, 29], blurring [4, 22, 44], rescaling [3], and JPEG compression [2, 16, 38]. Most existing detection tools are capable of detecting the last operation applied to the investigated image even if the image has undergone various operations. However, if the targeted operation is followed by another operation, some forensics tools may fail. As the counterpart of robust detection, such detection of the targeted operation is called fragile detection with respect to the post-operation that makes the forensics tool fail.

While the fragility of an image manipulation detector may be taken as a weakness in image forensics [33, 34], it can be exploited to assist steganalysis in the real world. Consider the case where the investigated images include original images, normally processed images, and stegos. Note that a stego may have undergone image operations before information embedding. If an image is a stego, the fragile detection of image manipulations will not judge it as a normally processed image, due to its fragility to steganography. Moreover, if an image is a normally processed image, it does not carry any secret message. Therefore, the false alarm rate of steganalysis can be reduced by applying fragile detection of image manipulations before the steganalyzer. Figure 2 shows the proposed steganalysis framework based on fragile detection of image manipulations. First, the fragile forensics tools are applied to separate the normally processed images from the investigated images; then the remaining images are steganalyzed.

Fig. 2
figure 2

Steganalysis framework based on the combination of the image forensics and the steganalysis

It is worth noting that the proposed framework differs from the one proposed in [13], which uses characteristic function moments for multi-classification. In this paper, the specific images are separated by the corresponding fragile forensics tools before steganalysis, and any fragile forensics tool for image manipulation detection can be applied within the proposed framework. Thus, the framework is extensible.

4 Steganalysis aided by gamma transformation detection

In this section, we consider the case where gamma transformation is involved. First, the application of gamma transformation in steganography [31] is reviewed. Then we construct a new feature for gamma transformation detection. Finally, a steganalysis scheme aided by gamma transformation detection is presented.

4.1 Application of gamma transformation in steganography

Image gamma transformation is an operation for changing image contrast, whose basic form is \(s = r^{\gamma}\), where r ∈ [0,1] is the input and γ > 0 is the only parameter, controlling the direction and intensity of the transformation. Generally, due to limited storage space, the pixel values of a digital image need to be quantized. In this paper, we consider 8-bit images, which are widely used as covers in steganography. Thus, the form of the image gamma transformation is

$$ y = {\text{round}}\left( {{{255}} \times {{\left( {\frac{x}{{255}}} \right)}^{\gamma} }} \right) $$
(8)

where x,y ∈ {n|n ∈ [0,255] ∩Z} represent image pixel values before and after gamma transformation respectively, and round(⋅) is the rounding operation.
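
A minimal numpy sketch of the 8-bit gamma transformation in (8) is given below; it is a direct transcription of the formula, not an optimized implementation.

    import numpy as np

    def gamma_transform(img, gamma):
        """Apply y = round(255 * (x / 255)^gamma) to an 8-bit image array, per (8)."""
        y = np.round(255.0 * (img.astype(np.float64) / 255.0) ** gamma)
        return y.astype(np.uint8)

    # Example: gamma = 0.67 brightens mid-gray, mapping 128 to 161
    img = np.full((4, 4), 128, dtype=np.uint8)
    print(gamma_transform(img, 0.67)[0, 0])   # 161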

As it is widely used in image processing, gamma transformation is considered a practical assistant to steganography. In [31] (in Chinese with an English abstract), Sun et al. analyzed the deviation of an image statistical feature and pointed out that gamma transformed images tend to be judged as stegos in steganalysis. Based on this conclusion, they proposed a steganography scheme that embeds information into gamma transformed images. First, the image is gamma transformed with parameter γ = 1 + Δ, where Δ is the disturbance factor of the gamma transformation parameter. Then the secret message is embedded into the gamma transformed image. In this way, normal gamma transformed images will be judged as stegos, resulting in a high false alarm rate, as sketched below.
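
The snippet below illustrates this embedding strategy, reusing gamma_transform() and img from the sketch above together with a simple LSB matching simulation (each pixel is changed by ±1 with probability 0.5ρ); boundary pixels are simply clipped rather than handled as in production embedders, so this is illustrative only.

    import numpy as np

    def lsb_matching(cover, rho, rng=None):
        """Simulate LSB matching at payload rho: change each pixel by +/-1 w.p. 0.5*rho."""
        rng = rng or np.random.default_rng(0)
        change = rng.random(cover.shape) < 0.5 * rho
        sign = rng.choice(np.array([-1, 1], dtype=np.int16), size=cover.shape)
        stego = cover.astype(np.int16) + change * sign
        return np.clip(stego, 0, 255).astype(np.uint8)   # simplified boundary handling

    # The scheme of [31]: gamma transform with gamma = 1 + Delta, then embed
    stego = lsb_matching(gamma_transform(img, 1.0 + 0.1), rho=0.4)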

4.2 Fragile detection of gamma transformation

If the gamma transformed images without embedded information could be separated from the investigated images, the steganalysis performance would improve. This can be achieved by fragile detection of gamma transformation, which is effective for detecting the last applied gamma transformation but fails if the gamma transformed image subsequently undergoes steganography.

For gamma transformation detection, Stamm and Liu [30] exploited the high-frequency coefficients of the image histogram characteristic function; they pointed out that gamma transformation enlarges the high-frequency coefficients and constructed a feature by averaging them. Cao et al. [6] proposed a detection method based on the number of image histogram gaps, observing that after gamma transformation many gaps emerge, i.e., bins whose values are zero while their adjacent bins are nonzero. In our prior works [35, 36], we analyzed the effects of gamma transformation on the image histogram and pointed out that gamma transformation introduces zero-value histogram bins whose locations are closely related to the transformation parameter. Based on this conclusion, we proposed a manipulation detector and a parameter estimator for image gamma transformation.

The methods above all, to varying degrees, exploit the zero-value histogram bins introduced by gamma transformation. Since steganography may fill these zero-value bins, the methods are fragile to steganography. Based on this property, the zero-value histogram bins can be exploited to separate the gamma transformed images without embedded information from stegos, regardless of whether the stegos have undergone gamma transformation.

More specifically, we assume that the steganography applies ±K operations to pixels to embed the information, as in EA [23], HUGO [28], S-UNIWARD [14], and HILL [20]. Then the relationship between the image histograms before and after steganography can be represented by

$$ {h_{s}}\left( n \right) = \sum\limits_{k}{\alpha_{{n-k},k}}{h_{c}}\left( {k} \right) $$
(9)

where \(h_{s}(n)\) and \(h_{c}(n)\) are the histogram bins of the stego and the cover at n respectively, with \(\alpha_{i,j} \geq 0\) and \(\sum\limits_{i}{\alpha_{i,j}}=1\). For LSB matching with payload ρ, \(\alpha_{-1,j} = \alpha_{1,j} = 0.25\rho\) and \(\alpha_{0,j} = 1 - 0.5\rho\). Therefore, a zero-value histogram bin will be filled by the shares of its adjacent nonzero bins. Figure 3 shows the histograms of the typical 512 × 512 image Lena under gamma transformation and LSB matching, with gamma transformation parameter γ = 0.67 and LSB matching payload ρ = 0.4. Figure 3a and b show the histograms of the original image and the gamma transformed image, and Fig. 3c and d show the histograms of the stegos generated by embedding information into the original image and the gamma transformed image respectively. Only the histogram of the gamma transformed image in Fig. 3b contains many zero-value bins.
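
The following sketch applies (9) with the LSB matching coefficients above to a toy histogram, showing how an isolated zero-value bin is filled by its neighbors (boundary effects are ignored for brevity):

    import numpy as np

    def lsbm_histogram(h_c, rho):
        """Expected stego histogram under (9) for LSB matching with payload rho."""
        h_s = (1.0 - 0.5 * rho) * h_c.astype(np.float64)
        h_s[1:] += 0.25 * rho * h_c[:-1]    # mass shifted to the +1 neighbor
        h_s[:-1] += 0.25 * rho * h_c[1:]    # mass shifted to the -1 neighbor
        return h_s

    h_c = np.array([0, 400, 0, 380, 0], dtype=np.float64)  # toy histogram, gap at bin 2
    print(lsbm_histogram(h_c, 0.4))   # bin 2 becomes 0.1*400 + 0.1*380 = 78.0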

Fig. 3
figure 3

Histograms of image Lena. a original; b gamma transformed with γ = 0.67; c LSB matching on (a) with ρ = 0.4; d LSB matching on (b) with ρ = 0.4

However, natural images and stegos may also have zero-value histogram bins or unsmooth histogram envelopes. Existing gamma transformation detectors will judge such images as gamma transformed, which leads to missed detections in steganalysis. To address this issue, we construct a new feature based on the zero-value histogram bins and the values of their adjacent bins. In the histograms of natural images and stegos, the bins adjacent to zero-value bins are usually small, as shown in Fig. 3a, c and d, while in the histogram of a gamma transformed image without embedded information they are relatively large, as in Fig. 3b. Besides, gamma transformed images without embedded information usually have more zero-value histogram bins than natural images and stegos. Based on these observations, we multiply the two histogram bins adjacent to each zero-value bin and take the sum of all such products as the feature for detecting gamma transformation, namely,

$$ F = \sum\limits_{x \in {\Phi} } {h\left( {x - 1} \right) \cdot h\left( {x + 1} \right)} $$
(10)

where Φ = {x | h(x) = 0, 0 < x < 255} is the set of zero-value histogram bin locations. The feature F thus accounts for both the number of zero-value histogram bins and the values of the bins adjacent to them. In this way, F is small for natural images and stegos but large for gamma transformed images. Therefore, using F to detect gamma transformation avoids judging stegos as gamma transformed images, which keeps the missed detection rate of steganalysis low when the forensics tool is used to reduce the false alarm rate.
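
A minimal implementation of F is sketched below. The paper does not specify the histogram normalization; here the histogram is normalized to frequencies, an assumption under which the threshold scale reported in Section 5.1 is plausible.

    import numpy as np

    def gamma_feature(img):
        """F = sum over {x : h(x) = 0, 0 < x < 255} of h(x-1) * h(x+1), per (10)."""
        h = np.bincount(img.ravel(), minlength=256).astype(np.float64)
        h /= h.sum()                                 # assumed normalization
        zeros = np.where(h[1:255] == 0)[0] + 1       # interior zero-value bin locations
        return float(np.sum(h[zeros - 1] * h[zeros + 1]))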

4.3 Steganalysis aided by gamma transformation detection

Figure 4 shows the flow chart of steganalysis aided by gamma transformation detection. Given an unknown image, the feature F is first extracted to detect gamma transformation. Based on the result, it is decided whether to further apply the steganalysis tool. In this paper, we use the SRM steganalysis features [11] and the ensemble classifier [19]. The detailed steps are as follows.

  • #1   Calculate the image histogram h(x).

  • #2   Find the zero-value histogram bin locations Φ.

  • #3   Calculate the image feature F according to (10).

  • #4   Detect gamma transformation according to the predefined threshold η using the following rule

    $$ \delta = \left\{ \begin{aligned} &\text{image is gamma transformed}, &F>\eta\\ &\text{image is not gamma transformed}, &F\leq \eta \end{aligned} \right. $$
    (11)

    If gamma transformation is detected, judge the image as a cover; otherwise, take the next step.

  • #5   Extract the steganalysis feature SRM.

  • #6   Feed the SRM to the trained ensemble classifier for steganalysis.
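
The whole pipeline can be summarized by the sketch below, reusing gamma_feature() from Section 4.2; here srm_extract and ensemble_classifier are hypothetical stand-ins for the SRM feature extractor [11] and the trained ensemble classifier [19], which are not reimplemented.

    ETA = 1e-6   # threshold eta from Section 5.1 (scale depends on the normalization)

    def detect(img, srm_extract, ensemble_classifier, eta=ETA):
        """Classify an 8-bit grayscale image as 'cover' or 'stego' per Fig. 4 and (11)."""
        if gamma_feature(img) > eta:          # (11): judged gamma transformed
            return "cover"                    # processed image, no embedding (step #4)
        features = srm_extract(img)           # step #5
        return "stego" if ensemble_classifier(features) else "cover"   # step #6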

Fig. 4
figure 4

Flow chart of steganalysis aided by gamma transformation detection

5 Experimental results

We use the 10,000 original images of BossBase-1.01 to validate the effectiveness of the proposed framework. The original images have a fixed size of 512 × 512 and come from rescaled and cropped natural images of various sizes. For the experiments, they are first gamma transformed with parameter γ = 1 + Δ, where Δ ∈ {± 0.1, ± 0.2}; meanwhile, each original image is regarded as gamma transformed with γ = 1, namely, Δ = 0. Then information is embedded into each image by LSB matching and S-UNIWARD with payload ρ ∈ {0.1, 0.2, 0.3, 0.4}. Hence, there are 450,000 images in total in the constructed dataset. Finally, the proposed framework is applied for steganalysis.

First, the performance of the feature F for gamma transformation detection is tested. Then the detected gamma transformed images are separated from the image dataset, and the remaining images are fed to the trained classifier for steganalysis. The images in each class are divided into two parts: one for classifier training and threshold setting, and the other for performance testing, covering both gamma transformation detection and steganalysis. It is worth noting that the missed detections of steganalysis include the false positives of gamma transformation detection, i.e., stegos judged as gamma transformed. Meanwhile, the false alarm rate of steganalysis is the ratio of misjudged covers to all covers, not merely to the covers fed to the ensemble classifier.
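
The bookkeeping of this convention can be written as a small helper; the argument names are ours, introduced only for illustration:

    def combined_rates(n_covers, n_stegos, covers_judged_stego,
                       stegos_judged_gamma, stegos_missed_by_classifier):
        """Final rates: P_FA over all covers; P_MD includes stegos lost at forensics."""
        p_fa = covers_judged_stego / n_covers
        p_md = (stegos_judged_gamma + stegos_missed_by_classifier) / n_stegos
        return p_fa, p_md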

5.1 Gamma transformation detection

We randomly selected 5,000 images to test the performance of gamma transformation detection. For comparison, the methods proposed in [30] and [6] are used, referred to as STM and CGM in the rest of this paper. Figure 5 presents the ROC curves of the gamma transformation detection results, where a gamma transformed image is considered positive. The results show that the proposed method outperforms the other two. Moreover, the proposed method achieves high detection accuracy at a very low false positive rate: at a false positive rate of 0.02, the detection accuracies are 99.86%, 99.77%, 94.7%, and 94.97% for Δ = − 0.2, − 0.1, 0.1, and 0.2 respectively. This is particularly important for the subsequent steganalysis, because a low false positive rate of gamma transformation detection means that few stegos are removed at the gamma transformation detection stage.

Fig. 5
figure 5

ROC curves of gamma transformation detection results

In practice, gamma transformation detection based on the value of the feature F needs a predefined threshold η. Figure 6 shows the distributions of F for the original images, the gamma transformed images without embedded information, and the stegos generated by LSB matching, where payload ρ = 0 denotes images that have not undergone steganography. There are 5,000 images in each class, and the region with values larger than \(6 \times 10^{-7}\) is compressed for a clearer comparison between the gamma transformed images without embedded information and the other kinds of images. As shown in Fig. 6, the F values of original images and stegos are all smaller than \(6 \times 10^{-7}\), while those of most gamma transformed images without embedded information exceed \(6 \times 10^{-7}\). This indicates that thresholding on F can separate the gamma transformed images without embedded information from the original images and stegos. To keep the missed detection rate of steganalysis low, η should be large enough that few stegos are judged as gamma transformed; however, a threshold that is too large leads to a high false alarm rate of steganalysis, since many gamma transformed images would be missed and further steganalyzed. Based on the experimental results, the threshold is set to \(\eta = 10^{-6}\).

Fig. 6
figure 6

Plots of F values of heterogeneous images. The region with values larger than \(6 \times 10^{-7}\) is compressed. The F values of original images and stegos are all smaller than \(6 \times 10^{-7}\), while those of most gamma transformed images exceed it

Table 1 gives the error probability of gamma transformation detection for each kind of image using the feature F with \(\eta = 10^{-6}\). The results show that images that have not undergone gamma transformation (Δ = 0) are all correctly classified, whether or not steganography has been applied. For LSB matching, all stegos are judged as not gamma transformed, while a few stegos generated by S-UNIWARD are judged as gamma transformed and are therefore judged as covers directly without steganalysis. Hence, judging stegos as gamma transformed may increase the missed detection rate of steganalysis. Besides, some gamma transformed images are missed: as the zero-value histogram bins introduced by gamma transformation only appear on one side of the histogram [35, 36], some images may have no zero-value bins except at the two ends of the histogram. These images will be further steganalyzed.

Table 1 Error probability of gamma transformation detection for each kind of images

5.2 Steganalysis

After gamma transformation detection, the remaining images are steganalyzed. In the proposed mode, the classifier trained on the original images and the corresponding stegos is used, and only the images not judged as gamma transformed are tested. Steganalysis in the traditional mode is compared in two variants, according to the dataset used for classifier training: the classifier is trained either on the original images and the corresponding stegos, or on heterogeneous images. The resulting ensemble classifiers are referred to as S-EC and M-EC respectively. In the traditional mode, all investigated images in the testing sets are steganalyzed. For the experiments with M-EC, one fifth of each type of image is randomly selected to form the heterogeneous image set for classifier training and testing. We run each test 10 times per image class and average the 10 results for the performance evaluation.

5.2.1 Steganalysis of LSB matching

Table 2 shows the false alarm rates (\(P_{FA}\)), missed detection rates (\(P_{MD}\)), and average error probabilities (\(P_{E}\)) for the steganalysis of LSB matching, where the rows indexed by Mixed present the overall results over the gamma transformed images with all parameters, including ρ = 0, under each payload. In the traditional mode using S-EC, many gamma transformed images are judged as stegos, resulting in high false alarm rates. In the proposed mode, most gamma transformed images without embedded information are removed by the gamma transformation detection, which reduces the probability of covers being judged as stegos. Therefore, the false alarm rates in the proposed mode are relatively low, contributing to small average detection error probabilities.

Table 2 Final results of steganalysis of LSB matching

As no stegos are judged as gamma transformed in the gamma transformation detection, the final missed detection rates of steganalysis in the traditional and proposed modes are almost equal, because both use the same trained classifier and the same testing images; the only difference is caused by the randomness of the selected feature subspaces. Besides, the missed detection rates for Δ = 0 are higher than those for Δ≠ 0, indicating that stegos generated by embedding into gamma transformed images are easier to detect than those that have not undergone gamma transformation. However, due to the high false alarm rates in the traditional mode when Δ≠ 0, the corresponding average error probabilities are larger than those for Δ = 0.

Using M-EC in the traditional mode reduces false alarms, especially for gamma transformed images, but increases the missed detection rate for the corresponding stegos. In general, M-EC performs better than S-EC in the traditional mode, but worse than the proposed mode. In fact, for real-world steganalysis, it is almost impossible to train a universal classifier capable of detecting stegos among heterogeneous images. Moreover, a classifier trained on heterogeneous images may capture little information about the steganography itself, as the main difference between a cover and a stego may be introduced by the image manipulations. Integrating a large number of different image types into the training dataset deteriorates the steganalysis performance of the trained classifier.

Overall, when the investigated image set includes gamma transformed images, the proposed framework improves the reliability of steganalysis of LSB matching. On the one hand, the false alarm rates are reduced by separating the gamma transformed images without embedded information from the investigated images. On the other hand, embedding information into gamma transformed images by LSB matching makes the stegos easier to detect. From this point of view, if the proposed steganalysis framework is applied and the gamma transformation detection is reliable, steganography on gamma transformed images is less secure than steganography on natural images.

5.2.2 Steganalysis of S-UNIWARD

Table 3 gives the false alarm rates (\(P_{FA}\)), missed detection rates (\(P_{MD}\)), and average error probabilities (\(P_{E}\)) for the steganalysis of S-UNIWARD. The results show that the proposed mode reduces the false alarms and decreases the average error probabilities. However, due to the false positives of gamma transformation detection, whereby stegos are judged as gamma transformed images, the missed detection rates are higher than in the traditional mode.

Table 3 Final results of steganalysis of S-UNIWARD

In addition, in the traditional mode there is an exceptional case: the false alarm rates for Δ = − 0.2 are smaller than those for Δ = − 0.1, 0.1, and 0.2. In particular, when ρ = 0.3, the false alarm rate for Δ = − 0.2 is even smaller than that for Δ = 0. Moreover, the missed detection rates for Δ = − 0.2 are higher than the others, which indicates that the SRM features of stegos generated by S-UNIWARD following gamma transformation with parameter γ = 0.8 are more similar to the features of original images than those of other stegos. In this case, the proposed steganalysis framework is not capable of reducing the missed detections. A possible solution is to judge whether the investigated images have undergone gamma transformation and then use a classifier trained on gamma transformed images to steganalyze the identified images, but this is beyond the scope of this paper.

6 Conclusions and future works

This paper addresses the problem that normally processed images may be judged as stegos by a steganalyzer, resulting in high false alarm rates of steganalysis in the real world. A steganalysis framework combining image forensics and steganalysis tools is proposed to improve the reliability of real-world steganalysis. First, the unknown image is investigated by fragile forensics of image manipulations; if no operation is detected, the image is further examined by steganalyzers. Any fragile forensics tool for image manipulation detection can be applied within the proposed framework to improve real-world steganalysis performance. The framework is validated by combining the steganalysis of LSB matching and S-UNIWARD with gamma transformation detection, for which a new feature is constructed as the sum of products of the two histogram bins adjacent to each zero-value histogram bin. The false alarm rate is reduced by separating the gamma transformed images without embedded information from the investigated images.

However, the proposed framework introduces additional factors affecting steganalysis performance: its reliability now depends on the accuracies of both image manipulation identification and stego detection. For example, with a low accuracy of image operation forensics, many normally processed images will undergo steganalysis, which could result in a high false alarm rate. Meanwhile, if stegos are picked out by an image operation detector, they will be judged as covers directly without applying steganalyzers. Therefore, reliable image manipulation forensics tools are essential to improving the performance of steganalysis in the real world.

This paper focuses on reducing the false alarms of steganalysis in practice. In reality, however, information may be embedded into processed images, and the resulting stegos may tend to be judged as covers by steganalysis tools. In this case, the proposed framework is not capable of reducing the missed detection rate, and further research is needed to find corresponding solutions.

Besides, our future research directions include, but are not limited to, the following. We will try to extend our idea to other types of data [24, 25, 37]. We also plan to adopt multi-core CPU and many-core GPU parallel techniques [39, 45] to accelerate our algorithms on big image data [8, 40, 41, 43].