1 Introduction

In recent decades, the increasing use of information technology and the growth of data transmission have drawn researchers' attention to data hiding methods and their detection. Steganography is the science of hiding information in a cover medium such as a video, image, or audio file. Statistical undetectability is the main requirement of steganography. To accomplish this, two classes of techniques have been proposed: spatial-domain techniques such as least significant bit (LSB) replacement [1] and LSB matching [2, 3], and transform-domain techniques such as F5 [4], OutGuess [5], JpHide [6], StegHide [7], and YASS [8].

On the other hand, the aim of steganalysis is to detect the presence of embedded data in a given medium. Steganalysis has been widely used in computer forensics, cyber warfare, and the detection of criminal activities over the internet [9, 10]. In this paper, a steganalysis method is presented for color joint photographic experts group (JPEG) images.

Steganalysis techniques are classified into two major categories: blind steganalysis, which is independent of the steganography method, and specific steganalysis, which attempts to detect media produced by a specific steganography method. Blind steganalysis discovers the presence of hidden messages by extracting sensitive features, such as the Markov transition probability matrix [11], statistical moments of the characteristic functions of sub-band histograms [12], and the merged feature set [13]. Specific steganalysis can reveal the secret messages or even estimate the embedding rate. For example, RS [14], WAM [15], and its modified version [16] can reliably detect spatial steganography.

Generally, the three stages of steganalysis are feature extraction, feature selection, and classification. The features are mainly extracted directly from the spatial domain or from a transform domain such as the discrete wavelet transform (DWT), discrete cosine transform (DCT), or contourlet transform (CT). Feature selection (FS) is the process of choosing a subset of the original feature space according to its discrimination capability in order to improve the quality of the data. Various types of classifiers, such as artificial neural networks (ANNs) and support vector machines (SVMs), have been used in steganalysis systems [17].

In the feature extraction stage, the two-dimensional DWT decomposes the approximation coefficients of each level into four components: the approximation and the details in only three orientations (horizontal, vertical, and diagonal); a minimal decomposition sketch follows. In this way, the features are extracted from the sub-band coefficients at each level of decomposition. The contourlet transform, on the other hand, captures more directional information by decomposing an image into directional sub-images at different scales, so it can capture heterogeneities and smooth contours more accurately than the DWT.
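
Below is a minimal sketch of one level of the 2-D DWT, assuming the PyWavelets package; the contourlet transform has no comparably standard Python implementation, so only the wavelet side is sketched here.

```python
# Minimal sketch of one 2-D DWT level, assuming PyWavelets is installed.
import numpy as np
import pywt

image = np.random.rand(256, 256)        # stands in for a real grayscale image

# One decomposition level: approximation plus three detail orientations.
cA, (cH, cV, cD) = pywt.dwt2(image, 'db4')

# Iterating on the approximation yields the next level's four sub-bands.
cA2, (cH2, cV2, cD2) = pywt.dwt2(cA, 'db4')
```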

In this study, we use the features proposed in [18]. The extracted features are the statistical moments (mean, variance, skewness, and kurtosis) of the contourlet coefficients. Also, linear prediction of the coefficient magnitudes is performed based on [19], and the statistical moments of the log error between the actual and linearly predicted contourlet coefficients are used as features.

In addition, as noted above, feature selection improves the quality of the data by retaining only the discriminative features. Unlike feature transform techniques, the lower dimensionality obtained by feature selection facilitates the exploration of results in data analysis. Owing to this advantage, FS has been widely applied in many domains, such as pattern recognition and time-series forecasting [20]. In this paper, binary particle swarm optimization (BPSO) is used to reduce the size of the feature set [21].

Two neural-based classifiers, the radial basis function neural network (RBFNN) and the probabilistic neural network (PNN), are used to measure the detection accuracy of the proposed steganalysis system. In another experiment in this study, the steganalysis method proposed in [18] is applied to a color image database, the uncompressed color image database (UCID), with embedding rates of 5, 10, and 25% of the maximum capacity of the steganography methods. Then, we apply the BPSO feature selection technique to the work reported in [18] to evaluate the effect of feature selection on performance. Finally, as another aim of this study, the performance of three classifiers is compared for different embedding ratios in terms of true positive (TP) and false positive (FP) rates. These classifiers are the SVM, which was used in [18], and two neural-based ones, RBFNN and PNN. The confusion matrices obtained with these classifiers on the two-class (stego/clean image) problem are also reported at different embedding rates.

This paper is organized as follows. In Sect. 2, related works are reviewed. The proposed scheme, including the feature extraction, feature selection, and classification stages, is introduced in Sect. 3. The experimental results are reported in Sect. 4, and conclusions are drawn in Sect. 5.

2 Related works

This section gives a review of some recent steganalysis schemes including different methods for feature extraction, feature selection, and pattern classification.

As examples of research on feature extraction methods in this field, Lie and Lin [22] have merged features of the spatial and DCT domains, i.e., gradient energy (as a spatial-domain feature) and the mean and variance of the Laplacian parameters (as DCT-domain features). Pevny and Fridrich [13] have used a new merged feature set of DCT and calibrated Markov features, with SVM as both a binary classifier and a multi-classifier for images with different quality factors. Xuan et al. [23] have proposed a steganalysis method based on the statistical moments of the characteristic functions of wavelet sub-bands and a Bayes classifier. Zhou and Hui [24] have used the Markov transition matrix to capture the correlations between adjacent coefficients in both the intra-block and inter-block senses. Lyu and Farid [25] have used statistical-model features, including first- and higher-order magnitude statistics and phase statistics from wavelet and local angular harmonic decompositions, with three classifiers: linear SVM, nonlinear SVM, and one-class SVM. Wang and Moulin [12] have used the empirical probability density function (pdf) moments and the empirical characteristic function (CF) moments of the pdf in different wavelet sub-band coefficients and their sub-band prediction errors; the Bhattacharyya distance was used for feature evaluation to improve the classification performance of a Fisher linear discriminator. The contourlet domain has been investigated in steganalysis, too: Sajedi and Jamzad [18] have used the statistical moments of the eight sub-bands in the third level of the contourlet transform, together with the statistical moments of the difference between the actual and linearly predicted contourlet coefficients, as features. JPEG steganography performs data hiding in the DCT domain [26], and binary similarity measures have accordingly been used in JPEG steganalysis; for example, Lin and Zhong [27] have captured the 7th and 8th bit planes of the nonzero DCT coefficients of JPEG images and computed 14 features per image based on binary similarity measures. Bhat et al. [28] have used a combined feature set including Huffman bit code length (HBCL) statistics and the file-size to resolution ratio (FR index), called Huffman bit file index resolution (HUBFIRE).

As examples of research on feature selection methods in this field, the genetic algorithm (GA) has been used to select a subset of candidate features, transformations, and coefficients of the logistic regression classifier model for blind image steganalysis [29]. Also, the localized generalization error model (L-GEM) has been used for feature selection in steganalysis systems [30].

As examples of research on classification methods in this field, multi-classifier models based on the steganalysis results of decomposed image blocks [31] or a hierarchical ensemble of classifiers [32] have been proposed. Also, boosted fusion methods have been used to aggregate the outputs of multiple steganalyzers [33]. Blind steganalysis based on a one-class SVM (OC-SVM) with a simulated annealing clustering algorithm has also been proposed, which can create a more reasonable multi-sphere by finding globally optimal solutions in the clustering process [34].

As other examples of steganalysis systems reported in the recent decade, we can mention the following. Wang et al. [35], with the aim of detecting the popular JPEG steganography algorithms, have proposed a steganalysis scheme in which a transition probability matrix is constructed to describe the correlations of the quantized DCT coefficients in multiple directions; they have extracted a 96-dimensional feature vector and trained an SVM to build the steganalyzer. Pixel-value differencing is a steganography technique in the spatial domain in which embedding is performed in the difference of the values of pixel pairs; Sabeti et al. [36] have identified a number of characteristics in the difference histogram that show meaningful changes when a message is embedded in an image, and have used five different multilayer perceptrons (MLPs) to detect different levels of embedding. Liu et al. [37] have used features on the joint distributions of the DCT and DWT coefficients and features on the polynomial fitting errors of the histogram of DCT coefficients, which they called original expanded features (EPF); to handle the large number of resulting features, the support vector machine-recursive feature elimination (SVM-RFE) method has been used for binary classification, and SVMs have been applied to the selected features to detect stego images. Liu et al. [38] have also used the combination of a dynamic evolving neural fuzzy inference system (DENFIS) with the aforementioned SVM-RFE feature selection method for steganalysis of LSB matching steganography in grayscale images. Raval [39] has proposed a simple tactic for secure steganography in which a matrix derived from the image content is used by a quantization index modulation (QIM)-based encoder and decoder. Wahab et al. [40] have proposed a steganalysis technique based on conditional probability statistics, which performs better than the Markov process-based technique in terms of classification accuracy on the F5 software. The length of the embedded message has also been estimated, with an SVM used to classify cover and stego images [41].

3 Proposed scheme

3.1 Feature extraction stage

We use the statistical features of the coefficients and co-occurrence metrics of the sub-band images as features from the contourlet domain. The contourlet is a multi-scale and multi-direction transform comprising two major parts: the Laplacian pyramid, which produces the multi-scale decomposition, and the directional filter bank, which produces the multi-direction decomposition.

The Laplacian pyramid is first used to capture point discontinuities; a directional filter bank then links these point discontinuities into linear structures. At each level, the Laplacian pyramid generates a down-sampled low-pass version of the original signal and the difference between the original signal and the prediction, resulting in a bandpass image.

The second decomposition, the directional filter bank, contains two serial building blocks. The first is a two-channel quincunx filter bank with fan filters that divides a 2-D spectrum into two directions, horizontal and vertical. The second is a shearing operator, which amounts to a reordering of the image samples [42]. The contourlet filter bank is the combination of a Laplacian pyramid and a directional filter bank. Figure 1 shows one level of the contourlet filter bank. As shown in Fig. 1, the bandpass images from the Laplacian pyramid (multi-scale decomposition into octave bands) are passed through a directional filter bank so that directional information can be captured. This scheme can be iterated on the coarse image. The combination is a double iterated filter bank structure, which decomposes images into directional sub-bands at multiple scales.

Fig. 1 The contourlet filter bank [42]

In this study, the discrete contourlet transform is applied to the images in 3 levels and 8 directions. For example, the contourlet decomposition of a UCID image sample (UCID00015) with two scales, and four and eight directions, is shown in Fig. 2. Four statistical moments (mean, variance, skewness, and kurtosis) are extracted from the third-level coefficients of the contourlet transform and from its predicted log error [18, 19] as follows:

$$ M(x) = \frac{1}{n}\sum_{k = 1}^{n} x_{k} $$
(1)
$$ \mathrm{Var}(x) = \frac{1}{n - 1}\sum_{k = 1}^{n} \left( x_{k} - M(x) \right)^{2} $$
(2)
$$ S(x) = E\left[ \left( \frac{x - M(x)}{\sqrt{\mathrm{Var}(x)}} \right)^{3} \right] $$
(3)
$$ K(x) = E\left[ \left( \frac{x - M(x)}{\sqrt{\mathrm{Var}(x)}} \right)^{4} \right] $$
(4)

where \( x_{k} \) is a data point in the feature vector, and \( n \) is the dimension of the feature vector.
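
As a minimal sketch, assuming the sub-band coefficients are already available as a NumPy array (no particular contourlet implementation is assumed), the four moments of Eqs. (1)–(4) can be computed as follows:

```python
# Minimal sketch of Eqs. (1)-(4) for one sub-band of coefficients
# (or their log prediction errors).
import numpy as np
from scipy import stats

def subband_moments(coeffs):
    """Return (mean, variance, skewness, kurtosis) of one sub-band,
    following Eqs. (1)-(4). Note that SciPy standardizes skewness and
    kurtosis with the biased variance; the difference from Eq. (2)'s
    1/(n-1) normalization is negligible for large n."""
    x = np.asarray(coeffs, dtype=float).ravel()
    mean = x.mean()                          # Eq. (1)
    var = x.var(ddof=1)                      # Eq. (2)
    skew = stats.skew(x)                     # Eq. (3)
    kurt = stats.kurtosis(x, fisher=False)   # Eq. (4), non-excess kurtosis
    return mean, var, skew, kurt
```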

Fig. 2 Contourlet decomposition of a UCID image sample (UCID00015) with two scales, and four and eight directions

3.2 Feature selection stage

Irrelevant and redundant features degrade the performance of classification [43, 44]. A good feature selection method has several advantages for a learning algorithm, such as reducing the computational cost, increasing the classifier accuracy, and improving result comprehensibility [45]. Hence, most machine learning algorithms rely on feature selection techniques in order to perform effective classification in different applications such as pattern classification [46–48], biomedical engineering [43, 49–52], intrusion detection in computer networks [53, 54], bioinformatics [55], remote sensing [56], texture classification [57], audio classification [58], attribute selection in data mining [59], emotion recognition from speech signals [60, 61], text sentiment classification [62], software fault prediction [63], short-term electricity load forecasting [20], and bank failure prediction [64].

It has been shown that accuracy, sensitivity, and specificity can improve even with more than 50% of the input features eliminated [51]. Also, the number of input training parameters for neural networks should be kept small in order to maintain the optimal generalization ability of the networks.

As mentioned earlier, binary particle swarm optimization (BPSO) is used as a closed-loop feature selection algorithm in this study. The algorithm selects the feature subset based on classifier feedback, so better performance is expected even with a reduced number of features.

PSO is a population-based algorithm for finding solutions of an optimization problem. The search space is constructed based on the variables of the problem. The flowchart of the PSO algorithm is illustrated in Fig. 3. In BPSO, since the variables indicate the existence or nonexistence of a feature, the search space is binary. The positions of the particles are updated by updating their velocities according to the following equations:

$$ v_{ij}(t + 1) = w\,v_{ij}(t) + C_{1} R_{1} \left( P_{\mathrm{best}_{ij}} - x_{ij}(t) \right) + C_{2} R_{2} \left( G_{\mathrm{best}_{j}} - x_{ij}(t) \right) $$
(5)
$$ x_{ij}(t + 1) = x_{ij}(t) + v_{ij}(t + 1) $$
(6)

where \( v_{ij}(t) \) indicates the velocity of the \( j \)th component of the \( i \)th particle at position \( x_{ij}(t) \) in the \( t \)th iteration, \( w \) is the inertia weight, and \( R_{1} \) and \( R_{2} \) are two random numbers. \( P_{\mathrm{best}_{ij}} \) is the \( j \)th component of the best position of the \( i \)th particle, i.e., the position that has minimized the cost function over the previous iterations, and \( G_{\mathrm{best}_{j}} \) is the \( j \)th component of the best particle found so far in minimizing the cost function. \( C_{1} \) and \( C_{2} \) are the weights of the local and global terms of the search algorithm. BPSO was introduced in 1997 [65]. Like the genetic algorithm (GA), BPSO can be effectively utilized in binary optimization problems. In the BPSO technique, the probability of a particle component being 0 or 1 is specified by the velocity value through a sigmoid function. The conversion of continuous PSO to BPSO is performed as follows [21]:

$$ x_{ij}(t + 1) = \begin{cases} 0 & \text{if}\; \mathrm{rand}() \ge S(v_{ij}(t + 1)) \\ 1 & \text{if}\; \mathrm{rand}() < S(v_{ij}(t + 1)) \end{cases} $$
(7)

where rand() is a random number uniformly distributed between 0 and 1, and S(·) is the sigmoid function given as follows:

$$ S\left( v_{ij}(t + 1) \right) = \frac{1}{1 + e^{-v_{ij}(t + 1)}} $$
(8)
Fig. 3 Flowchart of the PSO algorithm

In this way, the continuous positions are converted to binary positions: a binary position of “1” is assigned to features that are effective in distinguishing stego from clean images, and “0” to noneffective features.

In our simulations, the number of particles is set to 10, and the particle values are initialized to 0 at the beginning of the process. In the optimization process, training and test sets are composed of the features selected by each particle, and the classifiers (RBFNN, PNN, and SVM) are trained and tested on these datasets. As a result, a classification accuracy rate and a training error rate are obtained for each particle. The success rate of each particle is calculated using the following fitness function:

$$ f(i) = A(i) - E(i) $$
(9)

where \( f(i) \) is the success rate, \( A(i) \) is the classification accuracy rate, and \( E(i) \) is the training error rate of the \( i \)th particle. The velocities of the particles are calculated using (5), with \( v_{\min} \) and \( v_{\max} \) set to −6 and 6, respectively. In each iteration, \( P_{\mathrm{best}_{ij}} \) and \( G_{\mathrm{best}_{j}} \) are updated if necessary. At the end of the optimization process, \( G_{\mathrm{best}_{j}} \) is taken as the optimum solution. In our simulations, the BPSO algorithm is iterated 100 times to find the optimum subset of features. In (5), \( w \) is set to 0.2, and it is assumed that \( C_{1} = C_{2} = 2 \) [66]. A rough sketch of this loop is given below.
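
The sketch below puts Eqs. (5)–(9) together under the stated settings; `evaluate` is a hypothetical callback that trains a classifier on the selected feature columns and returns its classification accuracy and training error rates.

```python
# Sketch of BPSO feature selection per Eqs. (5)-(9); 'evaluate' is a
# hypothetical callback and must tolerate an empty feature subset, since
# all particles start at zero as described above.
import numpy as np

def bpso(n_features, evaluate, n_particles=10, n_iters=100,
         w=0.2, c1=2.0, c2=2.0, v_min=-6.0, v_max=6.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros((n_particles, n_features), dtype=int)   # initial positions: 0
    v = np.zeros((n_particles, n_features))
    p_best = x.copy()
    p_fit = np.full(n_particles, -np.inf)
    g_best, g_fit = x[0].copy(), -np.inf

    for _ in range(n_iters):
        for i in range(n_particles):
            acc, err = evaluate(x[i].astype(bool))
            f = acc - err                                # Eq. (9)
            if f > p_fit[i]:
                p_fit[i], p_best[i] = f, x[i].copy()
            if f > g_fit:
                g_fit, g_best = f, x[i].copy()
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (5)
        v = np.clip(v, v_min, v_max)
        s = 1.0 / (1.0 + np.exp(-v))                     # Eq. (8)
        x = (rng.random(v.shape) < s).astype(int)        # Eq. (7)
    return g_best.astype(bool), g_fit
```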

3.3 Classification stage

3.3.1 Support vector machine

In this study, the SVM, a supervised learning method [67], is used as one of the classification techniques. Given a set of instance-label pairs \( (x_{i}, y_{i}) \), \( x_{i} \in R^{n} \), \( y_{i} \in \{-1, 1\} \), \( i = 1, \ldots, m \), the SVM maps the training vectors \( x_{i} \) into a higher-dimensional space and constructs the separating hyperplane that maximizes the margin in this higher-dimensional feature space. The hyperplane is formed according to the selected kernel; using a nonlinear kernel function allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. If the nonlinear kernel is a Gaussian radial basis function (RBF), the corresponding feature space is a Hilbert space of infinite dimension [68]. Maximum-margin classifiers are well regularized, so Gaussian RBFs have received significant attention.

Classical techniques utilizing radial basis functions employ some method of determining a subset of centers; typically, clustering is first employed to select them. An attractive feature of the SVM is that this selection is implicit, with each support vector contributing one local Gaussian function centered at that data point. We have used the SVM with the Gaussian RBF kernel [69]:

$$ k\left( x_{i}, x_{j} \right) = \exp\left( -\frac{\|x_{i} - x_{j}\|^{2}}{2\sigma^{2}} \right) $$
(10)

where \( x_{i} \) and \( x_{j} \) are data points, and \( \sigma \) is the spread of the Gaussian function. In our simulations, we set \( 2\sigma^{2} \) to 1.
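
For reference, a minimal configuration sketch assuming scikit-learn rather than LIBSVM is given below; in scikit-learn's RBF kernel \( \exp(-\gamma \|x_{i} - x_{j}\|^{2}) \), setting \( 2\sigma^{2} = 1 \) corresponds to \( \gamma = 1 \), and ν = 0.2 is the value reported in Sect. 4.

```python
# Minimal sketch of the nu-SVC configuration described in this paper,
# assuming scikit-learn; gamma = 1/(2*sigma^2) = 1 reproduces 2*sigma^2 = 1.
from sklearn.svm import NuSVC

clf = NuSVC(nu=0.2, kernel='rbf', gamma=1.0)
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```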

3.3.2 Radial basis function neural network

The radial basis function neural network (RBFNN) is a feedforward neural network consisting of an input layer, a radial basis function layer (hidden layer), and an output layer [70]. The number of input-layer nodes is equal to the number of selected features, and the output layer has one neuron, which indicates the data class (clean or stego). The weighted input to the radial basis transfer functions of the hidden layer is the Euclidean distance between the input vector and the weight matrix of the links connecting the input nodes to the hidden-layer nodes, multiplied by a bias value (which inversely affects the spread of the radial basis function). Decreasing the distance between the weight vector and the input vector increases the output. The weights and biases are obtained by training the neural network to minimize the misclassification error in a mean square error (MSE) sense [71].
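
A minimal sketch of this forward pass follows, assuming already-trained parameters (center rows, hidden biases, and a linear output layer); the MSE training itself is not shown.

```python
# Sketch of the RBFNN forward pass described above; 'centers', 'b1', 'w2',
# and 'b2' are assumed to come from training (not shown).
import numpy as np

def rbfnn_forward(x, centers, b1, w2, b2):
    dist = np.linalg.norm(centers - x, axis=1)  # Euclidean distance to each center
    n = dist * b1                               # weighted input: distance * bias
    a = np.exp(-(n ** 2))                       # radial basis transfer function
    return a @ w2 + b2                          # single output neuron (clean/stego)
```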

3.3.3 Probabilistic neural network

The PNN is a neural network well suited to classification applications. It consists of an input layer, a hidden layer called the pattern layer, another hidden layer called the summation layer, and an output layer [72]. The input-layer units are distribution units that supply the same input values to all of the pattern units. Note that the number of neurons in the pattern layer equals the number of training data, and the number of neurons in the summation layer equals the number of classes. The membership of a test sample \( x \) in each class is computed as the sum of the closeness of \( x \) to all members of that class (i.e., the training samples belonging to that class). Usually, the membership function of the pattern-layer neurons is taken to be a normal distribution function. The structure of the PNN is shown in Fig. 4.
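
A minimal sketch of this decision rule, assuming a Gaussian (normal) kernel with the spread value reported in Sect. 4 (0.2), is as follows.

```python
# Sketch of the PNN decision rule described above; each training sample
# acts as one pattern neuron, and the summation layer sums per class.
import numpy as np

def pnn_classify(x, X_train, y_train, sigma=0.2):
    y_train = np.asarray(y_train)
    d2 = np.sum((X_train - x) ** 2, axis=1)          # squared distances
    k = np.exp(-d2 / (2.0 * sigma ** 2))             # pattern-layer outputs
    classes = np.unique(y_train)
    sums = [k[y_train == c].sum() for c in classes]  # summation layer
    return classes[int(np.argmax(sums))]             # predicted class
```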

Fig. 4 Structure of PNN [72]

4 Simulation and experimental results

In our simulations, we have used the uncompressed color image database (UCID), which contains 1338 color TIFF images [73]. The images were converted from TIFF to JPEG with a quality factor of 80 in order to be used by the steganographic methods.

To obtain the stego dataset, Jsteg [74], OutGuess [75], the model-based algorithm [76], and the JPHS software [77] have been applied to the JPEG images to embed random secret messages at 5, 10, and 25% of the maximum embedding capacity of each steganography method. For each embedding rate, four steganography methods are used, yielding 5352 stego images; 1338 of these have been randomly selected from the stego image dataset. Also, the 1338 clean JPEG images of the UCID database have been used to form an equal-size two-class dataset.

Ninety-six features have been extracted from the third level of the contourlet transform in eight directions and three color channels. Also, linear prediction of the coefficient magnitudes has been performed, and the statistical moments of the log error between the actual and linearly predicted contourlet coefficients have been computed, yielding 96 supplementary features.

BPSO, as a closed-loop feature selection method, is used to select the most efficient features while improving the detection rate. In this work, the feature set is reduced from 192 features to 106–125 features, as reported in Tables 1, 2 and 3, where the confusion matrices for the three embedding rates (25, 10, and 5%) are given. Note that the information hiding ratio, or embedding rate, is a well-known metric for evaluating steganalysis performance [78]. As can be seen, the number of features selected by the BPSO algorithm when using the RBFNN is 108, 110, and 106 for the three embedding rates, respectively; when using the PNN classifier, it is 120, 118, and 112 features, respectively. Note that the significance of the features and the detection performance depend not only on the embedding rate but also on the image complexity [78].

Table 1 Confusion matrix using RBFNN and PNN, embedding rate = 25%
Table 2 Confusion matrix using RBFNN and PNN, embedding rate = 10%
Table 3 Confusion matrix using RBFNN and PNN, embedding rate = 5%

In this study, random subsampling validation has been used as the cross-validation method. The average ratio of training to test dataset size used in similar works, for example [18, 38, 79], is used in our simulations as a criterion to make the results comparable with others. In this way, 86% of the data is selected randomly for training, and the remaining 14% is used for testing; this procedure is repeated 100 times, so the test images are also selected randomly in each iteration (a sketch of this protocol is given below). The overall performance of the proposed steganalysis method, when applying all of the test datasets of the different steganography algorithms, is evaluated by employing the RBFNN, PNN, and SVM classifiers to solve the two-class (stego/clean image) problem (Tables 1, 2, 3 for RBFNN and PNN, and Tables 7, 8 and 9 for SVM).
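
Assuming scikit-learn, the protocol amounts to the following sketch.

```python
# Sketch of the random subsampling protocol: 100 random 86%/14% splits.
from sklearn.model_selection import ShuffleSplit

splitter = ShuffleSplit(n_splits=100, test_size=0.14, random_state=0)
# for train_idx, test_idx in splitter.split(X, y):
#     train on X[train_idx], y[train_idx]; evaluate on X[test_idx], y[test_idx]
```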

The receiver operating characteristic (ROC) curves for the test data, when using the RBFNN and PNN, are depicted in Figs. 5 and 6, respectively. The ROC curve plots the percentage of correctly detected stego images (true positives) against the percentage of cover images detected as stego images (false positives). The detection accuracy is calculated using (11), with the following definitions:

$$ \text{Accuracy} = \left( \text{TP} + \text{TN} \right) / \left( \text{P} + \text{N} \right) $$
(11)
True positive (TP): ratio of stego images correctly classified as stego.

False positive (FP): ratio of clean images incorrectly classified as stego.

True negative (TN): ratio of clean images correctly classified as clean.

False negative (FN): ratio of stego images incorrectly classified as clean.
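
In terms of raw counts on the test set, Eq. (11) and the rates above amount to the following sketch.

```python
# Sketch of Eq. (11) and the TP/FP rates, given raw counts on the test set.
def rates(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # Eq. (11): (TP + TN)/(P + N)
    tp_rate = tp / (tp + fn)                    # stego correctly flagged
    fp_rate = fp / (fp + tn)                    # clean wrongly flagged as stego
    return accuracy, tp_rate, fp_rate
```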

Fig. 5 ROC curves for different embedding rates, BPSO + RBFNN

Fig. 6 ROC curves for different embedding rates, BPSO + PNN

In this work, the simulations are run on a PC with an Intel® Core 2 Duo 2.2 GHz CPU and 2 GB of RAM. In another experiment, we apply the BPSO feature selection technique to the steganalysis method proposed in [18] to evaluate the effect of feature selection on performance. For the SVM, the LIBSVM package [67] is used in our simulations. We have used the ν-soft margin support vector classifier (ν-SVC), in which a parameter ν controls the number of support vectors and training errors. The value of ν, in the range (0, 1], is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors; we set ν to 0.2. The spread constant for the radial basis layer is set to 2 for the RBFNN and 0.2 for the PNN.

As shown in Tables 1, 2, 3, the RBFNN classifier yields higher detection accuracy; however, the training time of the PNN is shorter than that of the RBFNN. The training time with 192 features is 0.2485 s for the PNN and 36.8456 s for the RBFNN, while the SVM takes 2.6203 s. The true positive and false positive rates for each steganography method, when applying the test dataset and using the RBFNN, PNN, and SVM classifiers, are reported in Tables 4, 5 and 6, respectively.

Table 4 TP and FP rates for different embedding rates-RBFNN classifier
Table 5 TP and FP rates for different embedding rates-PNN classifier
Table 6 TP and FP rates for different embedding rates-SVM classifier

The confusion matrices of the steganalysis method proposed in [18], which uses similar feature vectors, and the effect of using the BPSO feature selection algorithm are reported for the three embedding rates in Tables 7, 8, 9, respectively. The ROC curves for the BPSO + SVM classifier at different embedding rates are depicted in Fig. 7. As shown in Tables 7, 8, 9, despite an approximately 30% reduction in the size of the feature set, the detection accuracy is improved at all embedding rates (Fig. 8).

Table 7 Confusion matrix when using SVM [18] and BPSO + SVM (proposed), embedding rate = 25%
Table 8 Confusion matrix when using SVM [18] and BPSO + SVM (proposed), embedding rate = 10%
Table 9 Confusion matrix when using SVM [18] and BPSO + SVM (proposed), embedding rate = 5%
Fig. 7 ROC curves for different embedding rates, BPSO + SVM

Fig. 8 Performance of RBFNN and PNN for different embedding rates

5 Conclusion

Feature extraction, feature selection, and classification are the three stages of steganalysis systems. In this paper, a steganalysis method based on features from the contourlet transform domain has been proposed. The contourlet transform is able to capture smooth contours in images; compared to common transforms such as the wavelet, its directionality and anisotropy are important advantages. In this work, the candidate features consist of the statistical moments of the contourlet coefficients and the statistical moments of the log error between the actual and linearly predicted contourlet coefficients in the third level and eight directions. The number of features has been reduced by the BPSO feature selection technique, so the computational load has been reduced and the classification accuracy on stego/clean images has been improved as well. Radial basis neural networks, RBFNN and PNN, as well as SVM, have been used as classification tools to distinguish stego from clean images.

Experimental results have shown that, even at low embedding rates, the detection accuracy of the proposed method is more than 80% when using the SVM or RBFNN equipped with BPSO.

The investigations have shown that using the BPSO feature selection algorithm in this classification problem reduces the feature-set size by about 30% while achieving higher detection accuracies. In addition, the ν-soft margin support vector classifier (ν-SVC) is preferable to the RBFNN in terms of training time: the training times of the SVM and RBFNN were 2.6203 and 36.8456 s, respectively. However, comparing the results reported in Tables 4 and 6, the classification rate of the RBFNN is slightly better than that of the SVM at high embedding rates for the Jsteg, OutGuess, and JPHS steganography algorithms. The PNN has the shortest training time in this application, but its classification rate is not competitive. In our future work, we will investigate the effect of using supplementary features based on the contourlet coefficients.