1 Introduction

In recent decades, the increasing use of information technology and the growth of data transmission have drawn researchers' attention to data hiding methods and their detection. Steganography is the science of hiding information in a cover medium such as a video, image, or audio file. Statistical undetectability is the main requirement of steganography. To accomplish this, two classes of techniques have been proposed: spatial-domain techniques such as least significant bit (LSB) replacement [1] and LSB matching [2, 3], and transform-domain techniques such as F5 [4], OutGuess [5], JpHide [6], StegHide [7], and YASS [8].

On the other hand, the aim of steganalysis is to detect the presence of embedded data in a given medium. Steganalysis has been widely used in computer forensics, cyber warfare, and the detection of criminal activities over the internet [9, 10]. In this paper, a steganalysis method is presented for color joint photographic experts group (JPEG) images.

Steganalysis techniques are classified into two major categories: blind steganalysis, which is independent of the steganography method, and specific steganalysis, which attempts to detect media produced by a specific steganography method. Blind steganalysis discovers the presence of hidden messages by extracting sensitive features, such as the Markov transition probability matrix [11], statistical moments of the characteristic functions of sub-band histograms [12], and the merged feature set [13]. Specific steganalysis can reveal the secret messages or even estimate the embedding rate. For example, RS [14], WAM [15], and its modified version [16] can reliably detect spatial steganography.

Generally, the three stages of steganalysis are feature extraction, feature selection, and classification. The features are mainly extracted directly from the spatial domain or from a transform domain such as the discrete wavelet transform (DWT), discrete cosine transform (DCT), or contourlet transform (CT). Feature selection (FS) is the process of choosing a subset of the original feature space according to its discrimination capability in order to improve the quality of the data. Various types of classifiers, such as artificial neural networks (ANNs) and support vector machines (SVMs), have been used in steganalysis systems [17].

In the feature extraction stage, the two-dimensional DWT decomposes the approximation coefficients of each level into four components: the approximation and the details in only three orientations (horizontal, vertical, and diagonal); a minimal decomposition sketch follows. In this way, the features are extracted from the sub-band coefficients at each level of decomposition. The contourlet transform, on the other hand, captures more directional information by decomposing an image into directional sub-images at different scales, so it can capture heterogeneities and smooth contours more accurately than the DWT.
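
Below is a minimal sketch of one level of the 2-D DWT, assuming the PyWavelets package; the contourlet transform has no comparably standard Python implementation, so only the wavelet side is sketched here.

```python
# Minimal sketch of one 2-D DWT level, assuming PyWavelets is installed.
import numpy as np
import pywt

image = np.random.rand(256, 256)        # stands in for a real grayscale image

# One decomposition level: approximation plus three detail orientations.
cA, (cH, cV, cD) = pywt.dwt2(image, 'db4')

# Iterating on the approximation yields the next level's four sub-bands.
cA2, (cH2, cV2, cD2) = pywt.dwt2(cA, 'db4')
```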

In this study, we use the features proposed in [18]. The extracted features are the statistical moments (mean, variance, skewness, and kurtosis) of the contourlet coefficients. Also, linear prediction of the coefficient magnitudes is performed based on [19], and the statistical moments of the log error between the actual and linearly predicted contourlet coefficients are used as features.

In addition, as noted above, feature selection improves the quality of the data by retaining only the discriminative features. Unlike feature transform techniques, the lower dimensionality obtained by feature selection facilitates the exploration of results in data analysis. Owing to this advantage, FS has been widely applied in many domains, such as pattern recognition and time-series forecasting [20]. In this paper, binary particle swarm optimization (BPSO) is used to reduce the size of the feature set [21].

Two neural-based classifiers, the radial basis function neural network (RBFNN) and the probabilistic neural network (PNN), are used to measure the detection accuracy of the proposed steganalysis system. In another experiment in this study, the steganalysis method proposed in [18] is applied to a color image database, the uncompressed color image database (UCID), with embedding rates of 5, 10, and 25% of the maximum capacity of the steganography methods. Then, we apply the BPSO feature selection technique to the work reported in [18] to evaluate the effect of feature selection on performance. Finally, as another aim of this study, the performance of three classifiers is compared for different embedding ratios in terms of true positive (TP) and false positive (FP) rates. These classifiers are the SVM, which was used in [18], and two neural-based ones, RBFNN and PNN. The confusion matrices obtained with these classifiers on the two-class (stego/clean image) problem are also reported at different embedding rates.

This paper is organized as follows. In Sect. 2, related works are reviewed. The proposed scheme, including the feature extraction, feature selection, and classification stages, is introduced in Sect. 3. The experimental results are reported in Sect. 4, and conclusions are drawn in Sect. 5.

2 Related works

This section gives a review of some recent steganalysis schemes including different methods for feature extraction, feature selection, and pattern classification.

As examples of research on feature extraction methods in this field, Lie and Lin [22] have merged features of the spatial and DCT domains, i.e., gradient energy (as a spatial-domain feature) and the mean and variance of the Laplacian parameters (as DCT-domain features). Pevny and Fridrich [13] have used a new merged feature set of DCT and calibrated Markov features, with SVM as both a binary classifier and a multi-classifier for images with different quality factors. Xuan et al. [23] have proposed a steganalysis method based on the statistical moments of the characteristic functions of wavelet sub-bands and a Bayes classifier. Zhou and Hui [24] have used the Markov transition matrix to capture the correlations between adjacent coefficients in both the intra-block and inter-block senses. Lyu and Farid [25] have used statistical-model features, including first- and higher-order magnitude statistics and phase statistics from wavelet and local angular harmonic decompositions, with three classifiers: linear SVM, nonlinear SVM, and one-class SVM. Wang and Moulin [12] have used the empirical probability density function (pdf) moments and the empirical characteristic function (CF) moments of the pdf in different wavelet sub-band coefficients and their sub-band prediction errors; the Bhattacharyya distance was used for feature evaluation to improve the classification performance of a Fisher linear discriminator. The contourlet domain has been investigated in steganalysis, too: Sajedi and Jamzad [18] have used the statistical moments of the eight sub-bands in the third level of the contourlet transform, together with the statistical moments of the difference between the actual and linearly predicted contourlet coefficients, as features. JPEG steganography performs data hiding in the DCT domain [26], and binary similarity measures have accordingly been used in JPEG steganalysis; for example, Lin and Zhong [27] have captured the 7th and 8th bit planes of the nonzero DCT coefficients of JPEG images and computed 14 features per image based on binary similarity measures. Bhat et al. [28] have used a combined feature set including Huffman bit code length (HBCL) statistics and the file-size to resolution ratio (FR index), called Huffman bit file index resolution (HUBFIRE).

As examples of research on feature selection methods in this field, the genetic algorithm (GA) has been used to select a subset of candidate features, transformations, and coefficients of the logistic regression classifier model for blind image steganalysis [29]. Also, the localized generalization error model (L-GEM) has been used for feature selection in steganalysis systems [30].

As examples of research on classification methods in this field, multi-classifier models based on the steganalysis results of decomposed image blocks [31] or a hierarchical ensemble of classifiers [32] have been proposed. Also, boosted fusion methods have been used to aggregate the outputs of multiple steganalyzers [33]. Blind steganalysis based on a one-class SVM (OC-SVM) with a simulated annealing clustering algorithm has also been proposed, which can create a more reasonable multi-sphere by finding globally optimal solutions in the clustering process [34].

As other examples of steganalysis systems reported in the recent decade, we can mention the following. Wang et al. [35], with the aim of detecting the popular JPEG steganography algorithms, have proposed a steganalysis scheme in which a transition probability matrix is constructed to describe the correlations of the quantized DCT coefficients in multiple directions; they have extracted a 96-dimensional feature vector and trained an SVM to build the steganalyzer. Pixel-value differencing is a steganography technique in the spatial domain in which embedding is performed in the difference of the values of pixel pairs; Sabeti et al. [36] have identified a number of characteristics in the difference histogram that show meaningful changes when a message is embedded in an image, and have used five different multilayer perceptrons (MLPs) to detect different levels of embedding. Liu et al. [37] have used features on the joint distributions of the DCT and DWT coefficients and features on the polynomial fitting errors of the histogram of DCT coefficients, which they called original expanded features (EPF); to handle the large number of resulting features, the support vector machine-recursive feature elimination (SVM-RFE) method has been used for binary classification, and SVMs have been applied to the selected features to detect stego images. Liu et al. [38] have also used the combination of a dynamic evolving neural fuzzy inference system (DENFIS) with the aforementioned SVM-RFE feature selection method for steganalysis of LSB matching steganography in grayscale images. Raval [39] has proposed a simple tactic for secure steganography in which a matrix derived from the image content is used by a quantization index modulation (QIM)-based encoder and decoder. Wahab et al. [40] have proposed a steganalysis technique based on conditional probability statistics, which performs better than the Markov process-based technique in terms of classification accuracy on the F5 software. The length of the embedded message has also been estimated, with an SVM used to classify cover and stego images [41].

3 Proposed scheme

3.1 Feature extraction stage

We use the statistical features of the coefficients and co-occurrence metrics of the sub-band images as features from the contourlet domain. The contourlet is a multi-scale and multi-direction transform comprising two major parts: the Laplacian pyramid, which produces the multi-scale decomposition, and the directional filter bank, which produces the multi-direction decomposition.

The Laplacian pyramid is first used to capture point discontinuities; a directional filter bank then links these point discontinuities into linear structures. At each level, the Laplacian pyramid generates a down-sampled low-pass version of the original signal and the difference between the original signal and the prediction, resulting in a bandpass image.

The second decomposition, the directional filter bank, contains two serial building blocks. The first is a two-channel quincunx filter bank with fan filters that divides a 2-D spectrum into two directions, horizontal and vertical. The second is a shearing operator, which amounts to a reordering of the image samples [42]. The contourlet filter bank is the combination of a Laplacian pyramid and a directional filter bank. Figure 1 shows one level of the contourlet filter bank. As shown in Fig. 1, the bandpass images from the Laplacian pyramid (multi-scale decomposition into octave bands) are passed through a directional filter bank so that directional information can be captured. This scheme can be iterated on the coarse image. The combination is a double iterated filter bank structure, which decomposes images into directional sub-bands at multiple scales.

Fig. 1 The contourlet filter bank [42]

In this study, the discrete contourlet transform is applied to the images in 3 levels and 8 directions. For example, the contourlet decomposition of a UCID image sample (UCID00015) with two scales, and four and eight directions, is shown in Fig. 2. Four statistical moments (mean, variance, skewness, and kurtosis) are extracted from the third-level coefficients of the contourlet transform and from its predicted log error [18, 19] as follows:

$$ M(x) = \frac{1}{n}\sum_{k = 1}^{n} x_{k} $$
(1)
$$ \mathrm{Var}(x) = \frac{1}{n - 1}\sum_{k = 1}^{n} \left( x_{k} - M(x) \right)^{2} $$
(2)
$$ S(x) = E\left[ \left( \frac{x - M(x)}{\sqrt{\mathrm{Var}(x)}} \right)^{3} \right] $$
(3)
$$ K(x) = E\left[ \left( \frac{x - M(x)}{\sqrt{\mathrm{Var}(x)}} \right)^{4} \right] $$
(4)

where \( x_{k} \) is a data point in the feature vector, and \( n \) is the dimension of the feature vector.
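
As a minimal sketch, assuming the sub-band coefficients are already available as a NumPy array (no particular contourlet implementation is assumed), the four moments of Eqs. (1)–(4) can be computed as follows:

```python
# Minimal sketch of Eqs. (1)-(4) for one sub-band of coefficients
# (or their log prediction errors).
import numpy as np
from scipy import stats

def subband_moments(coeffs):
    """Return (mean, variance, skewness, kurtosis) of one sub-band,
    following Eqs. (1)-(4). Note that SciPy standardizes skewness and
    kurtosis with the biased variance; the difference from Eq. (2)'s
    1/(n-1) normalization is negligible for large n."""
    x = np.asarray(coeffs, dtype=float).ravel()
    mean = x.mean()                          # Eq. (1)
    var = x.var(ddof=1)                      # Eq. (2)
    skew = stats.skew(x)                     # Eq. (3)
    kurt = stats.kurtosis(x, fisher=False)   # Eq. (4), non-excess kurtosis
    return mean, var, skew, kurt
```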

Fig. 2 Contourlet decomposition of a UCID image sample (UCID00015) with two scales, and four and eight directions

3.2 Feature selection stage

Irrelevant and redundant features degrade the performance of classification [43, 44]. A good feature selection method has several advantages for a learning algorithm, such as reducing the computational cost, increasing the classifier accuracy, and improving result comprehensibility [45]. Hence, most machine learning algorithms rely on feature selection techniques in order to perform effective classification in different applications such as pattern classification [46–48], biomedical engineering [43, 49–52], intrusion detection in computer networks [53, 54], bioinformatics [55], remote sensing [56], texture classification [57], audio classification [58], attribute selection in data mining [59], emotion recognition from speech signals [60, 61], text sentiment classification [62], software fault prediction [63], short-term electricity load forecasting [20], and bank failure prediction [64].

It has been shown that accuracy, sensitivity, and specificity can improve even with more than 50% of the input features eliminated [51]. Also, the number of input training parameters for neural networks should be kept small in order to maintain the optimal generalization ability of the networks.

As mentioned earlier, binary particle swarm optimization (BPSO) is used as a closed-loop feature selection algorithm in this study. The algorithm selects the feature subset based on classifier feedback, so better performance is expected even with a reduced number of features.

PSO is a population-based algorithm for finding solutions of an optimization problem. The search space is constructed based on the variables of the problem. The flowchart of the PSO algorithm is illustrated in Fig. 3. In BPSO, since the variables indicate the existence or nonexistence of a feature, the search space is binary. The positions of the particles are updated by updating their velocities according to the following equations:

$$ v_{ij}(t + 1) = w\,v_{ij}(t) + C_{1} R_{1} \left( P_{\mathrm{best}_{ij}} - x_{ij}(t) \right) + C_{2} R_{2} \left( G_{\mathrm{best}_{j}} - x_{ij}(t) \right) $$
(5)
$$ x_{ij}(t + 1) = x_{ij}(t) + v_{ij}(t + 1) $$
(6)

where \( v_{ij}(t) \) indicates the velocity of the \( j \)th component of the \( i \)th particle at position \( x_{ij}(t) \) in the \( t \)th iteration, \( w \) is the inertia weight, and \( R_{1} \) and \( R_{2} \) are two random numbers. \( P_{\mathrm{best}_{ij}} \) is the \( j \)th component of the best position of the \( i \)th particle, i.e., the position that has minimized the cost function over the previous iterations, and \( G_{\mathrm{best}_{j}} \) is the \( j \)th component of the best particle found so far in minimizing the cost function. \( C_{1} \) and \( C_{2} \) are the weights of the local and global terms of the search algorithm. BPSO was introduced in 1997 [65]. Like the genetic algorithm (GA), BPSO can be effectively utilized in binary optimization problems. In the BPSO technique, the probability of a particle component being 0 or 1 is specified by the velocity value through a sigmoid function. The conversion of continuous PSO to BPSO is performed as follows [21]:

$$ x_{ij}(t + 1) = \begin{cases} 0 & \text{if}\; \mathrm{rand}() \ge S(v_{ij}(t + 1)) \\ 1 & \text{if}\; \mathrm{rand}() < S(v_{ij}(t + 1)) \end{cases} $$
(7)

where rand() is a random number uniformly distributed between 0 and 1, and S(·) is the sigmoid function given as follows:

$$ S\left( v_{ij}(t + 1) \right) = \frac{1}{1 + e^{-v_{ij}(t + 1)}} $$
(8)
Fig. 3 Flowchart of the PSO algorithm

In this way, the continuous positions are converted to binary positions: a binary position of “1” is assigned to features that are effective in distinguishing stego from clean images, and “0” to noneffective features.

In our simulations, the number of particles is set to 10, and the particle values are initialized to 0 at the beginning of the process. In the optimization process, training and test sets are composed of the features selected by each particle, and the classifiers (RBFNN, PNN, and SVM) are trained and tested on these datasets. As a result, a classification accuracy rate and a training error rate are obtained for each particle. The success rate of each particle is calculated using the following fitness function:

$$ f(i) = A(i) - E(i) $$
(9)

where \( f(i) \) is the success rate, \( A(i) \) is the classification accuracy rate, and \( E(i) \) is the training error rate of the \( i \)th particle. The velocities of the particles are calculated using (5), with \( v_{\min} \) and \( v_{\max} \) set to −6 and 6, respectively. In each iteration, \( P_{\mathrm{best}_{ij}} \) and \( G_{\mathrm{best}_{j}} \) are updated if necessary. At the end of the optimization process, \( G_{\mathrm{best}_{j}} \) is taken as the optimum solution. In our simulations, the BPSO algorithm is iterated 100 times to find the optimum subset of features. In (5), \( w \) is set to 0.2, and it is assumed that \( C_{1} = C_{2} = 2 \) [66]. A rough sketch of this loop is given below.
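
The sketch below puts Eqs. (5)–(9) together under the stated settings; `evaluate` is a hypothetical callback that trains a classifier on the selected feature columns and returns its classification accuracy and training error rates.

```python
# Sketch of BPSO feature selection per Eqs. (5)-(9); 'evaluate' is a
# hypothetical callback and must tolerate an empty feature subset, since
# all particles start at zero as described above.
import numpy as np

def bpso(n_features, evaluate, n_particles=10, n_iters=100,
         w=0.2, c1=2.0, c2=2.0, v_min=-6.0, v_max=6.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros((n_particles, n_features), dtype=int)   # initial positions: 0
    v = np.zeros((n_particles, n_features))
    p_best = x.copy()
    p_fit = np.full(n_particles, -np.inf)
    g_best, g_fit = x[0].copy(), -np.inf

    for _ in range(n_iters):
        for i in range(n_particles):
            acc, err = evaluate(x[i].astype(bool))
            f = acc - err                                # Eq. (9)
            if f > p_fit[i]:
                p_fit[i], p_best[i] = f, x[i].copy()
            if f > g_fit:
                g_fit, g_best = f, x[i].copy()
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (5)
        v = np.clip(v, v_min, v_max)
        s = 1.0 / (1.0 + np.exp(-v))                     # Eq. (8)
        x = (rng.random(v.shape) < s).astype(int)        # Eq. (7)
    return g_best.astype(bool), g_fit
```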

3.3 Classification stage

3.3.1 Support vector machine

In this study, the SVM, a supervised learning method [67], is used as one of the classification techniques. Given a set of instance-label pairs \( (x_{i}, y_{i}) \), \( x_{i} \in R^{n} \), \( y_{i} \in \{-1, 1\} \), \( i = 1, \ldots, m \), the SVM maps the training vectors \( x_{i} \) into a higher-dimensional space and constructs the separating hyperplane that maximizes the margin in this higher-dimensional feature space. The hyperplane is formed according to the selected kernel; using a nonlinear kernel function allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. If the nonlinear kernel is a Gaussian radial basis function (RBF), the corresponding feature space is a Hilbert space of infinite dimension [68]. Maximum-margin classifiers are well regularized, so Gaussian RBFs have received significant attention.

Classical techniques utilizing radial basis functions employ some method of determining a subset of centers; typically, clustering is first employed to select them. An attractive feature of the SVM is that this selection is implicit, with each support vector contributing one local Gaussian function centered at that data point. We have used the SVM with the Gaussian RBF kernel [69]:

$$ k\left( x_{i}, x_{j} \right) = \exp\left( -\frac{\|x_{i} - x_{j}\|^{2}}{2\sigma^{2}} \right) $$
(10)

where \( x_{i} \) and \( x_{j} \) are data points, and \( \sigma \) is the spread of the Gaussian function. In our simulations, we set \( 2\sigma^{2} \) to 1.
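
For reference, a minimal configuration sketch assuming scikit-learn rather than LIBSVM is given below; in scikit-learn's RBF kernel \( \exp(-\gamma \|x_{i} - x_{j}\|^{2}) \), setting \( 2\sigma^{2} = 1 \) corresponds to \( \gamma = 1 \), and ν = 0.2 is the value reported in Sect. 4.

```python
# Minimal sketch of the nu-SVC configuration described in this paper,
# assuming scikit-learn; gamma = 1/(2*sigma^2) = 1 reproduces 2*sigma^2 = 1.
from sklearn.svm import NuSVC

clf = NuSVC(nu=0.2, kernel='rbf', gamma=1.0)
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```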

3.3.2 Radial basis function neural network

The radial basis function neural network (RBFNN) is a feedforward neural network consisting of an input layer, a radial basis function layer (hidden layer), and an output layer [70]. The number of input-layer nodes is equal to the number of selected features, and the output layer has one neuron, which indicates the data class (clean or stego). The weighted input to the radial basis transfer functions of the hidden layer is the Euclidean distance between the input vector and the weight matrix of the links connecting the input nodes to the hidden-layer nodes, multiplied by a bias value (which inversely affects the spread of the radial basis function). Decreasing the distance between the weight vector and the input vector increases the output. The weights and biases are obtained by training the neural network to minimize the misclassification error in a mean square error (MSE) sense [71].
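
A minimal sketch of this forward pass follows, assuming already-trained parameters (center rows, hidden biases, and a linear output layer); the MSE training itself is not shown.

```python
# Sketch of the RBFNN forward pass described above; 'centers', 'b1', 'w2',
# and 'b2' are assumed to come from training (not shown).
import numpy as np

def rbfnn_forward(x, centers, b1, w2, b2):
    dist = np.linalg.norm(centers - x, axis=1)  # Euclidean distance to each center
    n = dist * b1                               # weighted input: distance * bias
    a = np.exp(-(n ** 2))                       # radial basis transfer function
    return a @ w2 + b2                          # single output neuron (clean/stego)
```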

3.3.3 Probabilistic neural network

The PNN is a neural network well suited to classification applications. It consists of an input layer, a hidden layer called the pattern layer, another hidden layer called the summation layer, and an output layer [72]. The input-layer units are distribution units that supply the same input values to all of the pattern units. Note that the number of neurons in the pattern layer equals the number of training data, and the number of neurons in the summation layer equals the number of classes. The membership of a test sample \( x \) in each class is computed as the sum of the closeness of \( x \) to all members of that class (i.e., the training samples belonging to that class). Usually, the membership function of the pattern-layer neurons is taken to be a normal distribution function. The structure of the PNN is shown in Fig. 4.
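
A minimal sketch of this decision rule, assuming a Gaussian (normal) kernel with the spread value reported in Sect. 4 (0.2), is as follows.

```python
# Sketch of the PNN decision rule described above; each training sample
# acts as one pattern neuron, and the summation layer sums per class.
import numpy as np

def pnn_classify(x, X_train, y_train, sigma=0.2):
    y_train = np.asarray(y_train)
    d2 = np.sum((X_train - x) ** 2, axis=1)          # squared distances
    k = np.exp(-d2 / (2.0 * sigma ** 2))             # pattern-layer outputs
    classes = np.unique(y_train)
    sums = [k[y_train == c].sum() for c in classes]  # summation layer
    return classes[int(np.argmax(sums))]             # predicted class
```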

Fig. 4 Structure of PNN [72]

4 Simulation and experimental results

In our simulations, we have used the uncompressed color image database (UCID), which contains 1338 color TIFF images [73]. The images were converted from TIFF to JPEG with a quality factor of 80 in order to be used by the steganographic methods.

To obtain the stego dataset, Jsteg [74], OutGuess [75], the model-based algorithm [76], and the JPHS software [77] have been applied to the JPEG images to embed random secret messages at 5, 10, and 25% of the maximum embedding capacity of each steganography method. For each embedding rate, four steganography methods are used, yielding 5352 stego images; 1338 of these have been randomly selected from the stego image dataset. Also, the 1338 clean JPEG images of the UCID database have been used to form an equal-size two-class dataset.

Ninety-six features have been extracted from the third level of the contourlet transform in eight directions and three color channels. Also, linear prediction of the coefficient magnitudes has been performed, and the statistical moments of the log error between the actual and linearly predicted contourlet coefficients have been computed, yielding 96 supplementary features.

BPSO, as a closed-loop feature selection method, is used to select the most efficient features while improving the detection rate. In this work, the feature set is reduced from 192 features to 106–125 features, as reported in Tables 1, 2 and 3, where the confusion matrices for the three embedding rates (25, 10, and 5%) are given. Note that the information hiding ratio, or embedding rate, is a well-known metric for evaluating steganalysis performance [78]. As can be seen, the number of features selected by the BPSO algorithm when using the RBFNN is 108, 110, and 106 for the three embedding rates, respectively; when using the PNN classifier, it is 120, 118, and 112 features, respectively. Note that the significance of the features and the detection performance depend not only on the embedding rate but also on the image complexity [78].

Table 1 Confusion matrix using RBFNN and PNN, embedding rate = 25%
Table 2 Confusion matrix using RBFNN and PNN, embedding rate = 10%
Table 3 Confusion matrix using RBFNN and PNN, embedding rate = 5%

In this study, random subsampling validation has been used as the cross-validation method. The average ratio of training to test dataset size used in similar works, for example [18, 38, 79], is used in our simulations as a criterion to make the results comparable with others. In this way, 86% of the data is selected randomly for training, and the remaining 14% is used for testing; this procedure is repeated 100 times, so the test images are also selected randomly in each iteration (a sketch of this protocol is given below). The overall performance of the proposed steganalysis method, when applying all of the test datasets of the different steganography algorithms, is evaluated by employing the RBFNN, PNN, and SVM classifiers to solve the two-class (stego/clean image) problem (Tables 1, 2, 3 for RBFNN and PNN, and Tables 7, 8 and 9 for SVM).
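
Assuming scikit-learn, the protocol amounts to the following sketch.

```python
# Sketch of the random subsampling protocol: 100 random 86%/14% splits.
from sklearn.model_selection import ShuffleSplit

splitter = ShuffleSplit(n_splits=100, test_size=0.14, random_state=0)
# for train_idx, test_idx in splitter.split(X, y):
#     train on X[train_idx], y[train_idx]; evaluate on X[test_idx], y[test_idx]
```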

The receiver operating characteristic (ROC) curves for the test data, when using the RBFNN and PNN, are depicted in Figs. 5 and 6, respectively. The ROC curve plots the percentage of correctly detected stego images (true positives) against the percentage of cover images detected as stego images (false positives). The detection accuracy is calculated using (11), with the following definitions:

$$ \text{Accuracy} = \left( \text{TP} + \text{TN} \right) / \left( \text{P} + \text{N} \right) $$
(11)
True positive (TP): ratio of stego images correctly classified as stego.

False positive (FP): ratio of clean images incorrectly classified as stego.

True negative (TN): ratio of clean images correctly classified as clean.

False negative (FN): ratio of stego images incorrectly classified as clean.
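
In terms of raw counts on the test set, Eq. (11) and the rates above amount to the following sketch.

```python
# Sketch of Eq. (11) and the TP/FP rates, given raw counts on the test set.
def rates(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # Eq. (11): (TP + TN)/(P + N)
    tp_rate = tp / (tp + fn)                    # stego correctly flagged
    fp_rate = fp / (fp + tn)                    # clean wrongly flagged as stego
    return accuracy, tp_rate, fp_rate
```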

Fig. 5 ROC curves for different embedding rates, BPSO + RBFNN

Fig. 6 ROC curves for different embedding rates, BPSO + PNN

In this work, the simulations are run on a PC with an Intel® Core 2 Duo 2.2 GHz CPU and 2 GB of RAM. In another experiment, we apply the BPSO feature selection technique to the steganalysis method proposed in [18] to evaluate the effect of feature selection on performance. For the SVM, the LIBSVM package [67] is used in our simulations. We have used the ν-soft margin support vector classifier (ν-SVC), in which a parameter ν controls the number of support vectors and training errors. The value of ν, in the range (0, 1], is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors; we set ν to 0.2. The spread constant for the radial basis layer is set to 2 for the RBFNN and 0.2 for the PNN.

As shown in Tables 1, 2, 3, the RBFNN classifier yields higher detection accuracy; however, the training time of the PNN is shorter than that of the RBFNN. The training time with 192 features is 0.2485 s for the PNN and 36.8456 s for the RBFNN, while the SVM takes 2.6203 s. The true positive and false positive rates for each steganography method, when applying the test dataset and using the RBFNN, PNN, and SVM classifiers, are reported in Tables 4, 5 and 6, respectively.

Table 4 TP and FP rates for different embedding rates-RBFNN classifier
Table 5 TP and FP rates for different embedding rates-PNN classifier
Table 6 TP and FP rates for different embedding rates-SVM classifier

The confusion matrices of the steganalysis method proposed in [18], which uses similar feature vectors, and the effect of using the BPSO feature selection algorithm are reported for the three embedding rates in Tables 7, 8, 9, respectively. The ROC curves for the BPSO + SVM classifier at different embedding rates are depicted in Fig. 7. As shown in Tables 7, 8, 9, despite an approximately 30% reduction in the size of the feature set, the detection accuracy is improved at all embedding rates (Fig. 8).

Table 7 Confusion matrix when using SVM [18] and BPSO + SVM (proposed), embedding rate = 25%
Table 8 Confusion matrix when using SVM [18] and BPSO + SVM (proposed), embedding rate = 10%
Table 9 Confusion matrix when using SVM [18] and BPSO + SVM (proposed), embedding rate = 5%
Fig. 7 ROC curves for different embedding rates, BPSO + SVM

Fig. 8 Performance of RBFNN and PNN for different embedding rates

5 Conclusion

Feature extraction, feature selection, and classification are the three stages of steganalysis systems. In this paper, a steganalysis method based on features from the contourlet transform domain has been proposed. The contourlet transform is able to capture smooth contours in images; compared to common transforms such as the wavelet, its directionality and anisotropy are important advantages. In this work, the candidate features consist of the statistical moments of the contourlet coefficients and the statistical moments of the log error between the actual and linearly predicted contourlet coefficients in the third level and eight directions. The number of features has been reduced by the BPSO feature selection technique, so the computational load has been reduced and the classification accuracy on stego/clean images has been improved as well. Radial basis neural networks, RBFNN and PNN, as well as SVM, have been used as classification tools to distinguish stego from clean images.

Experimental results have shown that, even at low embedding rates, the detection accuracy of the proposed method is more than 80% when using the SVM or RBFNN equipped with BPSO.

The investigations have shown that using the BPSO feature selection algorithm in this classification problem reduces the feature-set size by about 30% while achieving higher detection accuracies. In addition, the ν-soft margin support vector classifier (ν-SVC) is preferable to the RBFNN in terms of training time: the training times of the SVM and RBFNN were 2.6203 and 36.8456 s, respectively. However, comparing the results reported in Tables 4 and 6, the classification rate of the RBFNN is slightly better than that of the SVM at high embedding rates for the Jsteg, OutGuess, and JPHS steganography algorithms. The PNN has the shortest training time in this application, but its classification rate is not competitive. In our future work, we will investigate the effect of using supplementary features based on the contourlet coefficients.