1 Introduction

The use of digital imaging has increased in recent years due to the availability of low-cost, high-resolution digital cameras and user-friendly digital imaging software. This has brought considerable benefits to areas such as electronic and print media, the World Wide Web, security and surveillance, and the insurance industry, to name a few. However, although user-friendly image editing tools make our lives easier, they also raise serious authentication issues when misused. It has never been easier to manipulate images in order to gain illegal advantages or to spread false propaganda with forged images. Therefore, correctly detecting image forgeries is of growing interest in the research community.

Most image forgeries are performed using a copy-move procedure, in which part of an image is copied and pasted either into another part of the same image or into a different image. If the procedure involves a single image, it is called copy-move forgery; if it involves more than one image, it is referred to as splicing. Before pasting, the copied part can be geometrically transformed, e.g., by rotation or scaling, to fit the location it is moved to. To hide the traces of forgery, different types of post-processing, such as blurring or adding noise, are applied to the forged region.

Research on image forgery detection started mainly in the early 2000s [1], and many techniques have been reported since then to detect copy-move forgery or splicing. These techniques can be divided into two main categories: active and passive (or blind) [4]. Active techniques assume that the image contains a watermark; forgery is then detected if the extracted watermark does not match the original one. However, such techniques have limited applicability because in most cases we have no information about the watermark, if one exists at all. This limitation has led to the development of blind techniques that do not depend on an explicit watermark. The output of forgery detection can be of two types: (i) classifying an image as authentic or forged (no localization), and (ii) localizing the forged region if the image is not authentic. In this paper, we propose a blind technique with the goal of classifying a test image as authentic or forged.

Most forgery detection methods are block-based [1, 2]. Fridrich et al. [1] and Huang et al. [2] used discrete cosine transform (DCT) coefficients to create a feature vector for each block of an image and applied block matching. Multi-resolution techniques, such as the discrete wavelet transform (DWT), have been utilized in several other methods [3, 4]. An alternative to block matching is matching SIFT features [5]. In image splicing detection, most works have been evaluated using the Columbia dataset [6] and the CASIA dataset [7]. Ng et al. [8] used higher-order moments of the image spectrum to detect image splicing, while geometric invariants and the camera response function were used in [9]. Shi et al. [10] proposed statistical features based on 1D and 2D moments, together with transition probability features based on Markov chains in the DCT domain, for image splicing detection. On the CASIA v2.0 database, their method achieved 84.86 % accuracy. Later, He et al. [11] improved the method by combining transition probability features in the DCT and DWT domains. For classification, they used a support vector machine (SVM) with recursive feature elimination (RFE). Their method achieved 89.76 % accuracy on the CASIA v2.0 database. Transition probability features extracted from the Cb chrominance channel of an edge-thresholded image were proposed in [12]. Using the CASIA v2.0 database, the method achieved 95.6 % accuracy; however, it did not use the full database. The same method achieved 89.23 % accuracy on the Columbia database. In a recent method [13], a chroma-like channel was designed for image splicing detection, improving the performance of [12] to 93.14 %.

None of the methods reviewed above, except [11], fully utilizes the scale and orientation information in an image. Even in [11], where DWT coefficients were used, the DWT filtering was performed only in the horizontal and vertical directions. In this paper, an image forgery detection method based on the steerable pyramid transform (SPT) and the local binary pattern (LBP) is proposed. SPT involves filtering at different scales and orientations. In our method, SPT is applied to the chrominance components of a color image, and LBP features are extracted from the resulting subbands. The LBP histograms of all the subbands are concatenated to form a feature vector, which is then given to an SVM for classification. Our objective is to determine whether a given image is forged or not, rather than to localize the forgery in the image.

The rest of this paper is organized as follows: In Sect. 2, the proposed method is described. Section 3 presents and discusses our experimental results. Finally, Sect. 4 contains our conclusions and directions for future work.

2 Proposed method

Figure 1 shows a block diagram of the proposed method for image forgery detection. First, given a color image, we convert it into the YCbCr color space, where Y is the luminance component, and Cb and Cr are the chrominance components (the blue-difference and red-difference channels, respectively). The human eye is more sensitive to the luminance channel than to the chrominance channels; however, it has been shown that the chrominance channels are more suitable for forgery detection [12, 13]. In general, the image content is too strong for subtle tampering clues to be visible in it. The luminance channel mostly captures this content, while the chrominance channels emphasize the weak signal components of the image, so the edge irregularities caused by tampering can be noticed in the chrominance channels. Therefore, we concentrate only on the Cb and Cr channels. Using the red, green, and blue channels directly could be another option, but it does not exploit the relationship between the channels. There are also many other color spaces, such as HSV, XYZ, L*a*b*, and CMY, that could be utilized; however, the emphasis of the current work is on extracting appropriate features (steerable filters and LBP) for image forgery detection. In the second step, SPT is applied to each chrominance component; the output is a number of subbands that are translation and rotation invariant. In the third step, we perform feature extraction by applying LBP to each SPT subband. LBP is a powerful local texture descriptor that has shown impressive performance in the literature [16]. The LBP histograms from the different subbands are concatenated to form a feature vector. An optional feature selection step is applied next to reduce the number of features as well as to enhance the performance of the system. In the final step, an SVM is used to determine whether the input image is forged or not.
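To make the first step concrete, the following sketch extracts the Cb and Cr channels using the standard ITU-R BT.601 (JPEG) definition of YCbCr; the function name and array conventions are ours for illustration and are not taken from the authors' implementation.

```python
# Minimal sketch of the color-space conversion step (ITU-R BT.601 / JPEG YCbCr).
# Only the two chrominance channels used by the method are returned.
import numpy as np

def rgb_to_cb_cr(rgb):
    """rgb: H x W x 3 array with values in [0, 255]. Returns (Cb, Cr)."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue-difference channel
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red-difference channel
    return cb, cr
```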

Fig. 1 Block diagram of the proposed method for image forgery detection

2.1 Steerable pyramid transform

SPT is a powerful linear multi-scale, multi-orientation image decomposition technique. It was developed to overcome the limitations of orthogonal separable wavelet decompositions. The steerable pyramid analysis functions are dilated and rotated versions of a single directional wavelet [14, 15]. SPT is used here because traces of forgery can be found at different scales or orientations of an image. At the zeroth scale (\({S}_0\)) of the SPT, a lowpass filter \(L_0\) and a highpass filter \(H_0\) are applied to the image. At the first scale (\(S_1\)), the output of \(L_0\) is decomposed into k oriented subbands (each covering \(2\pi /{k}\)) using directional operators \(({B}_{k})\) and a lowpass subband (\(L_1\)). At the second scale (\(S_2\)), \(L_1\) (sub-sampled by a factor of 2) is again divided into k oriented subbands and a lowpass subband (\(L_2\)). This procedure continues recursively until the required number of scales has been obtained. In order to avoid aliasing during sub-sampling, the constraint on the \(L_1\) filter is as follows:

$$\begin{aligned} L_1 (\omega )=0,\text { for }|\omega |>\frac{\pi }{2}. \end{aligned}$$
(1)

To avoid amplitude distortion, the transfer function (frequency response) of the system should be unity as follows:

$$\begin{aligned} |H_0 (\omega )|^{2}+|L_0 (\omega )|^{2}\left[ {|L_1 (\omega )|^{2}+\sum _{k=0}^K {|B_k (\omega )|^{2}} } \right] =1. \end{aligned}$$
(2)

The relationship between two successive low pass filters in terms of frequency is as follows:

$$\begin{aligned} L_0 (\omega )=L_1 (\omega /2). \end{aligned}$$
(3)

Figure 2 illustrates the transfer functions of the filters \(H_0\), \(L_0\), the four oriented (k = 4) bandpass filters (\(B_0\), \(B_1\), \(B_2\), \(B_3\)), and \(L_1\). Here, a 3-scale, 4-orientation SPT is used, representing an image by a total of 12 oriented subbands covering a wide range of scales and orientations. In addition, we use the residual lowpass subband as well as the residual highpass subband, so the total number of subbands used is 14. Figure 3 shows the block diagram of the SPT system.
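As an illustration of this decomposition, the sketch below builds the 3-scale, 4-orientation pyramid for one chrominance channel. It assumes the third-party pyrtools package (a Python port of the Simoncelli steerable pyramid code); the class and attribute names follow its documented interface, not the authors' implementation, and may need adjusting to the installed version.

```python
# Hedged sketch: 3-scale, 4-orientation steerable pyramid decomposition of one
# chrominance channel, assuming the third-party `pyrtools` package.
import pyrtools as pt

def spt_subbands(channel, num_scales=3, num_orientations=4):
    """Return the SPT subbands: the residual highpass band, num_scales x
    num_orientations oriented bands, and the residual lowpass band
    (14 subbands in total for the configuration used here)."""
    pyr = pt.pyramids.SteerablePyramidSpace(channel,
                                            height=num_scales,
                                            order=num_orientations - 1)
    # pyr_coeffs maps keys such as 'residual_highpass', (scale, orientation)
    # tuples, and 'residual_lowpass' to 2-D coefficient arrays.
    return list(pyr.pyr_coeffs.values())
```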

Fig. 2 Transfer functions (range \(-\pi \) to \(\pi \)) of the filters used in the steerable pyramid transform. First row: \(H_0\) and \(L_0\); second and third rows: the four bandpass filters \(B_0\), \(B_1\), \(B_2\), and \(B_3\); fourth row: \(L_1\)

Fig. 3 Block diagram of the SPT system used in the proposed method

The motivation for using the translation- and rotation-invariant SPT in the proposed method is that SPT is a multi-resolution technique in which the given image is decomposed into subbands of different resolutions (scales), frequencies, and orientations. Traces of forgery that cannot be noticed in the test image itself may be found in these subbands. Although another multi-resolution technique, the DWT, has been used before in image forgery detection, the DWT involves no orientation filtering (it has only horizontal and vertical filtering).

2.2 Local binary pattern

After decomposing the chrominance components of an image into several SPT subbands, LBP is applied to each subband to extract a set of features. LBP is a texture descriptor that labels each pixel in the image by thresholding its neighboring pixels against the center pixel and interpreting the result as a binary number (see Fig. 4). The texture can then be described by the histogram of these label values [16]. This is called the basic LBP operator, which is computed in a rectangular window. LBP can also be extracted in a circular neighborhood (P, R), where P is the number of neighbors and R is the radius of the neighborhood. In this work, we have experimented with both the basic and the circular LBP using P = 8 and R = 1. The normalized LBP histogram is used as the feature vector for the corresponding subband. The histogram has 256 bins, corresponding to the 256 possible LBP codes. The LBP histograms from all the subbands are concatenated to produce the final feature vector.
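The per-subband feature can be sketched as follows, using scikit-image's local_binary_pattern as a stand-in; its circular-neighborhood variant with P = 8 and R = 1 is close to, but not identical to, the basic 3 × 3 operator described above.

```python
# Sketch of the per-subband feature: a normalized 256-bin histogram of LBP codes
# (P = 8 neighbors, radius R = 1), using scikit-image as a stand-in implementation.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(subband, P=8, R=1):
    codes = local_binary_pattern(subband, P, R, method='default')  # codes in [0, 255]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)  # normalized histogram (sums to 1)
```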

Fig. 4 The basic LBP operator

The LBP is used in the proposed method for two reasons. First, the number of coefficients from all the SPT subbands is prohibitively large to feed into a classifier. For example, if the image size is \(200 \times 200\), then the SPT illustrated in Fig. 3 produces \(200 \times 200\) coefficients in the \(H_0\) subband, four sets of \(200 \times 200\) coefficients at scale 1, four sets of \(100 \times 100\) coefficients at scale 2, four sets of \(50 \times 50\) coefficients at scale 3, and \(25 \times 25\) coefficients in the lowest frequency subband. Second, we are interested in texture, because forging an image distorts the original texture; the texture pattern can therefore be a good indicator for forgery detection. The LBP histogram, being a good texture descriptor, can encode texture differences at the different scales and orientations of the steerable pyramid transformed image.
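Combining the two previous sketches, the feature vector for one chrominance channel can be assembled as shown below; spt_subbands and lbp_histogram are the illustrative helpers defined above, not the authors' code.

```python
# Per-channel feature vector: 14 subbands x 256-bin LBP histograms
# concatenated into a single 3,584-dimensional vector.
import numpy as np

def channel_feature_vector(channel):
    hists = [lbp_histogram(sb) for sb in spt_subbands(channel)]
    return np.concatenate(hists)  # length 14 * 256 = 3584
```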

2.3 Feature selection and classification

Feature selection is important because of the high dimensionality and complicated distribution of the data. In our case, the length of the feature vector is more than 3,500 when all SPT subbands and the basic LBP are used. A significant number of these features may be unimportant, and therefore irrelevant, for classification. Eliminating such irrelevant features can reduce system complexity as well as data analysis and processing time. Many feature selection methods can be used, such as the Fisher criterion, local learning based (LLB) selection, RFE, zero-norm minimization, and others.

In the fourth step of the proposed method, a combination of two data reduction techniques, \(L_0\)-norm [17] and LLB [18], is used for feature selection. The \(L_0\)-norm method ranks the features based on class separability constraints, while LLB removes features that contain redundant information. In our implementation, \(L_0\)-norm is applied first, followed by LLB. The threshold for LLB was set to \(10^{-10}\). Besides \(L_0\)-norm and LLB, we investigated other feature selection algorithms such as the Fisher discrimination ratio and \(L_1\)-norm; however, we found that \(L_0\)-norm and LLB performed better than the other two, and that their combination works better than either applied individually. Therefore, we adopted the combination of \(L_0\)-norm and LLB in the proposed method.
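The \(L_0\)-norm and LLB selectors themselves are not reproduced here. As a simpler stand-in that illustrates the rank-then-keep idea, the sketch below scores each feature with the Fisher discrimination ratio (one of the alternatives mentioned above) and keeps the k highest-scoring features; k is a tunable parameter.

```python
# Hedged stand-in for the feature selection step: Fisher-ratio ranking
# (not the paper's L0-norm + LLB combination).
import numpy as np

def fisher_select(X, y, k=500):
    """X: n_samples x n_features matrix, y: binary labels (0 = authentic, 1 = forged).
    Returns the indices of the k features with the largest Fisher ratio."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # guard against zero variance
    return np.argsort(num / den)[::-1][:k]
```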

In the final step of the proposed method, we apply SVM classification with a radial basis function (RBF) kernel to classify an image as authentic or forged. SVM is widely used for data classification and is known for its high prediction capability in many applications, especially for binary problems. The RBF kernel was chosen because it is more general than other kernels (in particular the linear kernel) and usually produces better accuracy with fewer restrictions.
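A minimal sketch of this classification step is given below, using scikit-learn's SVC (which wraps LIBSVM, the toolbox mentioned in Sect. 3) as a stand-in; the parameter values are placeholders to be set by grid search.

```python
# Minimal sketch of the final step: RBF-kernel SVM on the selected features.
from sklearn.svm import SVC

def train_forgery_classifier(X_train, y_train, C=1.0, gamma='scale'):
    clf = SVC(kernel='rbf', C=C, gamma=gamma)  # C and gamma tuned by grid search in practice
    clf.fit(X_train, y_train)
    return clf  # clf.predict(X_test) -> 0 (authentic) or 1 (forged)
```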

3 Experiments

In our experiments, we used the following three databases: CASIA v1.0 [7], CASIA v2.0 [7], and Columbia color DVMM [6]. First, we performed an extensive set of experiments to evaluate the proposed method using the CASIA v1.0 database. Then, we performed additional experiments using the CASIA v2.0 and Columbia color databases. The CASIA v1.0 dataset contains 800 authentic and 921 forged color images, of which 459 are copy-move forgeries and the remainder are spliced. Different geometric transforms, such as scaling and rotation, have been applied to the forged images. All the images have a size of \(384\times 256\) pixels and are in JPEG format. The CASIA v2.0 database is an extension of the CASIA v1.0 database. It consists of 7,491 authentic and 5,123 forged color images in JPEG, BMP, and TIFF formats, with image sizes varying from \(240\times 160\) to \(900 \times 600\) pixels. The Columbia color image database consists of 183 authentic and 180 spliced images in TIFF format. The image size is \(1{,}152 \times 768\), and no post-processing was applied to the forged images; all the forged images in this database are spliced.

Image features were extracted from the chrominance channels using SPT and LBP. The LIBSVM toolbox [19] was used for classification. The optimal SVM parameter values (\(\sigma \) and C) were set automatically by an intensive grid search on the training set after applying feature selection. The performance of the proposed method is reported in terms of accuracy, sensitivity, specificity, and area under the ROC curve (AUC), averaged over tenfold cross-validation. In tenfold cross-validation, the set of authentic images and the set of forged images are each randomly divided into ten equal groups. In each iteration, nine groups each of the authentic and forged images are used for training, while the remaining groups are used for testing. Therefore, by the end of the ten iterations, all ten groups have been tested. There is no overlap between the training set and the testing set within an iteration. Feature selection and the optimization of the SVM parameters are performed on the training set. The final accuracy is obtained by averaging the accuracies of the ten folds.
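The evaluation protocol can be sketched as follows, again with scikit-learn as a stand-in for LIBSVM; the grid of (C, gamma) values is illustrative and not taken from the paper.

```python
# Hedged sketch of the protocol: stratified tenfold cross-validation with
# SVM parameters tuned by grid search on the training folds only.
import numpy as np
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.svm import SVC

def tenfold_accuracy(X, y):
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    accs = []
    for train_idx, test_idx in outer.split(X, y):
        grid = GridSearchCV(SVC(kernel='rbf'),
                            {'C': [1, 10, 100], 'gamma': [1e-3, 1e-2, 1e-1]},
                            cv=5)
        grid.fit(X[train_idx], y[train_idx])        # tuning uses training folds only
        accs.append(grid.score(X[test_idx], y[test_idx]))
    return float(np.mean(accs))
```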

To assess the performance of the proposed method, we performed a large number of experiments covering many combinations of image representations (grayscale, Y, Cr, and Cb), SPT subbands, and LBP parameters. In the case of LBP, the basic LBP and the LBP with the ‘u2’ mapping (uniform patterns with at most two bit transitions) performed best; the results reported here are for the basic LBP.

3.1 Effect of different SPT subbands

In this section, we investigate the effect of different SPT subbands on image forgery detection, using the CASIA v1.0 database. The results in (a)–(e) below are obtained without feature selection.

(a) Effect of SPT scale assuming a single orientation

In this experiment, we studied the performance of three different subbands at different scales but with the same orientation (the first orientation, \(O_0\)). The feature vector length in this case was 256 (i.e., the number of bins in the LBP histogram). Figure 5a shows the performance of the proposed method for the different channels. In all channels, the first scale (\(S_1\)) has the highest accuracy. The accuracy of the first scale in the Cr channel is 87.6 %; for the same channel, accuracy decreases to 78.6 % at the second scale and 66.8 % at the third scale. The same pattern occurs in the other channels. The performances of Cr and Cb are comparable, and both are better than those of Y and gray.

(b) Effect of SPT orientation assuming a single scale

In this experiment, we investigated the performance of four different subbands having four different orientations (\(O_0\), \(O_1\), \(O_2\), and \(O_3\)) but the same scale (\(S_1\)). Figure 5b shows the results; in both chrominance channels, the first orientation has the highest accuracy, 87.6 % for Cr and 85.8 % for Cb. However, no clear trend is observed across the different orientations.

(c) Effect of each SPT scale

In this experiment, we studied the effect of each SPT scale on the performance of the proposed method. The histograms of the four oriented subbands within each scale are concatenated to form a feature vector of length \(1{,}024 \,(= 4\times 256)\). Figure 5c shows the accuracy of the three scales in the different channels. The best accuracy, 90.4 %, was obtained by the \(S_1\) scale in the Cr channel. It is noteworthy that, using the Cr channel, the same scale with a single orientation (\(O_0\)) has an accuracy of 87.6 %. Based on this observation, we conclude that all four orientation subbands are needed to improve accuracy.

(d) Effect of each SPT orientation

In this experiment, the histograms of the three scales at each orientation were concatenated to form a feature vector of length \(768\, (=3\times 256)\). Figure 5d shows the accuracy of the four orientations in the different channels. Orientation 1 (\(O_1\)) and orientation 3 (\(O_3\)) gave the best results. Cr and Cb yielded accuracies of 87.3 and 86.3 %, respectively, at \(O_1\).

(e) Effect of combining all subbands

In this experiment, we studied the performance of combining all the SPT subbands by concatenating the corresponding LBP histograms. The length of the feature vector in this case was \(3{,}584 \,(=14\times 256)\). The results are shown in Fig. 6. The accuracies of the four channels were 91.16 % for Cr, 89.19 % for Cb, 63.9 % for Y, and 63.6 % for gray. It is clear that the combination of all subbands achieves higher accuracy than any individual subband for each channel. Moreover, Cr and Cb perform better than Y and gray.

Fig. 5 Image forgery detection accuracies (%) for different SPT scale and orientation subbands

Fig. 6 Accuracy (%) of the proposed method without feature selection in different channels using the CASIA v1.0 database

3.1.1 Effect of feature selection

Table 1 shows the performance of the proposed method with feature selection on the CASIA v1.0 database. As mentioned earlier, the features are selected using a combination of \(L_0\)-norm and LLB. With feature selection, the number of features was reduced from 3,584 to 480 on average. Our experimental results indicate that the chrominance channels performed better than the luminance channel or gray. For example, the Cr channel had the best mean accuracy of 94.89 % (standard deviation, SD = 1.12), followed by 94.24 % for the Cb channel. The sensitivity, specificity, and AUC were also high for these two channels. On the other hand, Y and gray achieved only 66.74 and 64.65 % accuracy, respectively. In the case of Cr, feature selection improved the accuracy from 91.16 % (see Fig. 6) to 94.89 %. Figure 7 shows the ROC curves for the four channels using feature selection. We have also analyzed the performance of the individual SPT subbands using feature selection. Table 2 shows the subband contributions (%) for the Cr and Cb channels. As can be observed, the highest frequency subband contributes the most (33.54 % for Cr and 34.37 % for Cb) among the individual subbands. In terms of scale, scale 1 contributes the most (46.04 % for Cr and 45.84 % for Cb), while scales 2 and 3 contribute the least.

Fig. 7 ROC curves for the four channels using feature selection and the CASIA v1.0 database

Table 1 Performance of the proposed method using feature selection and the CASIA v1.0 database
Table 2 Individual SPT subband contribution (%) using feature selection and the CASIA v1.0 database

3.1.2 Investigating performance on splicing and copy-move forgery separately

We performed additional experiments to investigate the performance of the proposed method on splicing and copy-move forgery separately. In the ‘Sp’ folder of the CASIA v1.0 database, two types of forgeries can be found: splicing (involving two images) and copy-move (involving one image). For the splicing detection experiments, we removed all the copy-move forged images from the dataset; similarly, we excluded all the spliced images for the copy-move detection experiments. The results are reported in Table 3 for the chrominance channels only, using feature selection. As can be observed, the accuracy of splicing detection (95.09 % for Cr) and that of copy-move detection (95.21 % for Cr) are very close, indicating that the proposed method is not biased towards any specific type of image forgery.

Table 3 Performance of the proposed method on splicing and copy-move detection, using the CASIA v1.0 database

Table 4 shows the performance of the proposed method on the CASIA v2.0 database. Since the Y channel and gray scale do not give satisfactory results, we have not included them in the table. The proposed method achieved 97.33 % accuracy using the Cr and Cb channels without feature selection. With feature selection, performance did not increase; however, feature selection reduces the number of features to around one-tenth. Figure 8 shows the ROC curves for the two chrominance channels using feature selection.

Table 4 Performance of the proposed method using feature selection on the CASIA v2.0 database
Fig. 8 ROC curves of the proposed method with feature selection for the two chrominance channels on the CASIA v2.0 database

We also performed experiments on the different types of forgery separately to see whether the proposed method is biased toward any particular type. Figure 9 shows the accuracy (%) using the Cr and Cb channels with feature selection on CASIA v2.0. Apart from splicing and copy-move detection, we also performed experiments grouped by geometric transformation, namely rotation, scaling, deformation, and no transformation (copy-move forgery without any geometric transformation). The same approach as described in the previous section was used: for splicing detection, we removed all the copy-move forged images from the dataset; for copy-move detection, we removed all the spliced images. The other four experiments (i.e., those grouped by geometric transformation) were performed in the same manner. In all cases, the accuracy ranges between 93.22 and 97.67 %, with the best accuracy obtained for splicing detection using the Cr channel. Therefore, we can conclude that the proposed method works well for detecting both splicing and copy-move forgery, with and without geometric transformations.

Fig. 9 Accuracy (%) of the proposed method on different types of forgeries on the CASIA v2.0 database. Data labels are shown only for the Cr channel

Table 5 shows the accuracy (%) of the proposed method with and without feature selection on the Columbia color database. The Cb channel performs better than the Cr channel on this database. After applying feature selection, the proposed method yields 96.39 % accuracy using the Cb channel and 94.17 % accuracy using the Cr channel. It is interesting to note (not shown here) that all the selected features came from the residual highest frequency subband of the SPT.

Table 5 Performance of the proposed method using feature selection on the Columbia color database

3.2 Comparisons with other methods

The performance of the proposed method was also compared to that of other recent methods. Table 6 shows the comparisons on the three databases used in our experiments. The results of the other methods were obtained from the corresponding papers, and only their best results are reported in the table. The method in [20] applies a modified run-length run-number technique to the chrominance channels, while the methods in [10, 11, 13] were described briefly in Sect. 1. To the best of our knowledge, these methods have achieved the best results so far on these databases. However, the authors of these methods reported results on only one database each (CASIA v1.0, CASIA v2.0, or Columbia color) and provided no implementation code. Therefore, in Table 6, we show the results of these four methods for the corresponding databases only. As Table 6 illustrates, the proposed method outperforms the methods of [10, 11] on CASIA v2.0 and the method of [13] on the Columbia color database; its performance is slightly better than that of the method in [20].

Table 6 Comparison of various methods on three databases

4 Conclusion

In this paper, we proposed a new method for image forgery detection based on SPT and LBP. The method was extensively evaluated on three publicly available databases. The best accuracy of the proposed method was 94.89 % on the CASIA v1.0 database, 97.33 % on the CASIA v2.0 database, and 96.39 % on the Columbia color image database. These accuracies are higher than those reported by other state-of-the-art methods on the same databases. Our experiments revealed that (a) the chrominance channels are better suited for image forgery detection than the luminance channel or gray scale, and the performances of the two chrominance channels are comparable; (b) the contribution of scale 1 of the SPT is higher than that of scales 2 and 3, while, as a single subband, the residual highest frequency subband has the highest contribution, indicating that forgery traces are better found at higher frequencies and at scale 1; and (c) the well-established LBP descriptor can be used effectively for image forgery detection.

In future work, we plan to address the problem of localizing the forged region. A comparison of different color spaces for image forgery detection can also be investigated.