1 Introduction

With the fast development of computer technology and the popularity of software for image processing, image forgery which greatly reduces the credibility of the digital images, is becoming much easier to be achieved. Therefore, the image forgery detection has been becoming more and more attractive in recent years. The copy-move forgery is to paste a region / regions of an image into another part(s) of the same image. During the copy and move operation, some image processing methods such as rotation, scaling, blurring, and compression are always applied to ensure the imperceptibility of the copied region(s); however, they will increase the difficulties of forgery detection at the same time. On the other hand, since the copy and move operations are executed in the same image, which means the noise component, color characters and other important properties of the pasted region(s) are compatible with which of the rest of the image. Therefore, some of the forgery detection methods based on the related image properties are not applicable in this case. In past years, lots of methods have been proposed for the copy-move forgery detection, of which two main categories of features are usually employed: the block based features and the keypoint based features.

The block based algorithms [57, 11, 12, 14, 15, 17, 19, 20, 22, 24, 27, 28] are usually to divide the host images into blocks, extract the block features, and find the tampered regions from the matched blocks which have similar block features. Lots of block features have been proposed and employed in the area of forgery detection. Popescu and Farid [22] applied the Principal Component Analysis (PCA) method to reduce the feature dimensions. Luo et al. [19] used the RGB color components and direction information as block features. Li et al. [15] used Discrete Wavelet Transform (DWT) and Singular Values Decomposition (SVD) to extract block features. Mahdian and Saic [20] calculated the 24 Blur-invariant moments as block features. Kang and Wei [14] calculated the singular values of a reduced-rank approximation in each block. Bayram et al. [5] used the Fourier-Mellin Transform (FMT) to obtain features. Wang et al. used the mean intensities of circles with different radii around the block center as block features in [27] and [28]. Lin et al. [17] used the gray average results of each block and its sub blocks as the block features. Ryu et al. [24] used Zernike moments as block features. Fridrich et al. [12] calculated the Discrete Cosine Transform (DCT) coefficients as block feature. Bravo-Solorio and Nandi [7] calculated the information entropy as block features. Bi et al. [6] used the Polar Complex Exponential Transform moments as block features. Although the block based algorithms are effective in forgery detection, they have two main drawbacks: 1) the host images are usually divided into overlapping blocks of regular shape, therefore the computational complexity of block matching will accordingly increase with image size; 2) most of the existing block based algorithms cannot deal with significant geometrical transforms very well.

On the other hand, the keypoint based algorithms [2, 3, 8, 13, 21, 25, 26, 2931] are to extract an appropriate selection of keypoint features that can guarantee the robustness against geometrical transforms, and to match the keypoints to each other, to locate the tampered regions. In [2, 810, 13, 21], the Scale Invariant Feature Transform (SIFT) [18] was used to extract keypoint feature. In [26, 29], the Speeded Up Robust Features (SURF) [4] was used instead of SIFT as keypoint features. In [25] DAISY have also been considered for feature extraction for CMFD methods. Since the number of keypoints is much less than the number of overlapping blocks, the computational complexity is comparatively less. However, most of methods in this category could not achieve high precision rate while sustain high recall rate [9]. Recently, several new copy-move forgery detection schemes that employ both block features and keypoint features have been proposed. Li et al. [16] first segmented the image into semantically independent patches/blocks prior to keypoints extraction, then the matching between the patches/blocks are found to locate the copy-move regions. In this case, the keypoints extracted with SIFT can be regarded as block features since being extracted from patches/blocks. Similarly, we proposed an adaptive over-segmentation and feature points matching (ASFPM) method [23] for image forgery detection, which integrates the characteristics of both block features and keypoint features. The ASFPM method was proved to be superior to many state-of-the-art existing methods. In ASFPM, Discrete Wavelet Transform (DWT) was employed to analyze the frequency distribution of the host image, to adaptively determine the initial size of superpixel, however, when the host image consists of complex and mixed content, for example, smoothed textures are mixed with detailed textures, the frequency distribution calculated with DWT cannot determine the best initial size of superpixel, which will cause bad segmentation and bring inaccurate detection results. In addition, when the host image contains copied regions of different sizes, the ASFPM method will probably no longer work well.

In this paper, considering the weakness of the ASFPM method [23] as we mentioned, we propose a novel multi-scale feature extraction and adaptive matching method to detect the image copy-move forgery. In the proposed scheme, we integrate the characteristics of both block features and keypoint features to achieve better detection results, like [16, 23]. First, we segment the host image into non-overlapping patches of irregular shape in different scales; then, we apply SIFT to extract feature points from all patches, to generate the multi-scale features. An Adaptive Patch Matching algorithm is subsequently proposed for finding the matching that can indicate the suspicious regions in each scale. Finally, the suspicious regions in all scales are merged to determine the detected forgery regions. In the next section 2, we will give the framework of the proposed copy-move forgery detection scheme and explain each step in detail. In section 3, a lot of experiments are conducted to demonstrate the effectiveness of the proposed scheme. Finally, the conclusions are drawn in section 4.

2 The proposed copy-move forgery detection scheme

The proposed scheme using multi-scale feature extraction and matching integrates the characteristics of both block features and keypoint features and performs very well when there are multiple copy-move objects/regions and especially when the objects/regions are of different sizes and contain both smoothed and detailed textures. Figure 1 demonstrates the advantage of the proposed multi-scale based method, compared with the ASFPM [23] which we have proposed in our previous work. The compared results are estimated by precision and recall rate: precision is defined as the ratio of number of correctly detected forged pixels to the number of totally detected forged pixels; recall is defined as the ratio of number of correctly detected forged pixels to the number of forged pixels in the ground-truth forged image. In the first row of Fig. 1, (a) shows the forgery image ( image size:3888 × 2592), where three objects of different sizes and different textures are copied and pasted; (b) shows its corresponding ground truth image; (c) shows the detected results of the ASFPM method [23], from which the detection accuracy is calculated as: precision=84.65%, recall=70.54%; and (d) shows the detected results of the proposed multi-scale based scheme, from which the detection accuracy is calculated as: precision=97.96%, recall=85.72%. In this case, it is obvious and perceptual that the proposed scheme performs much better than the ASFPM method [23]. Similarly, in the second row of Fig. 1, another example (image size:2613 × 3900) is demonstrated. Two objects of different sizes are copied and pasted in (e), (e) and (f) respectively show the forgery image and ground truth image; (g) shows the detected results of the ASFPM method [23], with detection accuracy as: precision=96.90%, recall=78.90%; and (h) shows the detected results of the proposed scheme, with detection accuracy as: precision=96.21%, recall=98.67%. In this case, the precision rates of the two methods are similar, however the proposed scheme outperforms the ASFPM method [23] a lot in respect of recall rate.

Fig. 1
figure 1

Demonstration of detection results of ASFPM method and proposed method. 1st column: the forgery images; 2nd column: ground truth images; 3rd column: the detected regions of ASFPM method; and 4th column: the detected regions of the proposed scheme

Figure 2 shows the framework of the proposed scheme. First, the Multi-Scale Feature Extraction (MSFE) algorithm is proposed and applied to the host image, to generate the Multi-Scale Features (MSF). Then, the Adaptive Patch Matching (APM) algorithm is proposed and applied to the MSF, to obtain the matched patch pairs; from which the Matched Keypoints (MK) are obtained. Finally, according to MK, the Matched Keypoints Merging (MKM) algorithm generates the detected forgery regions. In the remainder of this section, sections 2.1, 2.2 and 2.3 will explain the three proposed algorithms: MSFE, APM, and MKM, respectively, in details.

Fig. 2
figure 2

The framework of the proposed copy-move forgery detection scheme

2.1 Multi-scale feature extraction algorithm

In the existing block based algorithms [5, 7, 14, 15, 17, 19, 20, 22, 24, 27, 28], the host image is divided into the overlapping regular blocks (e.g. block size =16 × 16 in Fig. 3-(a)); then, the forgery regions can be found from the matched blocks. Since, the forgery regions are not of regular shapes (e.g. in Fig. 4-(a)), the detected forgery regions represented by the set of regular blocks are usually inaccurate. In addition, when the size of host image increases, the computational time of the block matching will be much more expensive. To address the above-mentioned problems, we propose to segment the host image into non-overlapping irregular patches (e.g. in Fig. 3-(b)), and then to find the forgery regions by matching the non-overlapping and irregular patches.

Fig. 3
figure 3

Different blocking/segmentation methods. a Overlapping and regular blocking; (b) Non-overlapping and irregular segmentation

Fig. 4
figure 4

Three-scale segmentation demonstration. a The forgery image; (b, c) and (d) The segmentation results in three different scales and the corresponding matched patches and the matched keypoints selected from the matched patches

Considering that superpixel algorithms can group pixels into perceptually meaningful atomic regions, in our algorithm, we employ the Simple Linear Iterative Clustering (SLIC) algorithm [1] to segment the host image. SLIC algorithm adapts a k-means clustering approach to efficiently generate superpixels, and superpixel adheres to boundaries very well. With SLIC, the host image is segmented into the non-overlapping superpixels that are meaningful and are of irregular shapes. The non-overlapping segmentation method can help to decrease the computational expenses, compared with the existing overlapping block method; in addition, in most of the cases, the irregular and meaningful regions can represent the forgery regions better than the regular blocks. Figure 3 shows the different blocking/segmentation methods, where (a) shows the overlapping and regular blocking method of the existing forgery detection algorithms and (b) shows the non-overlapping and irregular segmentation method of the proposed scheme.

Most of the existing block based algorithms divided the host images only in single scale with initially predefined block size; in that situation, if block size is too small, some forgery regions will be missed, as shown in Fig. 4-(d); otherwise, if the block size is too large, some error detected results will be introduced, as shown in Fig. 4-(b). To solve this problem, we propose the multi-scale segmentation in our algorithm. Figure 4 shows the three-scale segmentation and the corresponding segmentation result in each scale. In Fig. 4, (a) shows the forgery image, where the region in sky-blue represents the original region, and the region in blue represents the target copy-move region; (b), (c) and (d) respectively show the segmentation results in three different scales. The patches highlighted in yellow indicate the matched patches, while the points highlighted in blue indicate the matched keypoints.

As discussed above, the proposed MSFE algorithm segments the host image into the patches with multiple scales; from each patch the feature points are then extracted. In recent years, the feature points extraction methods such as SIFT [18] and SURF [4] are widely used in the field of computer vision. SIFT and SURF were proved to be robust against the common image processing operations such as rotation, scale, blurring, and compression; in consequence, SIFT and SURF were also used as feature points extraction methods in most of the existing keypoint based forgery detection algorithms. Christlein et al. [9] compared performances of SIFT and SURF with the other 13 image feature extraction methods in the comparative experiments, and the results indicate that the SIFT and SURF based methods perform better than the others. Therefore, in this scheme, we choose SIFT with default parameters as the feature extraction method to extract feature points as patch features.

Figure 5 shows the flowchart of the proposed MSFE algorithm. First, the host image is blocked into the patches with the superpixel segmentation method; then, the feature points are extracted from these patches. The whole process is repeated along with the decreasing of the size of segmentation, until feature points cannot be extracted any more in the corresponding scale. Finally, the multi-scale feature MSF is generated, which includes the patches in each scale and the corresponding feature points. The steps of the proposed MSFE algorithm are explained in Algorithm I as follows.

figure d
Fig. 5
figure 5

Flowchart of the Multi-Scale Feature Extraction (MSFE) algorithm

In STEP-1 of Algorithm I, the appreciate initialization of B can avoid segmenting the host image into excessive scales. In the experiments, by experiments, the B is initially set as 200 when the size of host image M × N is larger than1500×1500; otherwise, the B is initially set as 100.

2.2 Adaptive patch matching algorithm

After generatingMSF, we need to locate the matched patch pairs in each scale. In most of the existing block based algorithms, the block matching generates specific block pairs only if there are many other matched pairs in the same mutual position, assuming they have the same shift vector. When the number of matched block pairs, which have same shift vector, exceeds a user-specified threshold, the matched block pairs that contribute to that specific shift vector will be identified as the regions that probably have been tampered. In that situation, the threshold is related to the regions that can be identified; the larger threshold may cause some not-so-closely matched blocks missing, while the smaller threshold may bring more false matched blocks. Therefore, the threshold highly relates with the performance of the forgery detection algorithms, and how to determine the just right threshold becomes an important issue.

An Adaptive Patch Matching (APM) algorithm which aims at improving the existing matching process is proposed by adaptively determining the threshold. Figure 6 shows the flowchart of the APM algorithm. In the i th scale (i ∈ {1,  ⋯ , n}), the number of matched keypoints of each patch pair is calculated according to PF i = {P i, F i} and the correlation coefficient map CC i is generated; then the corresponding patch matching threshold TP i is determined adaptively; the matched patch pairs MP i are located by TP i; and finally the matched keypoints MK i are selected from MP i. The steps of the proposed APM algorithm are explained in Algorithm II as follows.

figure e
Fig. 6
figure 6

Flowchart of the Adaptive Patch Matching (APM) algorithm

In STEP-2 of Algorithm II, the keypoints are matched using the best-bin-first algorithm with their Euclidian distance; which means that a keypoint f a is matched to the keypoint f b only if they can meet the following condition:

$$ d\left({f}_a,{f}_b\right)\cdot TK\le d\left({f}_a,{f}_i\right) $$
(1)

Where d(f a , f b ) means the Euclidian distance between the keypoints f a and f b , and it is defined in (2); d(f a , f i ) means the Euclidian distances between the keypoints f a and all other keypoints, and it is defined in (3). TKis the keypoints matching threshold; when TK becomes larger, the matching accuracy will be higher, but meanwhile the ratio outliers will be higher accordingly, which will cause greater miss probability. Therefore, in the experiments, we set TK = 2 by experiments to provide a good trade-off between matching accuracy and miss probability.

$$ d\left({f}_a,{f}_b\right)=\sqrt{{\left({x}_a-{x}_b\right)}^2+{\left({y}_a-{y}_b\right)}^2} $$
(2)
$$ d\left({f}_a,{f}_i\right)=\sqrt{{\left({x}_a-{x}_i\right)}^2+{\left({y}_a-{y}_i\right)}^2},i=1,2,\ldots n;i\ne a,i\ne b $$
(3)

Correlation coefficient means the number of matched keypoints between the two patches. Assuming there are B i patches in the i th scale, we can generate t = B i (B i  − 1)/2 correlation coefficients, which form the correlation coefficient mapCC i. After generatingCC = {CC 1, CC 2,  … , CC n}, we need to calculate the patches matching threshold TP as stated in STEP-3 of Algorithm II. The procedures of the adaptive calculation of the patch matching threshold TP in each scale are explained as follows in Algorithm III.

figure f
$$ {\nabla}^2\left(CC\_{F}_j^i\right)>\overline{\nabla \left(CC\_{F}^i\right)} $$
(4)

Figure 7 shows the demonstration of patch matching threshold calculation. In Fig. 7, (a1) shows the forgery image; (a2) shows the plot of the sorted and filtered correlation coefficients of the i th scale of forgery image, which are calculated with STEP-1 of Algorithm III; and (a3) shows the plots of the first derivative, the second derivative, and the mean of first derivative of the i th scale of forgery image, which are calculated from (a2) with STEP-2 of Algorithm III. According to STEP-3 and STEP-4 of Algorithm III, the corresponding threshold is calculated and located in (a3), represented by the red point. Meanwhile, (b1) shows the original image without forgery; (b2) shows the corresponding plot of the sorted and filtered correlation coefficients of the i th scale; and (b3) shows the corresponding plots of the first derivative, the second derivative, and the mean of first derivative of the i th scale, which are calculated from (b2). It can be easily seen that we cannot get the matching threshold from the non-forgery image, when using the proposed Adaptive Matching Threshold Calculation algorithm. After calculating the patch matching threshold of each scale adaptively, we can locate the matched patch pairs in each scale if their correlation coefficients are larger than the corresponding matching threshold. From those matched patches in each scale, we selected the matched keypoints to form the Matched Keypoints (MK).

Fig. 7
figure 7

Demonstration of patch matching threshold calculation. (a1) Forgery image; (a2) Plot of the correlation coefficients after sorting and filtering, of the forgery image, in the i th scale; (a3) Plots of the first derivative, the second derivative, and the mean of first derivative, calculated from (a2); (b1) Original image; (b2) Plot of the correlation coefficients after sorting and filtering, of the original image, in the i th scale; (b3) Plots of the first derivative, the second derivative, and the mean of first derivative, calculated from (b2)

2.3 Matched keypoints merging algorithm

After obtaining the matched keypoints MK, we need to determine the forgery regions by turning the independent pixels/keypoints into regions. Figure 8 shows the flowchart of the MKM algorithm. First, the host image is segmented into small superpixels; then, MK are replaced by the small superpixels to form the suspected forgery regions. The size of small superpixels is related with the size of the host image; when the host image is of higher resolution, the size of small superpixels will be larger. In our test dataset, the average size is approximate3000 × 2000, therefore, we set the size of small superpixel as 20 by experiments. Next, the suspected forgery regions in all scales are merged. If the suspected forgery regions are merged together by using ‘OR’ operation, the miss rate of the forgery detection will be reduced, however, the probability of error detection will be bigger, Fig. 9-(c) and (d) demonstrate the results of this case. Therefore, we need to filter out some regions which may be wrongly detected during the merging process. In all scales, we count the pixel appearing times as T = {t min, t min + 1,  … , t max} , where t max is the maximum value of the pixel appearing times in all scales, t max ≤ n. Because the host images are different, the T is considered as a random sequence, and the probability of random variable is regarded as the normal distribution. Consequently, the mean μ and the standard deviationσ of random sequence Tcan be respectively calculated by using (5) and (6).

$$ \mu =\frac{1}{ \max - \min}\sum_{i= \min}^{\max }{t}_i $$
(5)
$$ \sigma =\sqrt{\frac{1}{ \max - \min}\sum_{i= \min}^{\max }{\left({t}_i-\mu \right)}^2} $$
(6)
Fig. 8
figure 8

Flowchart of the Matched Keypoints Merging (MKM) algorithm

Fig. 9
figure 9

The demonstration of the suspected forgery regions merging results. a, g The forgery images; (b, h) The copy-move forgery regions; (c, i), (d, j), (e, k) The suspected forgery regions in three different scales; and (f, l) The merged regions

Considering the property of normal distribution that the area of μ − 2σ range in the normal distribution curve is 95%, we choose μ − 2σ as the merging threshold to filter out the wrongly detected pixels. When the appearing time of the pixel is smaller than the merging threshold μ − 2σ, the corresponding pixels will be discarded. Therefore, the suspected forgery regions in all scales are merged using (7), and at the same time, some of the wrongly detected regions can be discarded. Figure 9 shows the demonstration of merging results of the suspected forgery regions.

In Fig. 9, (c)(i), (d)(j) and (e)(k) display the suspected forgery regions in three scales, which indicate that the proposed multi-scale method will bring some wrong regions; while (f)(l) display the detected regions after the merging process, where the inaccurately detected regions have been successfully removed.

$$ g\left(x,y\right)=\left\{\begin{array}{l}1\begin{array}{cc}\hfill \hfill & \hfill \mu -2\sigma \le \sum_{i=1}^n{f}_i\left(x,y\right)\le {t}_{\max}\hfill \end{array}\\ {}0\begin{array}{cc}\hfill \hfill & \hfill 0\le \sum_{i=1}^n{f}_i\left(x,y\right)<\mu -2\sigma \hfill \end{array}\end{array}\right. $$
(7)

Where g(x, y) are the merged regions; f i (x, y) represents the suspected forgery regions in the i th scale; n is the number of scales; and μ − 2σ is the threshold that is used to filter out the inaccurately detected pixels.

In the last step of MKM algorithm, we apply the close morphology operation to generate the final regions. The structural element we use in the close operation is defined as a circle whose radius is related to the host image size. The close operation fills the gap in the merged regions, while keeps the shape of the regions.

3 Experiments and discussions

In this section, the experiments are conducted to evaluate the effectiveness and robustness of the proposed copy-move forgery detection scheme. In the following experiments, the benchmark database[9] which consists the realistic copy-move forgeries is used to test the proposed scheme. Figure 10-(a1) ~(e1) shows a selection of images from the database. The dataset comprises 48 uncompressed PNG true color images. The average size of forgery regions is about 6% of each image. These images have a size of 3000×2300 pixels. The copied regions are of categories of living, nature, man-made and even mixed, and they range from smooth to highly texture; the copy-move forgeries are created by copying, scaling and rotating semantically meaningful image regions. JPEG compression and down-sampling are also added on the forgery images; in addition, the combined transformations and multiple copies forgeries are included in the image dataset. Therefore, we choose this dataset to objectively evaluate our scheme. Figure 10 shows the copy-move forgery detection results of the proposed scheme. In Fig. 10, the figures in the first row are the forgery images selected from the dataset; the second row displays the ground truth; and the third row displays the detected forgery regions.

Fig. 10
figure 10

The copy-move forgery detection results of the proposed scheme. The 1st row: the five selected images in the dataset; 2nd row: ground truth images; 3rd row: The detected forged regions

In order to evaluate the performance of the proposed scheme, the two characteristics precision and recall [9] are calculated using (8) and (9) respectively. We also give the F score [9], which is defined in (10), as a measure which combines the precision and recall in a single value.

$$ precision=\frac{\left|\Omega \cap {\Omega}^{\hbox{'}}\right|}{\left|\Omega \right|} $$
(8)
$$ recall=\frac{\left|\Omega \cap {\Omega}^{\hbox{'}}\right|}{\left|{\Omega}^{\hbox{'}}\right|} $$
(9)

Where Ω means the set of forgery regions detected by the proposed scheme for the dataset; and Ω' means the set of all forgery regions for the dataset.

$$ F=2\times \frac{precision\times recall}{precison+ recall} $$
(10)

To reduce the effect of random samples, the average precision/recall is computed over all the images in the dataset. Since Christlein et al. [9] have particularly recommended all benchmark methods, we use the dataset they provided and compare our experimental results with several state-of -the-art algorithms: the SIFT based detection method [9], which combined the methods of [2, 21]; the SURF based detection method [9]; Zernike moments based forgery detection method [24]; the method proposed by Bravo [8]; the SBFD method proposed in [16]; and the ASFPM method [23] which we have proposed in our previous work. We mainly compare the performances of our scheme with the state-of -the-art algorithms under different scenarios: the plain copy-move forgery; the forgery with distortion by various attacks including: scaling, rotation, Gaussian noise addition JPEG compression, and even combined attacks; the multiple copies forgery and the down-sampling forgery. The following sections 3.1, 3.2, and 3.3 demonstrate the detection results.

3.1 Detection results under plain copy-move forgery

Basically, we firstly evaluate the proposed scheme when under the ideal condition, that is the plain copy-move forgery. We have 48 original images and 48 forgery images, where one to one copy-move forgery is implemented. The detection methods distinguish the original images from the forgery images in this case. We evaluate the scheme at both pixel level and image level, and Tables 1 and 2 show the detection results of the 96 images at the image level and the pixel level, respectively. While pixel-level metrics are useful to assess the general localization performance of algorithm when the ground-truth data is available, and at pixel level, the precision and recall rates are calculated by counting the number of pixels in the corresponding regions; the image level decisions are particular interest to the automated detection of manipulated images, and at image level, precision is the probability that a detected forgery is truly a forgery, while recall is the probability that a forgery image is detected. In general, higher precision as well as higher recall indicates the superior performance. In Tables 1 and 2, the results in bold indicate the results of the proposed scheme and the results in bold and italic indicate the best ones. It can be easily seen that our scheme can achieve 90.57% precision and meanwhile 100% recall, which performs better than the most of existing state-of-the-art methods at image level, except the Zernike moments based method [24] which can achieve precision up to 92.31% and recall up to 100%. Meanwhile, the advantage of the proposed multi-scale detection method is particularly prominent at pixel level, when comparing with the existing state-of-the-art methods, as indicated in Table 2. The proposed method achieves precision up to 95.22% and recall up to 90.6%, which is much better than the existing state-of-the-art methods. The results indicate the good accuracy of our proposed copy-move forgery detection scheme by using multi-scale feature extraction and matching.

Table 1 Detection results of the plain copy-move forgery at the image level
Table 2 Detection results of the plain copy-move forgery at the pixel level

3.2 Detection results under various attacks

Besides the one to one plain copy-move forgery, we also test our proposed scheme when the copied regions are attacked by various attacks including geometric distortions, image degradations, and even combined attacks. That means, the forgery images are generated by using each of the 48 images in the dataset, and the copied regions are attacked by attacks as follows:

  1. 1)

    Scaling

The copied regions are scaled with the scale factor varies from 91% to 109%, with the step as 2%, as shown in the 1st row of Fig. 11. In this case, we need to test totally 48 × 10 = 480 images.

  1. 2)

    Rotation

Fig. 11
figure 11

Detection results under various attacks. (a1) ~(a3) Scaling; (b1) ~(b3) Rotation; (c1) ~(c3) Gaussian Noise addition; (d1) ~(d3) JPEG Compression; and (e1) ~(e3) Combined transforms. The first column displays the precision, the second column displays the recall and the third column displays the F score

The copied regions are rotated with the rotation angle varies from 2° to 10°, in step of 2°, as shown in the 2nd row of Fig. 11. In this case, we need to test totally 48 × 5 = 240 images.

  1. 3)

    Gaussian Noise Addition

The image intensities are normalized between 0 and 1 and added zero-mean Gaussian noise with standard deviations of 0.02, 0.04, 0.06, 0.08 and 0.10 to the inserted snippets before splicing, as shown in the 3rd row of Fig. 11. In this case, we need to test totally 48 × 5 = 240 images.

  1. 4)

    JPEG compression

The JPEG compression is applied to the forgery images and original images, with the qualify factor varies from 100 to 20, with the step as -10, as shown in the 4th row of Fig. 11. In this case, we need to test totally 48 × 9 = 432 images.

  1. 5)

    Combined transforms

Six combined transforms are applied into the copied regions to evaluate the proposed scheme, as shown in the 5th row of Fig. 11. In Fig. 11-(e1)~(e3), ‘cmb1’ indicates the combined attack includes scaling with factor as 101%, rotation with angle as 2°, JPEG compression with quality factor as 80; ‘cmb2’ indicates the combined attack includes scaling with factor as 103%, rotation with angle as 4°, JPEG compression with quality factor as 75; ‘cmb3’ indicates the combined attack includes scaling with factor as 105%, rotation with angle as 6°, JPEG compression with quality factor as 70; ‘cmb4’ indicates the combined attack includes scaling with factor as 107%, rotation with angle as 8°, JPEG compression with quality factor as 65; ‘cmb5’ indicates the combined attack includes scaling with factor as 120%, rotation with angle as 20°, JPEG compression with quality factor as 60; and ‘cmb6’ indicates the combined attack includes scaling with factor as 140%, rotation with angle as 60°, JPEG compression with quality factor as 50. In this case, we use totally 48 × 6 = 288 images.

The detection results under various attacks are displayed in Fig. 11, where the results indicated in blue show the results of the proposed scheme and the results of three columns indicate the precision rate, recall rate and F score, respectively. In Fig. 11, (a1) ~(a3) show the results under the scaling, where the x-axis indicates the scale factor; (b1) ~(b3) show the results under the rotation, where the x-axis indicates the rotation angle; (c1) ~(c3) show the results under the Gaussian Noise addition, where the x-axis indicates the standard deviations; (d1) ~(d3) show the results under the JPEG compression, where the x-axis represents the quality factor; and (e1) ~(e3) show the results under the combined transforms. Furthermore, we compare the proposed scheme with the existing state-of-the-art methods: the SIFT based detection method [9], indicated in green; the SURF based detection method [9], indicated in red; the Zernike moments based forgery detection method [24], indicated in rose-red; the method proposed by Bravo [8], indicated in blue-sky; the SBFD method proposed in [16], indicated in black; and the ASFPM method [23] which we have proposed before, indicated in yellow. The results of the methods are indicated in lines of different colors as displayed in Fig. 11. It can be seen from the first and second rows, all the recall, precision, and F score of the proposed scheme are greater than 90%, which indicates that the proposed scheme performs much better than the existing state-of-the-art forgery detection methods under the geometric transforms. As well, the proposed scheme performs well under the common signal processing such as Gaussian Noise addition and JPEG compression, as shown in the third and fourth rows. Note that, although our recall rates are worse than which of the SBFD method, the F scores are better than it under the Gaussian Noise addition and the JPEG compression. In Fig. 11-(e1) ~(e3), the proposed scheme is evaluated under six combined attacks we defined. It is obviously that the proposed scheme performs much better than the other methods.

3.3 Detection results under multiple copies and down-sampling

Besides the plain copy-move forgery and the forgeries attacked by various attacks, we also evaluate the proposed scheme when the forgery images have multiple copies. In order to test the multiple copies forgery, we have copied an 64 × 64 image region five times and moved them to the random locations in the image itself. Table 3 shows the comparison of the detection results in this scenario. It can be easily seen that the proposed scheme outperforms the most of existing detection methods except the method proposed by Bravo [8] which can achieve precision up to 88.75%, however, our scheme can achieve much higher recall. The results indicate the good performance of the proposed multi-scale feature extraction and the adaptive matching for copy-move forgery detection.

Table 3 Detection results under the multiple copies forgery

Considering that the performance of forgery detection algorithms usually matters with the quality of the resources, we evaluate the proposed scheme and compare it with the mentioned state-of-the-art methods under the down-sampling, as shown in Fig. 12, where (a), (b) and (c) display the precision, recall and F score, respectively. We scale down all the images in the plain copy-move forgery in step of 20%. Note that the parameters of detection methods are globally fixed to avoid over-fitting. In Fig. 12, the x-axis means the down-sampling factor and the results in blue indicate which of the proposed scheme while the results in other colors indicate which of the above-mentioned state-of-the-art methods. The proposed scheme performs much better than the existing methods in this case.

Fig. 12
figure 12

Detection results under the down-sampling. a precision; (b) recall; (c) F score

4 Conclusion

With the help of the digital processing programs, images can be easily manipulated to create non-existing situations, which diminishes the credibility and value of images presented as evidence in the court or media. As one can expect, the situation will get worse, when the tools that perform the forgeries will move from research labs to commercial software. So, we must explore a method to provide a decisive answer whether an image contains image forgery. In this paper, we propose a novel multi-scale feature extraction and adaptive matching method to detect the copy-move image forgery. In the proposed scheme, first, we segment the host image by SLIC in multi-scale, to generate multi-scale patches; then we apply SIFT to patches in all the scales, to extract feature points. Next, the Adaptive Patch Matching algorithm is subsequently proposed for finding the matching which can indicate the suspicious forged regions in each scale. And finally, the suspicious regions in all scales are merged and some morphological operations are applied to generate the detected forgery regions. In general, we have four main contributions in the proposed scheme: 1) we replace the overlapping blocks of regular shape in traditional forgery detection algorithms, with individual irregular patches, which can better partition the host images into non-overlapping blocks. 2) We segment the host image into patches in multiple scales, from which the feature points are extracted respectively. The proposed multi-scale feature extraction method can extract more accurate feature points. 3) Instead of artificially setting the patch matching threshold in advance, we propose to adaptively calculate the matching threshold for better feature recognition. And 4) during the post-processing, we propose to use the predefined small superpixels to replace the matched keypoints and we apply some morphology operations into the merged regions to generate more accurately detected forgery regions.

Experimental results show that the proposed scheme performs much better than the existing state-of-the-art copy-move forgery detection algorithms, even under various challenging conditions including: the geometric transforms, such as scaling and rotation; and the common signal processing, such as JPEG compression and noise addition. In addition, the special cases such as the multiple copies and the down-sampling are also evaluated and the results indicate the very good performance of the proposed scheme. We may focus on applying multi-scale approach to other kind of forgery such as splicing or other kind of media such as video and audio in the future work.