1 Introduction

Readily accessible, easy-to-use, and powerful image editing tools such as Photoshop have made it easy to manipulate and tamper with digital images without leaving visible clues. As a result, digitally produced forgeries have risen sharply in mass media and on the Internet. This trend exposes vulnerabilities and undermines the integrity of digital images. Techniques for checking the validity and authenticity of digital images have therefore become necessary, particularly because images are presented as evidence in courtrooms, in news reports, and in financial documents. In this context, image tamper identification has become one of the critical objectives of image forensics.

We focus here on a particular form of image manipulation, named copy-move forgery, in which a part of the image is copied and pasted onto another section of the same image, typically to cover unwanted parts. An example of copy-move forgery is shown in Fig. 1, where image (a) is the original image showing three missiles, whereas image (b) is the forged image in which one missile has been copied and pasted at a different location to suggest that four missiles were launched instead of three. This example makes it clear that forgeries may leave no perceptual clues of tampering. Identifying such cases and ensuring that the integrity of the image is intact is therefore challenging, and it can be crucial in some applications.

Fig. 1. An example of copy-move forgery

Many schemes have already been proposed in the literature to detect copy-move forgeries. Some are too computationally intensive, while others fail to localize the forged regions accurately and result in a high false positive rate (FPR). FPR values for various copy-move forgery detection (CMFD) schemes are listed in Table 1.

Table 1. Approximate FPR of various CMFD techniques

In this paper, we propose a copy-move forgery detection algorithm for images. The basic idea is to use statistical image properties, specifically mean and variance, to detect duplicated regions. The image is partitioned into blocks, and the blocks are compared on the basis of these properties to categorize each region as tampered or authentic. We achieve a detection accuracy (F-score) of 97.05% with an FPR as low as 0.051%. The rest of the paper is organized as follows. Related work is discussed in Sect. 2. Detailed steps of the proposed Statistical Copy Move Forgery Detection (SCMFD) algorithm are presented in Sect. 3. Sect. 4 presents the results and their analysis, and Sect. 5 concludes the work and outlines future directions.

2 Related Work

The area of copy-move forgery is well researched, and many detection methods have already been proposed. One of the most straightforward techniques is to compare each pixel of an image with every other pixel to detect manipulation [2]. Although the idea is simple, it is computationally expensive: the number of comparisons is of order \(O(P^2)\), where P is the total number of pixels in the image. The number of comparisons can be reduced by lexicographically sorting the pixels according to their values and comparing only the values in the near vicinity to find copy-moved pixels [1]. However, this method has shortfalls even after the computations are optimized. It can be tricked by slightly changing the pixel values or by rotating the copied region during the forgery, and perpetrators often change these values through color correction and smoothing. This often results in disconnected pixels being detected, as shown in Fig. 2. The method is also not robust against JPEG compression [2].

Fig. 2. Shortfalls of a simple block based copy-move forgery detection technique [2]

A simple block-based approach to detecting copy-move forgery is to compare the mean and standard deviation of blocks [3]. However, this approach alone is not resilient to images in which the background looks similar or has similar pixel properties. Such a background creates a high number of false positives, which increases the false positive rate (FPR) and decreases the accuracy. Another standard method for copy-move forgery detection is autocorrelation. Most of the ‘information’ in an image is stored in the low-frequency range, so autocorrelation cannot be applied directly to the image; otherwise, spikes appear on the edges [2]. The image is therefore first passed through a high-pass filter, which removes the low-frequency content. The autocorrelation of the filtered image is then computed to detect copy-move forgery. This method is not computationally intensive, but it has difficulty detecting copied regions that are small relative to the size of the image [2].

A popular method for copy-move detection uses the Discrete Cosine Transform (DCT) [1]. The image is divided into blocks, usually 8 \(\times \) 8, the DCT is applied to each block, and the low-frequency coefficients are extracted using zig-zag traversal. The resulting block feature vectors are then sorted lexicographically to find blocks that are similar within a user-defined threshold [1]. A similar method uses Principal Component Analysis (PCA) instead of DCT [13]. PCA represents higher-dimensional data in fewer dimensions; in this case, PCA extracts a reduced representation from each block, and these representations are compared to find similar blocks that have been copy-moved. DCT is reported to be a better approach than PCA for finding copy-move forgery [1].
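The zig-zag step can be illustrated with a minimal Python sketch. It lists the traversal order for an n \(\times \) n coefficient block and keeps the first few (low-frequency) coefficients. The names `zigzag_indices`, `low_frequency_features`, and `keep` are illustrative assumptions and do not come from [1]; the DCT itself is assumed to have been computed already, and the toy values below are not real DCT output.

```python
def zigzag_indices(n):
    """Return (row, col) pairs of an n x n block in JPEG-style zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def low_frequency_features(coeffs, keep):
    """Take the first `keep` coefficients of a square block in zig-zag order."""
    return [coeffs[r][c] for r, c in zigzag_indices(len(coeffs))[:keep]]


# Toy 3 x 3 "coefficient" block (hypothetical values for illustration only).
block = [[9, 8, 5],
         [7, 4, 2],
         [3, 1, 0]]
features = low_frequency_features(block, keep=4)
print(features)  # [9, 8, 7, 3]
```

Feature vectors built this way for every block could then be sorted lexicographically, for example with Python's built-in `sorted`, so that similar blocks end up adjacent.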

Many more approaches have been proposed in the field of copy-move forgery detection. In [4], the authors proposed a sorted neighborhood technique based on the discrete wavelet transform (DWT), in which Singular Value Decomposition (SVD) is applied to the image’s low-frequency information; this method is robust against JPEG compression. In [5], the authors proposed an approach based on the Fourier-Mellin Transform (FMT) along with bloom filters for CMFD. This approach is also resilient to post-processing operations such as Gaussian noise and blur. In [6], the authors proposed an approach based on DCT and SVD. Although this approach is not robust against rotation, it gives good results in the presence of noise, blurring, and compression.

In [7], the authors proposed an approach based on circular block extraction and Local Binary Patterns (LBP). This approach is robust against compression, rotation, blurring, and noise. In [8], the authors used an approach based on circular blocks and the Polar Harmonic Transform (PHT). In [9], the authors proposed a technique based on the Dyadic Wavelet Transform (DyWT). In [10], an approach based on the Histogram of Oriented Gradients (HOG) is used to detect copy-move forged regions. A high false positive rate (FPR) was the bottleneck for most of these approaches; the proposed approach reduces it drastically, as tabulated in Table 1.

3 The Proposed Statistical Copy Move Forgery Detection (SCMFD) Approach

In this section, we present the proposed Statistical Copy Move Forgery Detection (SCMFD) approach. It aims to accurately localize the forged portions within an image by exploiting its statistical properties. An overview of the proposed SCMFD approach is shown in Fig. 3. Some optimizations are also proposed to improve the overall accuracy of the final prediction mask.

Fig. 3. An overview of proposed SCMFD approach

3.1 SCMFD

Here, we present a region duplication detection method for images. It uses the statistical image features, viz., mean and variance, to detect forgery on a block basis. If the input image is a color image, it is first converted to gray-scale to reduce computational cost, so that further analysis for forgery detection is done on a single channel. A window of a particular block size is slid across the image, and the mean and standard deviation (SD) are calculated over all the pixels in it. These values are appended to a matrix \(data\_array\) along with the coordinates of the top-left pixel of that image block. Thus, this matrix has four columns: mean, SD, and the x and y coordinates, respectively. The matrix is sorted row-wise by mean value (in either ascending or descending order). The resulting matrix \(sorted\_array\) is traversed row-wise, and for every element a \(Sub\_array\) is generated with \(check\_offset\) neighbors on each side. The absolute mean difference, the absolute SD difference, and the Euclidean distance between the element and each of its neighbors (the elements of \(Sub\_array\)) are then computed.
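The feature extraction and sorting steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the window is assumed to move one pixel at a time, the SD is taken as the population standard deviation, and the gray-scale image is assumed to be a list of rows of pixel values. The function names `block_features` and `sort_by_mean` are hypothetical.

```python
from math import sqrt


def block_features(image, block_size):
    """Build data_array: one (mean, SD, x, y) row per sliding block.

    Assumptions: stride of one pixel and population SD; (x, y) is the
    top-left pixel of the block, with x as column and y as row.
    """
    height, width = len(image), len(image[0])
    data_array = []
    for y in range(height - block_size + 1):
        for x in range(width - block_size + 1):
            pixels = [image[y + dy][x + dx]
                      for dy in range(block_size)
                      for dx in range(block_size)]
            mean = sum(pixels) / len(pixels)
            variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
            data_array.append((mean, sqrt(variance), x, y))
    return data_array


def sort_by_mean(data_array):
    """Return sorted_array: rows ordered by ascending mean value."""
    return sorted(data_array, key=lambda row: row[0])


# Toy 3 x 3 gray-scale image (hypothetical values).
image = [[1, 1, 5],
         [1, 1, 5],
         [9, 9, 9]]
sorted_array = sort_by_mean(block_features(image, block_size=2))
print(sorted_array[0])  # (1.0, 0.0, 0, 0): the flat top-left block
```

Because each feature vector has only two entries, sorting on the mean alone is enough to bring candidate matches next to each other.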

If the absolute mean difference is less than \(Mean\_Ths\), the absolute SD difference is less than \(SD\_Ths\), the Euclidean distance is greater than \(Dist\_Ths\), and the SD of the element is greater than \(SD\_block\), then the element is appended to a new array called \(Sim\_array\). The prediction mask PM is then created by highlighting the blocks identified as similar.
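The matching rule and mask construction can be sketched as follows. This is a hedged illustration under stated assumptions: the input rows are `(mean, SD, x, y)` tuples already sorted by mean, each element is recorded at most once (the loop stops at its first matching neighbor), and the function names `find_similar_blocks` and `prediction_mask` as well as the toy threshold values are hypothetical.

```python
from math import hypot


def find_similar_blocks(sorted_array, check_offset, mean_ths, sd_ths,
                        dist_ths, sd_block):
    """Return Sim_array: top-left (x, y) of blocks that match a neighbor."""
    sim_array = []
    n = len(sorted_array)
    for i, (mean, sd, x, y) in enumerate(sorted_array):
        if sd <= sd_block:  # SD of the element must exceed SD_block
            continue
        lo, hi = max(0, i - check_offset), min(n, i + check_offset + 1)
        for j in range(lo, hi):  # Sub_array: neighbors on both sides
            if j == i:
                continue
            mean2, sd2, x2, y2 = sorted_array[j]
            if (abs(mean - mean2) < mean_ths
                    and abs(sd - sd2) < sd_ths
                    and hypot(x - x2, y - y2) > dist_ths):
                sim_array.append((x, y))
                break  # assumption: record each element only once
    return sim_array


def prediction_mask(sim_array, height, width, block_size):
    """Mark every pixel covered by a similar block with 1."""
    mask = [[0] * width for _ in range(height)]
    for x, y in sim_array:
        for dy in range(block_size):
            for dx in range(block_size):
                mask[y + dy][x + dx] = 1
    return mask


# Toy sorted rows: two near-identical textured blocks far apart, one flat block.
rows = [(10.0, 5.0, 0, 0), (10.2, 5.1, 20, 0), (50.0, 0.0, 5, 5)]
sim_array = find_similar_blocks(rows, check_offset=1, mean_ths=1.0,
                                sd_ths=1.0, dist_ths=10.0, sd_block=1.0)
pm = prediction_mask(sim_array, height=8, width=24, block_size=2)
print(sim_array)  # [(0, 0), (20, 0)]
```

In this toy input the flat block is skipped by the \(SD\_block\) test, while the two textured blocks match each other and are both highlighted in the mask.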

3.2 Optimizations in SCMFD

We can reduce the false positives by comparing both the mean and the SD of the blocks against specified thresholds. Two blocks are identified as similar by the SCMFD algorithm only if the difference between their mean values is below the threshold \(Mean\_Ths\) and the difference between their standard deviations is below \(SD\_Ths\). Image blocks that are physically close in the image may also lead to false positives. Therefore, we consider only those block pairs whose distance from each other exceeds the user-defined threshold \(Dist\_Ths\). This way, we discard matches that are next to each other or are part of the same object rather than the corresponding copy-moved object. We also use another user-defined threshold \(SD\_block\) on the SD of individual image blocks, so that blocks with an SD below this threshold, such as flat or solid-colored regions, are discarded, as shown in Fig. 4.

Fig. 4. PM using threshold \(SD\_block\) on SD of individual image blocks

The proposed SCMFD algorithm provides an accuracy of up to 76% with an FPR of 1.38%, depending on the choice of \(block\_size\) and \(SD\_block\) (with \(Mean\_Ths\), \(SD\_Ths\) and \(Dist\_Ths\) kept constant). This accuracy is far lower than that of other statistical approaches for CMFD, but the big advantage of SCMFD is that it runs very fast, because the feature vector has a length of just 2, containing only the mean and the SD. This makes the similarity check among blocks much easier, and no PCA is needed. Also, in contrast to other statistical approaches, increasing the block size improves speed (a larger block size slightly reduces the total number of blocks). To further increase the accuracy of the PM, we prepare the final PM by overlaying the PMs from successive passes of the SCMFD algorithm, varying \(block\_size\) and \(SD\_block\) for every pass.

$$\begin{aligned} PM_{mul} = \frac{\sum _{i=1}^{N} PM_{i}}{N} \end{aligned}$$
(1)

where \(PM_{i}\) represents the prediction mask at the \(i^{th}\) pass and N is the total number of passes.

The final PM has values ranging from 0 to 1. A simple filter is applied to this mask that sets every value greater than a threshold to 1 and every other value to 0 (we used 0.4 in our study), as shown in Fig. 5. The resulting prediction mask gives 97% accuracy on average with 0.05% FPR.
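The averaging in Eq. (1) and the subsequent thresholding can be sketched in a few lines of Python. This is a minimal illustration that assumes every pass yields a binary mask of the same size, stored as a list of rows; the function name `overlay_masks` is hypothetical, and 0.4 is the threshold reported above.

```python
def overlay_masks(masks, threshold=0.4):
    """Average N binary masks pixel-wise, then binarize the result."""
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    # PM_mul: per-pixel fraction of passes that flagged the pixel.
    pm_mul = [[sum(m[r][c] for m in masks) / n for c in range(width)]
              for r in range(height)]
    # Final PM: 1 where the averaged value exceeds the threshold, else 0.
    final_pm = [[1 if v > threshold else 0 for v in row] for row in pm_mul]
    return pm_mul, final_pm


# Three toy 1 x 3 masks from three hypothetical passes.
masks = [[[1, 0, 0]],
         [[1, 1, 0]],
         [[0, 0, 0]]]
pm_mul, final_pm = overlay_masks(masks)
print(final_pm)  # [[1, 0, 0]]
```

A pixel flagged in two of three passes (average about 0.67) survives the filter, while a pixel flagged in only one pass (average about 0.33) is dropped, which is how isolated false positives from a single pass are suppressed.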

Fig. 5. Overlaid prediction mask

4 Experimental Results and Analysis

To evaluate the performance of the proposed approach, we used the Copy-Move Forgery Dataset [12]. An exhaustive set of experiments was conducted using the images from this standard dataset, and the overall F-score and FPR are presented in the results.
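For reference, the two reported metrics can be computed from a prediction mask and a ground-truth mask as in the sketch below. It uses the standard definitions (F-score as the harmonic mean of precision and recall, FPR as FP / (FP + TN)) and assumes pixel-level counting over binary masks, which is an assumption rather than a detail stated here. The function name `f_score_and_fpr` is hypothetical.

```python
def f_score_and_fpr(pred, truth):
    """Compute pixel-level F-score and false positive rate for binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f_score, fpr


# Toy 1 x 4 masks: one true positive, one false positive, two true negatives.
pred = [[1, 1, 0, 0]]
truth = [[1, 0, 0, 0]]
f_score, fpr = f_score_and_fpr(pred, truth)
print(round(f_score, 4), round(fpr, 4))  # 0.6667 0.3333
```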

4.1 Dataset

The dataset [12] consists of medium-sized images (almost all 1000 \(\times \) 700 or 700 \(\times \) 1000) and is subdivided into several subsets (D0, D1, D2). We used subset D0 for our experiments, which contains uncompressed images and their translated copies.

4.2 Results

The base SCMFD algorithm provides an accuracy of up to 76% with an FPR of 1.38%. The F-score and FPR are shown in Tables 2 and 3, respectively, which show the increase in accuracy and the decrease in FPR as \(block\_size\) and \(SD\_block\) increase. As \(SD\_block\) increases, the areas with repeated patterns or solid colors are reduced in the PM, as shown in Fig. 6 moving from (a) to (e). As \(block\_size\) increases, the false positives are greatly reduced, as can be seen by moving through Figs. 6, 7, 8, 9, 10 and 11. The accuracy obtained by SCMFD is then further optimized.

Table 2. Average values for F-score for different block sizes and respective \(SD\_block\)
Table 3. Average values for FPR for different block sizes and respective \(SD\_block\)
Fig. 6. Prediction mask for \(block\_size\) = 5 with different \(SD\_block\)

Fig. 7. Prediction mask for \(block\_size\) = 10 with different \(SD\_block\)

Fig. 8. Prediction mask for \(block\_size\) = 15 with different \(SD\_block\)

Fig. 9. Prediction mask for \(block\_size\) = 20 with different \(SD\_block\)

Fig. 10. Prediction mask for \(block\_size\) = 25 with different \(SD\_block\)

Fig. 11. Prediction mask for \(block\_size\) = 30 with different \(SD\_block\)

The results are further improved by performing multiple passes of SCMFD with different block sizes and \(SD\_block\) values and overlaying the prediction masks from the different passes.

Table 4 shows how the accuracy increases when prediction masks of different block sizes are overlaid and different \(SD\_block\) values are taken into consideration. This yields five overlaid prediction masks, shown in Fig. 12: (a) the mask obtained by overlaying the prediction masks of all block sizes and all \(SD\_block\) values; (b) the mask obtained from all block sizes with \(SD\_block\) values 1, 2, 3, 4; (c) the mask obtained from all block sizes with \(SD\_block\) values 2, 3, 4; (d) the mask obtained from all block sizes with \(SD\_block\) values 3, 4; and (e) the mask obtained from all block sizes with \(SD\_block\) value 4. These prediction masks show that the FPR decreases and the F-score increases.

Fig. 12. Prediction mask obtained after doing multiple passes of SCMFD

Table 4. F-score and FPR after optimization

4.3 Comparative Performance with State-of-the-art Approaches

The comparative performance of the proposed approach and other state-of-the-art schemes is tabulated in Tables 5 and 6, based on F-score and FPR, respectively. DyWT_zernike [16] and Dixit et al. [3] achieve a high F-score of approximately 98.38%, which is marginally higher than that achieved by the proposed approach. However, these methods have approximate FPRs of 8.08% and 2.55%, which are much higher than that of the proposed approach, which attains an FPR as low as 0.051%.

Table 5. Comparative performance based on F-score
Table 6. Comparative performance based on FPR

5 Conclusion

In this paper, we proposed the SCMFD approach, which aims to accurately localize the forged portions within an image by exploiting its statistical properties. It uses the statistical image features, viz., mean and variance, to detect forgery on a block basis. The accuracy is further improved by preparing a new prediction mask that overlays the prediction masks from different passes of SCMFD. As future work, we would like to make the proposed method robust against various attacks such as noise addition, scaling, and JPEG compression.