1 Introduction

With the development of modern digital cameras and smartphones, multimedia content is increasingly present in everyday life. The distribution of digital images has grown, especially on the Internet and social networks. These images are used in everyday life as authenticated proofs or corroboratory evidence in areas such as forensic studies, law enforcement, and journalism. Many software applications are available for modifying digital image content. Since digital photographs can be easily modified, one cannot be certain about the information delivered by the image content. It is therefore important to develop methods for checking image authenticity that will indicate any changes in the image content.

The methods for change detection in images can be divided into two categories: active and passive. Active methods imply that a digital signature or a watermark is inserted into the original image. They require additional information to be embedded in the image during the capturing process or later, by authorized personnel, which serves as proof that the image is genuine, i.e. that the image content can be trusted when delivered. On the other hand, passive methods allow change detection in images without the presence of additional information. They can be used to detect changes in images by extracting natural features within the image, or to identify the source of the image generation based on optical and sensor regularities.

Numerous methods have been proposed for image forensics, in which image splicing, copy-move, and image recoloring or fake colorizing are analyzed [13, 14, 32]. Image splicing means that a new image is made by combining content from two or more different images. Image recoloring is a technique that can transfer image color or theme while making imperceptible changes. Copy-Move Forgery Detection (CMFD) addresses one of the most popular manipulations in image forensics, in which content changes are deliberately made by copying a part of an image and pasting it to another location within the same image. An example of copy-move forgery is given in Fig. 1, where both the original and the modified image (forgery) are presented. In Fig. 1 the modification is made so that the soldier from the original image is covered by copying a piece of grass found within the image.

Fig. 1 Copy-move forgery example

The typical aim of copy-move changes is to duplicate (or multiply) an object found in the image, or to cover some image parts in order to convey information different from the original image. Thus, the image content is modified with the purpose of delivering erroneous information. Such deliberately made changes, e.g. copied parts, can be translated, rotated, scaled, filtered, and so on. If parts of the original image are copied somewhere else within the same image, the noise components, color, dynamic range and other important features of the copied parts will be compatible with the rest of the image. Consequently, it is not an easy task to detect the changes by methods that look for incompatibilities in the statistics of the image and of its various parts.

CMFD methods can be keypoint-based or block-based.

The keypoint-based methods use characteristic point extraction, where the extraction is made only in particular regions, without any subdivision of the image. The keypoint features are based on characteristics such as corners, blobs, and edges, which help to increase the matching accuracy. The obtained feature vectors are classified and matched to each other in order to find the duplicated regions in the copy-move forgery.

Various techniques and transforms are used in the literature for the region clustering and keypoint-based matching, like: Discrete Radon Polar Complex Exponential Transform (DRPCET), the Radon transform (RT) and the Polar Complex Exponential Transform (PCET) [40], an Adaptive Patch Matching algorithm and Matched Keypoints Merging algorithm [6], Local Bidirectional Coherency Error [3], Scale Invariant Feature Transform (SIFT) [35], FAST (Features from Accelerated Segment Test) and scaled ORB (Oriented FAST and Rotated BRIEF (Binary Robust Independent Elementary Feature)) [36], discrete analytical Fourier–Mellin transform (DAFMT) [38], analytical Fourier–Mellin transform (AFMT) [36], Local Binary Pattern (LBP) and Discrete Cosine Transform (DCT) [1, 19, 21], Speeded-Up Robust Feature (SURF) [15], Multi-Level Dense Descriptor (MLDD) [4], local Gabor wavelets patterns (LGWP) [6], planar homography constraint and graph cut [34], SIFT and overlapping region-based global context descriptor (OR-GCD) [39].

The block-based CMFD methods use overlapping or non-overlapping blocks for forgery image analysis. Descriptors are calculated for individual blocks and used in order to determine similarities among the blocks, i.e. different parts of the analyzed image. The block-based CMFD methods use different descriptors for the similarity matching, where the blocks suspected as forgery are typically the ones where the matching is detected. The algorithms usually applied for block-based approaches use: Radial Harmonic Fourier Moments (RHFMs) and SIFT [9], stationary wavelet transform (SWT) and DCT [21], invariant Quaternion Exponent Moments (QEMs) [31], Discrete Radial Harmonic Fourier Moments (DRHFMs) [38], DCT [2, 30], LBP and DCT [1], DWT and DCT [15], Swarm Intelligence (SI) and SIFT [35], Auto Color Correlogram (ACC) [22], Histogram of Gradients (HOG) [19], PCET and Approximate Nearest Neighbor (ANN) Searching [8], LBP Histogram, Fourier features and Fast Walsh-Hadamard Transform (FWHT) features [29].

In addition to the above-mentioned methods, there have also been a few attempts to use the fractal and multifractal dimension to perform CMFD [17, 26]. In [17], the authors use the multifractal dimension, block-based mean and variance of pixel values and the central moment, while in [26] the authors apply the Local Fractal Dimension (LFD) and Singular Value Decomposition (SVD) in order to perform CMFD.

The aim of this research is to develop a multifractal-based method to detect changes in images produced by copy-move and multi-paste operations. Namely, copy-move image forgery detection using the multifractal spectrum and its characteristics, as well as some common statistical parameters, has been carried out to show how block size affects the forgery detection accuracy, and to analyze the impact of the block size on precision and recall. The image database used in this paper comprises data from the publicly available CoMoFoD and Image Manipulation datasets [7, 16].

The paper consists of six sections. The Introduction is followed by Section II, where the methods from the literature are briefly described. Section III presents some of the main multifractal spectrum image characteristics. The proposed method and the mathematical model used for block classification are presented in Section IV. The experimental results, followed by a discussion, are presented in Section V. Section VI presents the main conclusions of the proposed multifractal-based method.

2 Related work

In the last decade, different methods have been developed for detecting changes in images, especially those made by copy-move manipulations. CMFD methods can be divided into keypoint-based and block-based methods.

2.1 Keypoint-based methods

Zhong, J., Gan, Y., Young, J., Huang, L., & Lin, P. in [38] proposed a method based on color invariance and the quaternion polar complex exponential transform (QPCET). The points of interest in an image are extracted using a point detector that combines the Speeded Up Robust Features (SURF) detector and a color invariance model. In [6], Bi, X., Pun, C. M., & Yuan, X. C. suggest a copy-move forgery detection method using multi-scale feature extraction and adaptive matching. The image is first segmented into non-overlapping patches of irregular shape at different scales, and the Scale Invariant Feature Transform (SIFT) is applied to extract feature points from all patches. A method based on coherency sensitive hashing for establishing the feature correspondences and a local bidirectional coherency in an image is presented in [3], by Bi, X., & Pun, C. M. In [33], Sun, X., Guo, H., Xia, Z., & Chen, X. presented a keypoint method based on a modified SIFT-based detector. A novel interspersing strategy is developed for the keypoint distribution, and the keypoints are described by an improved SIFT descriptor. Zhu, Y., Shen, X., & Chen, H. in [40] proposed a method based on the following steps: establishing a Gaussian scale space, extracting the oriented FAST keypoints and the ORB features in each scale space, reverting the coordinates of the oriented FAST keypoints to the original image, and matching the ORB features between each pair of keypoints. A method based on the use of DAFMT was presented in [36] by Zhong, J., & Gan, Y. The image is first converted from the RGB to the grayscale domain, and then the DAFMT is applied to extract the characteristics. The characteristics are lexicographically sorted and then compared using the Spearman rank correlation coefficient. In [1], Alahmadi, A., Hussain, M., Aboalsamh, H., Muhammad, G., Bebis, G., & Mathkour, H. presented a method based on the steerable pyramid transform (SPT) and the Local Binary Pattern (LBP).
The color image is first converted into the YCbCr system, and then the SPT is applied to the chrominance channels in order to provide multiple and multi-oriented subbands. An SVM (Support Vector Machine) is used for the classification. A method based on the analysis of discrete cosine transform coefficients is presented in [20] by Lin, C. S., & Tsay, J. J. The method was shown to be convenient for change detection in images that have undergone JPEG compression and contain the accompanying artefacts. Kaushik, R., Bajaj, R. K., & Mathew, J. in [18] describe a method based on the two-dimensional DCT and statistical moments. A window centered around each pixel slides through the image; the DCT is calculated for each window, and a quantization matrix is obtained. In [15], the authors describe a method that uses the SURF algorithm in the opponent color space. Namely, the color image is first converted to the opponent color space. The color gradient is calculated for each pixel and used as the workspace for the SURF algorithm for keypoint extraction. A method based on the application of the analytical Fourier–Mellin transform (AFMT) is shown in [9] by Gan, Y., & Zhong, J. The idea is to use the AFMT in order to provide scaling and rotational invariance for constructing the image geometric invariance. In [4], Bi, X., Pun, C. M., & Yuan, X. C. present a method using a Multi-Level Dense Descriptor (MLDD) and Hierarchical Feature Matching. The MLDD is calculated for each pixel and consists of two parts: the Color Texture Descriptor and the Invariant Moment Descriptor. After calculating the MLDD, Hierarchical Feature Matching is applied to detect suspicious regions. In [34], Zhang, W., Cao, X., Qu, Y., Hou, Y., Zhao, H., & Zhang, C. proposed a method based on the planar homography constraint with automatic extraction using graph cut. First, the fake region is located roughly by using the planar homography constraint; second, the fake object is segmented via graph cut.
This method works efficiently as long as there are image regions satisfying the planar homography constraint. In [39], the authors proposed a method based on the SIFT algorithm and the overlapping region-based global context descriptor (OR-GCD). The proposed method consists of three main steps: SIFT feature matching, OR-GCD extraction, and verification of SIFT matches. In order to filter false matches for copy detection, the authors used a global context verification scheme.

2.2 Block-based methods

The block-based CMFD is a popular approach adopted by researchers in recent years, due to its compatibility with various feature extraction techniques and its increased matching performance. Chou, C. L., & Lee, J. C. [8] proposed a block-based passive method for copy-move forgery detection based on LGWP and the rotation-invariant ability of uniform LBP. In [11], Gan, Y., Chung, J., Young, J., Hu, Z., & Zhao, J. suggested a block-based method which combines RHFMs and the SIFT algorithm. Texture patches are used in segmentation, and Simple Linear Iterative Clustering (SLIC) is proposed. Mahmood, T., Mehmood, Z., Shah, M., & Saba, T. in [21] proposed a method based on SWT features, because of their impressive localization properties in both the spectral and spatial domains. The DCT is applied to reduce the feature vector dimension. A robust copy-move forgery detection approach using invariant QEMs is presented in [31] by Wang, X. Y., Liu, Y. N., Xu, H., Wang, P., & Yang, H. Y. The tampered color image is first preprocessed with a Gaussian low-pass filter and then divided into overlapping circular blocks. Then the QEMs modulus descriptor is calculated for each block. A new copy-move method is presented in [37] by Zhong, J., Gan, Y., Young, J., Huang, L., & Lin, P., where an image is first divided into overlapping blocks, and then the local image features are extracted by the DRHFMs. Similar feature vectors corresponding to blocks are found by the 2 Nearest Neighbors (2NN) test and the Euclidean distance. In [2], Alkawaz, M. H., Sulong, G., Saba, T., & Rehman, A. suggested a method based on DCT coefficients. First, the RGB image is converted into the grayscale domain, and then the grayscale image is divided into overlapping blocks. 2D DCT coefficients are calculated for each block, and the duplicated block is located by the Euclidean distance. In [35], Zhao, F., Shi, W., Qin, B., & Liang, B. proposed a method where the standard deviation is calculated for each block, and the image is then divided into layers according to the standard deviation value. The SIFT algorithm is used to detect each layer. A method with automatic determination of the decision threshold was presented in [30] by Ustubioglu, B., Ulutas, G., Ulutas, M., & Nabiyev, V. The image is first converted to the YCbCr color system, and then divided into overlapping blocks. The feature vectors are extracted for each block using the DCT and zig-zag scanning. In [28], Shih, F. Y., & Jackson, J. K. describe a method for CMFD, where the algorithm loads a grayscale image of resolution 256 × 256 and divides it into blocks, for which either PCA (Principal Component Analysis) or DCT is performed. The calculated PCA or DCT values are compared, and the duplicated regions are found. Malviya, A. V., & Ladhake, S. A. in [22] presented a block-based method, where an input image is filtered in order to remove noise, and then divided into blocks of dimension M × N. Each block is subjected to an 8Z affine transformation. The characteristics of each block are extracted by the Auto Color Correlogram (ACC). In [19], Lee, J. C., Chang, C. P., & Chen, W. K. present a method based on a histogram of oriented gradients. The image is first divided into overlapping blocks of fixed size. The characteristics of each block are calculated using the Histogram of Gradients (HOG) descriptor. Similar blocks are searched for, and the result is mapped to the image being tested. In [10], the authors Emam, M., Han, Q., & Niu, X. proposed a copy-move detection method for a combination of geometric transformations, based on PCET. The image is first divided into overlapping blocks, and in each block PCET transformations are used to extract the characteristics. ANN Searching is used to identify similar blocks.
In [1], Alahmadi, A., Hussain, M., Aboalsamh, H., Muhammad, G., Bebis, G., & Mathkour, H. proposed a method based on DCT and LBP. First, the color image is converted into the YCbCr system and divided into overlapping blocks with 50% overlap. The LBP operator is applied to each block, and each LBP-encoded block is then transformed into the frequency domain using the DCT. An SVM is used to classify the coefficients. Hayat, K., & Qazi, T. [15] suggested a method based on DWT and DCT. The image is first converted to grayscale and then the DWT is applied. After applying the DWT, the image is divided into overlapping blocks, and the DCT is applied to each block. Soni, B., Das, P. K., & Thounaojam, D. M. [29] proposed two methods, based on Local Binary Pattern Histogram Fourier features and on Fast Walsh-Hadamard Transform features. The input image is first converted from the RGB to the grayscale domain and divided into overlapping blocks of different sizes. In order to match similar blocks, the authors used the Euclidean distance. In [25], Mohsen Jenadeleh and Mohsen Ebrahim Moghaddam proposed a method based on modified fractal coding and matching of characteristic vectors. The image is first divided into overlapping blocks, and then fractal coding features are extracted for each block. These features are used for the detection of the copied regions. Rani Susan Oommen, Jayamohan M. and Sruthy S. in [26] proposed a Local Fractal Dimension (LFD) approach with application of the SVD. An image is first divided into blocks of fixed dimensions, and for each block the local fractal dimension is estimated. In order to reduce the complexity, the image blocks are arranged in a B+ tree according to their LFD values.

Although the above-mentioned methods detect forged image regions, their computational complexity is very high. Also, there is a high percentage of false positives, which leads to misdetection results.

Block-based methods use overlapping or non-overlapping square blocks (windows) for image analysis. For each block, a characteristic feature vector is calculated, and these vectors are then compared in order to find duplicated blocks. Most of the block-based methods work with blocks of size 8 × 8. What results will be obtained if blocks of other dimensions are used, for example 16 × 16 or 32 × 32?

The quantities to be considered when calculating the accuracy of forgery detection of image parts are: the number of correct detections (TP, True Positive), the number of authentic regions incorrectly detected as forged (FP, False Positive) and the number of missed forged regions (FN, False Negative).

In this paper, copy-move image forgery detection using the multifractal spectrum and its characteristics, as well as common parameters like the mean value and standard deviation of the blocks, has been carried out to show how block size affects the forgery detection accuracy, and to analyze the impact of the block size on recall and precision. Compared with other existing techniques, the proposed method provides two main advantages:

1) better performance in terms of FP and FN, i.e. recall and precision, and

2) lower dimension of the feature vector.

3 Multifractal spectrum

In a multifractal process, two matrices are created: the Hölder exponent matrix α, which describes the local regularity of the image (pixels) being analyzed, and the distribution matrix of these coefficients, the spectrum f(α), or multifractal spectrum of the image [27]. The multifractal spectrum gives a global description of an image (or, more generally, of the phenomenon being examined). The parameter α gives local signal information, while the spectrum f(α) describes the signal globally. A signal can thus be described from both a local and a global point of view by the pair (α, f(α)). Small values of α denote a locally less regular signal. Small values of f(α) indicate that the corresponding local value of α is unlikely (it seldom occurs), and vice versa for large f(α) [27].
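The pair (α, f(α)) can be made concrete with a rough sketch of the histogram method. Note this is an illustrative simplification, not the authors' exact implementation (which follows [27]); the window sizes, the measure, the regression over scales, and the normalization of f(α) below are all our assumptions.

```python
import numpy as np

def holder_exponents(img, scales=(1, 2, 3)):
    """Estimate the Holder exponent alpha at each pixel by regressing
    log(measure) against log(window size) over a few scales."""
    h, w = img.shape
    img = img.astype(np.float64) + 1.0           # avoid log(0)
    log_eps, log_mu = [], []
    for s in scales:
        k = 2 * s + 1                            # (2s+1) x (2s+1) window
        pad = np.pad(img, s, mode='edge')
        mu = np.zeros_like(img)                  # sum-measure over the window
        for dy in range(-s, s + 1):
            for dx in range(-s, s + 1):
                mu += pad[s + dy:s + dy + h, s + dx:s + dx + w]
        log_eps.append(np.log(k))
        log_mu.append(np.log(mu))
    log_eps = np.array(log_eps)
    log_mu = np.stack(log_mu)                    # (n_scales, h, w)
    # per-pixel least-squares slope: alpha = d log(mu) / d log(eps)
    x = log_eps - log_eps.mean()
    alpha = (x[:, None, None] * (log_mu - log_mu.mean(0))).sum(0) / (x ** 2).sum()
    return alpha

def multifractal_spectrum(alpha, bins=50):
    """f(alpha) from the histogram of Holder exponents
    (one common normalization: f = log N(alpha) / log N_total)."""
    hist, edges = np.histogram(alpha, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    valid = hist > 0
    f = np.log(hist[valid]) / np.log(hist.sum())
    return centers[valid], f
```

From the returned curve, characteristics such as the position of the maximum (α0) or the extreme values of α can then be read off.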

The value of the Hölder exponent depends on the position in the structure and describes the local signal regularity. Namely, different objects in the image have different spectra and different positions of the spectrum maximum, which makes this an interesting approach for detecting changes in images. Figure 2a and b give a simple example of an image and its modification, while Fig. 2c shows their multifractal spectra. In Fig. 2c it can be clearly seen that there is a difference in the values of the multifractal spectrum and the Hölder exponent for the original image and its modification.

Fig. 2 The example of a original image, b modified image and c their multifractal spectra

By simply calculating the multifractal spectra of the original image and its modification and extracting their characteristic features, it can be seen that there is a difference in the calculated spectra, especially in the parts circled in Fig. 2c. By analyzing the multifractal spectra for a large number of original and modified images, we have concluded that such differences are the main indicators that there has been a change in the image content. Therefore, they can be used as descriptors of the image blocks (Section 4).

4 Proposed method

In this paper, copy-move image forgery detection based on the multifractal spectrum and its characteristics as nonlinear parameters, as well as the mean value and standard deviation of block pixels, has been carried out to explore the possibility of detecting forged parts in images, and the effect of block size on the performance of tampered region detection in terms of FP and FN. Blocks of sizes 8 × 8, 16 × 16 and 32 × 32 are implemented in the proposed method. Figure 3 presents a diagram explaining the algorithm proposed in this paper.

Fig. 3 The algorithm of the proposed method based on multifractals

The experimental analysis is carried out in five steps.

Step 1: Preprocessing of an Image

The first step in our experiment is to convert the RGB input into an intensity image. Low-frequency components are considered more useful in feature matching than high-frequency components; thus, a Gaussian low-pass filter is employed to reduce the high-frequency components. In this implementation, the filter size is 5 × 5 and the standard deviation of the filter is 0.5.

Next, the three color components are used to obtain a grayscale image based on the well-known Eq. (1):

$$ {X}_{gray}=0.299\times R+0.587\times G+0.114\times B $$
(1)

where R, G and B are red, green and blue components of the image, respectively.
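As an illustration, Step 1 can be sketched as follows. This is a minimal NumPy version and not the authors' code (the paper's implementation is in C#); the order of grayscale conversion and filtering, and the `edge`-padding at borders, are our assumptions.

```python
import numpy as np

def to_grayscale(rgb):
    """Luma conversion of Eq. (1): X_gray = 0.299 R + 0.587 G + 0.114 B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def gaussian_kernel(size=5, sigma=0.5):
    """1D Gaussian kernel for a separable 5 x 5, sigma = 0.5 low-pass filter."""
    ax = np.arange(size) - size // 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def preprocess(rgb):
    """Grayscale conversion followed by Gaussian low-pass filtering."""
    gray = to_grayscale(np.asarray(rgb, dtype=np.float64))
    k = gaussian_kernel()
    pad = len(k) // 2
    padded = np.pad(gray, pad, mode='edge')
    # separable convolution: filter rows, then columns
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, tmp)
```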

Step 2: Partition into non-overlapping blocks

After the RGB to grayscale conversion, the image is divided into non-overlapping square blocks of dimension m × m, for m = 32, 16, 8. For testing purposes, the resolution of the images is 256 × 256, which does not affect generality but allows better transparency of the method; the method can be applied to an arbitrary image resolution. Accordingly, each image is divided into 64, 256 and 1024 non-overlapping blocks, respectively. Figure 4 represents the division of an image into non-overlapping blocks.

Fig. 4 The image division into non-overlapping blocks
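Step 2 is straightforward; a sketch in NumPy (illustrative only):

```python
import numpy as np

def partition_blocks(img, m):
    """Divide an image into non-overlapping m x m blocks (Step 2).
    For a 256 x 256 image this yields 64, 256 or 1024 blocks
    for m = 32, 16, 8, respectively."""
    h, w = img.shape
    assert h % m == 0 and w % m == 0, "image dimensions must be multiples of m"
    return [img[r:r + m, c:c + m]
            for r in range(0, h, m)
            for c in range(0, w, m)]
```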

Step 3: Feature extraction

The next step is feature extraction for each block of the image of interest. In the case of images modified by the copy-move method, the copied and pasted parts have a similar structure, so a multifractal analysis, which essentially analyzes self-similarity, can be applied. For each block, the multifractal spectrum is calculated by the histogram method, described in detail in [27]. The descriptors used to generate the feature vectors of the blocks consist of two parts: multifractal and non-multifractal (common statistical) descriptors. The multifractal descriptors are characteristics of the multifractal spectrum: α0 (the position of the multifractal spectrum maximum), αmin and αmax (the minimum and maximum values of the Hölder exponent), α1 (the first zero, or the first minimum value of f(α)), and the difference between α1 and αmin, Δα = α1 − αmin (Fig. 5), while the statistical descriptors are the mean value and standard deviation of block pixels (Eqs. (2), (3) for blocks of size 32 × 32, Eqs. (4), (5) for blocks of size 16 × 16, and Eqs. (6), (7) for blocks of size 8 × 8).

Fig. 5 The parameters of multifractal spectrum used as block descriptors

$$ {\overline{x}}_{32}=\frac{1}{1024}{\sum}_{i=1}^{1024}{x}_i $$
(2)
$$ {\sigma}_{32}^2=\frac{1}{1024}{\sum}_{i=1}^{1024}{\left({x}_i-{\overline{x}}_{32}\right)}^2 $$
(3)
$$ {\overline{x}}_{16}=\frac{1}{256}{\sum}_{i=1}^{256}{x}_i $$
(4)
$$ {\sigma}_{16}^2=\frac{1}{256}{\sum}_{i=1}^{256}{\left({x}_i-{\overline{x}}_{16}\right)}^2 $$
(5)
$$ {\overline{x}}_8=\frac{1}{64}{\sum}_{i=1}^{64}{x}_i $$
(6)
$$ {\sigma}_8^2=\frac{1}{64}{\sum}_{i=1}^{64}{\left({x}_i-{\overline{x}}_8\right)}^2 $$
(7)
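Putting Step 3 together, each block maps to a 7-element feature vector; a sketch follows. The five multifractal characteristics are assumed to come from a separate spectrum routine (the histogram method of [27]) and are passed in here as a hypothetical tuple `spectrum_features`.

```python
import numpy as np

def block_descriptor(block, spectrum_features):
    """Build the 7-element feature vector of a block: five multifractal
    spectrum characteristics (alpha0, alpha_min, alpha_max, alpha1,
    alpha1 - alpha_min) plus the block mean and standard deviation of
    Eqs. (2)-(7)."""
    a0, a_min, a_max, a1 = spectrum_features
    x = block.astype(np.float64).ravel()
    mean = x.mean()                              # Eqs. (2), (4), (6)
    std = np.sqrt(((x - mean) ** 2).mean())      # Eqs. (3), (5), (7)
    return np.array([a0, a_min, a_max, a1, a1 - a_min, mean, std])
```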
Step 4: Block matching

For the purpose of this research, a new semi-metric d : Rn × Rn → [0, +∞) has been developed, which can be used to compare the objects (blocks) xi, xj ∈ Rn:

$$ d\left({x}_i,{x}_j\right)=\left\{\begin{array}{cl}0, & \text{if}\ i=j\\ \frac{\sum_{k=1}^{n}{\left({x}_i^k-{x}_j^k\right)}^2}{\left|i-j\right|}, & \text{if}\ i\ne j\end{array}\right.\kern1em i,j=1,\dots,m $$
(8)

where \( {x}_i=\left({x}_i^1,{x}_i^2,\dots, {x}_i^n\right) \) is the characteristic feature vector which describes the block with attributes \( {x}_i^1,{x}_i^2,\dots, {x}_i^n \), i = 1, … , m, where m is the number of blocks, n = 7 is the number of descriptors for each block, and i and j represent the positions of the image blocks.

Lemma: d : Rn × Rn → [0, +∞) defined with

$$ d\left({x}_i,{x}_j\right)=\left\{\begin{array}{cl}0, & \text{if}\ i=j\\ \frac{\sum_{k=1}^{n}{\left({x}_i^k-{x}_j^k\right)}^2}{\left|i-j\right|}, & \text{if}\ i\ne j\end{array}\right.\kern1em i,j=1,\dots,m,\ m\in \mathbb{N} $$

is a semi-metric.

Proof:

We first prove that symmetry holds.

∀xi, xj ∈ Rn if i = j the proof is trivial: d(xi, xj) = 0 = d(xj, xi).

If i ≠ j then \( \mathrm{d}\left({\mathrm{x}}_{\mathrm{i}},{\mathrm{x}}_{\mathrm{j}}\right)=\frac{\sum_{\mathrm{k}=1}^{\mathrm{n}}{\left({\mathrm{x}}_{\mathrm{i}}^{\mathrm{k}}-{\mathrm{x}}_{\mathrm{j}}^{\mathrm{k}}\right)}^2}{\left|\mathrm{i}-\mathrm{j}\right|}=\frac{\sum_{\mathrm{k}=1}^{\mathrm{n}}{\left({\mathrm{x}}_{\mathrm{j}}^{\mathrm{k}}-{\mathrm{x}}_{\mathrm{i}}^{\mathrm{k}}\right)}^2}{\left|\mathrm{j}-\mathrm{i}\right|}=\mathrm{d}\left({\mathrm{x}}_{\mathrm{j}},{\mathrm{x}}_{\mathrm{i}}\right) \).

Reflexivity: ∀xi ∈ Rn because i = j, d(xi, xi) = 0.

Identity of indiscernibles: ∀xi, xj ∈ Rn, d(xi, xj) = 0 ⟺ i = j ⟺ xi = xj.

The lemma has been proved.
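The semi-metric of Eq. (8) translates directly to code; a minimal sketch (illustrative, not the authors' C# implementation):

```python
import numpy as np

def semi_metric(xi, xj, i, j):
    """Semi-metric of Eq. (8): squared Euclidean distance between the
    feature vectors of blocks i and j, scaled by the positional
    separation |i - j|; zero when i == j."""
    if i == j:
        return 0.0
    diff = np.asarray(xi, dtype=np.float64) - np.asarray(xj, dtype=np.float64)
    return float(np.sum(diff ** 2) / abs(i - j))
```

Note that the positional scaling makes nearby block pairs look more distant than far-apart pairs with the same feature difference, which is why the triangle inequality is dropped (hence "semi-metric").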

Also, we use the interval [min, max], where min is the smallest of all minimum values of the semi-metric between changed blocks over both image databases [7, 16] (calculated using formula (8)), and max is the largest of all maximum values of the semi-metric between the changed blocks from all the databases (also calculated using formula (8)). More precisely, let X1, X2, …, Xp be a finite number of sets (in this case, datasets of images) with finite elements xi ∈ Xk, i = 1, 2, … , mk and k = 1, 2, … , p, p, mk ∈ N. Let \( \min =\underset{1\le k\le p}{\min }d\left({x}_i,{x}_j\right) \) and \( \max =\underset{1\le k\le p}{\max }d\left({x}_i,{x}_j\right) \), where xi, xj ∈ Xk are changed block vectors. To make a decision, cluster grouping is used only for blocks whose semi-metric distances from each other fall within the interval [min, max].

Step 5: Extraction of forged areas

The Basic Variable Neighborhood Search (BVNS) [23, 24], applied to the problem of detecting changed blocks, has been implemented in the following way. First, the distances among blocks have been calculated using the semi-metric of formula (8). In the preprocessing phase, the metric distances (together with the appropriate block indexes) have been sorted in non-decreasing order [11]. Then, the interval [min, max] is determined as described in Step 4, and only the blocks with distances in this interval are considered. These data are used for a more efficient implementation of the shaking operator. Indeed, since every solution is characterized by its centroid set, shaking replaces the appropriate number of centroids. More precisely, shaking in the i-th neighborhood means that centroids are replaced by randomly chosen blocks that are not centroids and are among the most distant from the centroid they replace (centroids are themselves blocks). In each step, the replacement of all centroids is considered, and no replacement is made if a randomly chosen block is the closest to the centroid (i.e. it is the centroid itself). Local search consists of a systematic replacement of one centroid with a block that is not a centroid. It starts from the solution obtained by shaking and proceeds according to the best-improvement principle, as long as there is an improvement [12].

In pseudo-code, BVNS can be presented in the following way [12]:

  • Initialization: choose a starting solution x ∈ X and set the stopping indicator STOP = 0.

  • Repeat {

    • i = 1

    • Repeat {

      • Shaking(): generate a random solution x′ in the i-th neighborhood of x.

      • Local search(): denote by x″ the local minimum obtained by applying one of the local search procedures, starting from x′.

      • Solution check: if the local minimum x″ is better than the current solution, move to it, i.e. x = x″, and continue the search from the new solution in its first neighborhood, i = 1; otherwise, move to the next neighborhood, i.e. i = i + 1.

      • If a stopping criterion is satisfied, set STOP = 1.

    • } until i = imax or STOP = 1

  • } until STOP = 1
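The shaking / local-search loop above can be sketched as a small k-medoid style search over block feature vectors. This is an illustrative simplification of Step 5, not the authors' implementation: it uses plain squared Euclidean distance instead of the semi-metric with the [min, max] filter, shaking swaps i centroids at random rather than by the most-distant rule, and all parameter names (`i_max`, `max_iter`, `seed`) are ours.

```python
import random
import numpy as np

def bvns_medoids(vectors, k, i_max=3, max_iter=10, seed=0):
    """Minimal BVNS sketch: pick k centroid blocks (medoids) among the
    feature vectors via shaking, best-improvement local search, and a
    move-or-not step."""
    rng = random.Random(seed)
    n = len(vectors)
    d2 = lambda a, b: float(np.sum((vectors[a] - vectors[b]) ** 2))

    def cost(centroids):
        # total distance of every block to its nearest centroid
        return sum(min(d2(p, c) for c in centroids) for p in range(n))

    x = rng.sample(range(n), k)
    best = cost(x)
    for _ in range(max_iter):
        i = 1
        while i <= i_max:
            # Shaking: replace i randomly chosen centroids by non-centroids
            x1 = x[:]
            for _ in range(i):
                out_c = rng.randrange(k)
                x1[out_c] = rng.choice([p for p in range(n) if p not in x1])
            # Local search: best-improvement single-centroid swaps
            improved = True
            while improved:
                improved = False
                c1 = cost(x1)
                for ci in range(k):
                    for p in range(n):
                        if p in x1:
                            continue
                        trial = x1[:]
                        trial[ci] = p
                        if cost(trial) < c1:
                            x1, c1, improved = trial, cost(trial), True
            # Move or not: accept an improving solution and restart at i = 1
            if cost(x1) < best:
                x, best, i = x1, cost(x1), 1
            else:
                i += 1
    return x, best
```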

4.1 Performance measurement

The performance measurement focuses on accuracy, described in the following. Precision signifies the probability that a detected block is indeed forged, whereas recall determines the probability that a forged block in the set of images is detected (Eqs. (9) and (10)). True positive (TP) represents the number of tampered blocks which are classified as tampered, false positive (FP) represents the number of authentic blocks which are classified as tampered, and false negative (FN) represents the number of tampered blocks which are classified as authentic.

$$ \mathrm{Precision}=\mathrm{TP}/\left(\mathrm{TP}+\mathrm{FP}\right) $$
(9)
$$ \mathrm{Recall}=\mathrm{TP}/\left(\mathrm{TP}+\mathrm{FN}\right) $$
(10)
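Eqs. (9) and (10) computed over sets of block indexes flagged as tampered (a minimal sketch; the set-based representation is our assumption):

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of Eqs. (9)-(10) over block labels."""
    tp = len(detected & ground_truth)    # tampered, classified tampered
    fp = len(detected - ground_truth)    # authentic, classified tampered
    fn = len(ground_truth - detected)    # tampered, classified authentic
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```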

5 Experimental results and discussion

In this section, the experimental results are presented to verify the accuracy of the proposed method. The implemented method is evaluated using images from the publicly available CoMoFoD dataset and the Image Manipulation Dataset [7, 16], which are common benchmarks for the detection of image tampering artefacts. The CoMoFoD dataset consists of 200 images: 100 original images and 100 tampered images, while the Image Manipulation Dataset consists of 48 base images [7, 16]. The standard image size has been set to 256 × 256. The tampered images have been generated by copying and pasting image parts. The parts of images can be geometrically transformed before pasting, by applying scaling and rotation, and the pasted regions can vary in size (small, medium and large). In this paper, the results for 10 images (out of the 100 used in the research) are presented. In each of these images, one or more regions are copied, and the size of the copied regions varies among the images. The original and forged images and their ground truth maps, indicating the forged areas, are shown in Figs. 6 and 7.

Fig. 6
figure 6

The first five examples for Copy-Move forgery: original images (the first row), the corresponding tampered images (the second row), and ground truth maps (the third row)

Fig. 7
figure 7

The second five examples for Copy-Move forgery: original images (the first row), the corresponding tampered images (the second row), and ground truth maps (the third row)

This research has implemented the block-based copy-move image forgery detection approach using multifractal spectrum for non-overlapping blocks of different size. Figure 8 presents the forgery detection results for the first set of tested images (images from Fig. 6), for different block sizes (32 × 32, 16 × 16 and 8 × 8), while Fig. 9 presents forgery detection results for the second set of tested images (images from Fig. 7), also for different block sizes.
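The non-overlapping tiling described above can be sketched as follows. The paper's implementation is in C#, so this NumPy version is only illustrative (names are ours); it discards any partial blocks at the right and bottom edges:

```python
import numpy as np

def to_blocks(image: np.ndarray, b: int) -> np.ndarray:
    """Split a grayscale image into non-overlapping b x b blocks.

    Trailing rows/columns that do not fill a whole block are
    discarded, so every returned tile has the full b x b size.
    """
    h, w = image.shape
    h, w = h - h % b, w - w % b                     # crop to a multiple of b
    tiles = image[:h, :w].reshape(h // b, b, w // b, b).swapaxes(1, 2)
    return tiles.reshape(-1, b, b)                  # (num_blocks, b, b)

img = np.arange(256 * 256, dtype=np.float64).reshape(256, 256)
blocks16 = to_blocks(img, 16)   # 256 blocks of 16 x 16
blocks32 = to_blocks(img, 32)   # 64 blocks of 32 x 32
```

Each tile is then reduced to a feature vector, and similar vectors at different positions indicate candidate copy-move pairs.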

Fig. 8
figure 8

The first five examples for Copy-Move forgery detection: original images (the first row), the corresponding modified images (the second row), ground truth maps (the third row), forgery detection for 32 × 32 non-overlapping blocks (the fourth row), forgery detection for 16 × 16 non-overlapping blocks (the fifth row) and forgery detection for 8 × 8 non-overlapping blocks (the sixth row)

Fig. 9
figure 9

The second five examples for Copy-Move forgery detection: original images (the first row), the corresponding modified images (the second row), ground truth maps (the third row), forgery detection for 32 × 32 non-overlapping blocks (the fourth row), forgery detection for 16 × 16 non-overlapping blocks (the fifth row) and forgery detection for 8 × 8 non-overlapping blocks (the sixth row)

The detection accuracy of the implemented method is calculated in terms of precision and recall for images from the CoMoFoD dataset and the Image Manipulation Dataset [7, 16]. Each input image is tested to analyze the effect of different non-overlapping block sizes on the forgery detection accuracy, in terms of precision and recall. The described algorithm, along with the proposed semi-metric for similarity, has been implemented in the C# programming language. The classification success of the defined semi-metric is demonstrated by comparing the results with those of [2], for a block size of 8 × 8, and for images from the CoMoFoD dataset [7] (I, II, IV, V, VI, VII, VIII, IX and X). The success rates are given in Table 1. Table 2 shows the results for the same set of images, for a block size of 16 × 16.

Table 1 Precision-recall performance of 8 × 8 block size in percentage (%), for images I, II, III, IV, V, VI, VII, VIII, IX and X
Table 2 Precision-recall performance of 16 × 16 block size in percentage (%), for images I, II, III, IV, V, VI, VII, VIII, IX and X

For the purpose of testing our method, an image from the Image Manipulation Dataset with a more complicated structure (containing sky, forest, sea, sand and buildings) is also tested. The results obtained for Image III are shown in Tables 1 and 2, for different block sizes. The image numeration in Tables 1 and 2 corresponds to the order of the images in Figs. 6 and 7, respectively.

From Table 1, we can see that significantly better results are obtained in terms of precision and recall for Images I, IV, V, VI, VII, VIII and X, especially in terms of precision for Images IV, V, VI, VII, IX and X. The results obtained for Image II are also relatively satisfying in terms of precision, as are those obtained for Image IX in terms of recall.

The results presented in Table 2 show high accuracy percentage for the forgery detection, in terms of precision and recall. In comparison with the results shown in Table 1, we can conclude that greater accuracy is obtained by applying blocks of size 16 × 16, for the same set of tested images.

Considering the results obtained for Image III (Tables 1 and 2), we can conclude that our method gives fairly good results even when the tested image has more complex content. In this case, it is better to use blocks of smaller dimensions (higher accuracy is obtained using blocks of size 8 × 8 compared to 16 × 16).

In order to compare the performance of our method with that of other authors' methods, we use two performance parameters for the accuracy of the proposed algorithm. The first is the correct detection ratio (CDR), used by the authors in [6] and defined as follows:

$$ \mathrm{CDR}=\frac{\mathrm{The}\ \mathrm{detected}\ \mathrm{tampered}\ \mathrm{region}}{\mathrm{The}\ \mathrm{tampered}\ \mathrm{region}} $$
(11)

as well as the correct detection ratio FC and the false detection ratio Ff, used by the authors in [19] and defined as follows:

$$ {F}_C=\frac{\left|\upmu \cap {\upmu}^{\mathrm{C}}\right|+\left|\upomega \cap {\upomega}^{\mathrm{C}}\right|}{\left|\upmu \right|+\left|\upomega \right|} $$
(12)
$$ {F}_f=\frac{\left|{\upmu}^{\mathrm{C}}-\upmu \right|+\left|{\upomega}^{\mathrm{C}}-\upomega \right|}{\left|{\upmu}^{\mathrm{C}}\right|+\left|{\upomega}^{\mathrm{C}}\right|} $$
(13)

where μ and ω denote the pixels of the original region and the corresponding forged region, respectively, μC and ωC denote the pixels of the detected original and forged regions, respectively, | | refers to the area of a region, ∩ to the intersection of two regions, and − to the difference between two regions [19]. The ratio FC indicates how well the algorithm correctly locates the pixels of copy–move regions in the tampered image, while Ff reflects the percentage of pixels that are false positives (i.e. incorrectly identified as forged) [19]. In other words, the two values indicate the precision with which the proposed algorithm locates copy–move regions: the closer FC is to 1 and Ff is to 0, the more precise the method is [19].
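With the regions represented as boolean pixel masks, Eqs. (11)–(13) translate directly into array operations. A minimal NumPy sketch (names are ours; Eq. (11) is read as a ratio of pixel areas):

```python
import numpy as np

def cdr(detected: np.ndarray, tampered: np.ndarray) -> float:
    """Eq. (11): area of the correctly detected tampered pixels
    divided by the area of the ground-truth tampered region."""
    return float((detected & tampered).sum()) / float(tampered.sum())

def f_c(mu, omega, mu_c, omega_c) -> float:
    """Eq. (12): fraction of true copy-move pixels correctly located.
    mu/omega: ground-truth original and forged regions;
    mu_c/omega_c: the corresponding detected regions."""
    return float((mu & mu_c).sum() + (omega & omega_c).sum()) / \
           float(mu.sum() + omega.sum())

def f_f(mu, omega, mu_c, omega_c) -> float:
    """Eq. (13): fraction of detected pixels that are false positives."""
    return float((mu_c & ~mu).sum() + (omega_c & ~omega).sum()) / \
           float(mu_c.sum() + omega_c.sum())
```

For a perfect detection, f_c returns 1 and f_f returns 0, matching the interpretation above.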

Table 3 compares the results obtained in [6] with those of the proposed approach in terms of CDR, for Image II, in the case of blocks of size 16 × 16 and 32 × 32. Table 4 compares our results with those obtained in [19], in terms of FC and Ff, for Images II, V, VI and VII, for blocks of size 16 × 16, while Table 5 shows the same comparison for blocks of size 32 × 32.

Table 3 Accuracy CDR performance in the case of 16 × 16 and 32 × 32 block size, for Image II
Table 4 Accuracy FC and Ff performance in the case of 16 × 16 block size, for Images II, V, VII and VIII
Table 5 Accuracy FC and Ff performance in the case of 32 × 32 block size, for Images II, V, VII and VIII

From Table 3, we can see that we obtain better results in terms of CDR in the case of block size 16 × 16, but the results obtained for block size 32 × 32 are worse in comparison with the results obtained in [6]. The proposed algorithm gives better results for 16 × 16 blocks than for 32 × 32 blocks because some portions of the forged regions are too small to be detected with larger block sizes. From Table 4, we can see that we obtain significantly better results in terms of FC and Ff for Images II, VI and VII, for blocks of size 16 × 16. The results obtained for Image V are also relatively satisfying in terms of FC and Ff. From Table 5, we can see that we obtain significantly better results in terms of FC and Ff for Images II, V and VII, for blocks of size 32 × 32.

The computational complexity is one of the most important issues in CMFD, which is affected by the number of descriptors for each block (the dimensionality of selected feature vector). In other words, computational complexity can be minimized by finding a method to reduce the dimensionality of feature vectors.

In this paper, we used the multifractal spectrum and its parameters, as well as standard statistical parameters, to generate a feature vector for each block. In our experiments, the feature vector for each block is 7-dimensional. Table 6 shows the efficacy of the proposed technique against other existing methods. Compared with [2, 5, 6, 8, 15, 21, 33], the dimension of the feature vector is lower, which implies that the proposed technique has lower computational complexity.

Table 6 Comparison of computational complexity
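The idea of a fixed low-dimensional descriptor per block can be illustrated as follows. Note that the seven components below (basic statistics of the block) are purely hypothetical stand-ins: the paper's actual descriptor combines multifractal-spectrum parameters with standard statistical parameters, which are not reproduced here.

```python
import numpy as np

def block_features(block: np.ndarray) -> np.ndarray:
    """Illustrative 7-dimensional descriptor for one image block.

    NOTE: these components are hypothetical stand-ins, chosen only
    to show the idea of a short, fixed-length feature vector; they
    are not the paper's multifractal-spectrum descriptors.
    """
    x = block.astype(np.float64).ravel()
    mean, std = x.mean(), x.std()
    # third and fourth standardized moments (skewness, kurtosis)
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skew, kurt = (z ** 3).mean(), (z ** 4).mean()
    return np.array([mean, std, skew, kurt, x.min(), x.max(), np.median(x)])

feat = block_features(np.arange(64, dtype=float).reshape(8, 8))  # 7 components
```

Since block matching compares these vectors pairwise, a shorter descriptor directly reduces the per-comparison cost, which is why a 7-dimensional vector keeps the complexity low.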

The goal of this research is to study the effect of different block sizes (8 × 8 and 16 × 16 pixels) on the performance in terms of FP and FN. By comparing the results shown in Tables 1 and 2, we can conclude that the block size affects both precision and recall, and that better accuracy is obtained for the block size of 16 × 16. This is a consequence of the fact that larger blocks provide more data for calculating the multifractal spectrum, and therefore a better representation is obtained.

By analyzing the obtained results, it can be seen that our method yields significantly better results in terms of precision and recall, compared to the results obtained in [2] (Table 1), for Images I, IV, V, VI, VII, VIII, IX and X, especially in terms of precision for Images IV, V, VI, VII, IX and X. The results obtained for Images II and IX are also relatively satisfying in terms of precision and recall. In the case of a more complicated image structure (containing sky, forest, sea, sand and buildings), the proposed method gave fairly good results (Tables 1 and 2); in that case, it is better to use blocks of smaller dimensions (better accuracy is obtained using blocks of size 8 × 8 compared to 16 × 16). Compared to other authors who analyzed the same images [6], we obtained better results in terms of CDR for Image II and blocks of size 16 × 16 (Table 3). In terms of FC and Ff, we obtained significantly better results for Images II, VI and VII for blocks of size 16 × 16, and the results obtained for Image V are relatively satisfying in comparison to [19] (Table 4). From Table 5, we can conclude that we obtain better results in terms of FC and Ff for Images II, V and VII for blocks of size 32 × 32.

Based on Tables 1, 2, 3, 4 and 5, we can conclude that the method proposed in this paper gives excellent results for blocks of size 8 × 8 and 16 × 16, while the results for larger blocks are fairly good. We should point out that most compression techniques use 8 × 8 or 16 × 16 blocks, which are important for highly detailed images. Thus, in order not to lose changes made in small regions, the most important detections should be based on 8 × 8 and 16 × 16 blocks. Results obtained with 32 × 32 blocks may also be very good for detecting changes in large images. By using small blocks (for instance, 4 × 4) in the detection algorithm, it is possible that large changes remain undetected; conversely, large blocks, such as 32 × 32 or larger, may miss very small changes. For CMFD we therefore suggest using 16 × 16 blocks, with an additional check using larger blocks if the image is of extremely high resolution.

The second aspect important for the development of new CMFD methods is computational complexity, which is affected by the number of descriptors for each block (the dimensionality of the feature vector). In our experiments, the feature vector for each block is 7-dimensional. Compared with other methods (Table 6), the dimension of the feature vector is lower, which implies that our proposed method has lower computational complexity.

6 Conclusions

Nowadays, high-tech but low-cost technology makes it easy to create and change image content, add and/or remove information within an image, or even to make a new image out of two or more images. Besides artistic and personal use for creating new content from existing images, changes in image content can be a part of criminal activities. Thus, developing methods for detecting such changes has become an important demand. One of the most researched approaches is copy-move forgery detection (CMFD). In this paper, a novel method for CMFD is proposed, based on the multifractal spectrum and its parameters, common statistical parameters, a new metaheuristic method and a semi-metric. Clustering, as an inherent part of the proposed metaheuristic method, is experimentally confirmed. The results show a high degree of success on the specific issues presented in the paper. Compared with the results in the literature, the proposed method has achieved better results in terms of precision and recall as figures of merit. It should be pointed out that the dimensionality of the feature vector used in our approach is lower than those used in other methods, leading to lower computational complexity. In further research, besides the algorithm proposed in this paper, new metaheuristic and supervised learning methods, as well as other multifractal-based parameters, are going to be developed and compared.