1 Introduction

Nowadays, it is common for ordinary users to take pictures with portable digital cameras and share them on social networks. With the popularity of digital image editing software, more and more images are manipulated before being shared. As a result, the trustworthiness of images has declined, and the field of digital forensics has emerged to restore public trust in digital images [30].

Image splicing, which copies one or several regions from source images and pastes them into a destination image, is one of the most popular image manipulation methods. Figure 1 shows an example of image splicing [7]. In this example, the two images in Fig. 1a and b are authentic; Fig. 1c is a spliced image composed by copying the zebra in Fig. 1b into Fig. 1a. In the image splicing scenario, a spliced region usually undergoes a sequence of post-processing operations such as edge softening, blurring, denoising and smoothing for a better visual appearance. Further, in some cases, images are processed professionally with the intention of forgery. Therefore, it is often impossible to identify the authenticity of images manually, even for experienced users [29, 30].

Fig. 1

Example of image splicing. c: a spliced image, which is composed from authentic images a and b

Image splicing detection (ISD) has been actively studied by many researchers to counter image tampering. Several approaches distinguish spliced images from authentic images by identifying distinctive traces near the boundaries of spliced regions [4, 5, 8, 16, 28, 34]. In [28], two methods for improving the capability of bicoherence features in detecting image splicing were proposed: estimating the bicoherence features of the authentic counterpart and incorporating features that characterize the variance of the feature performance. Two-dimensional (2-D) phase congruency and statistical moments of characteristic functions of wavelet sub-bands were used for splicing detection in [5]. Properties of the camera response function were exploited in [4, 16], and were combined with noise level inconsistency in [35] to detect image manipulations. The human visual system is less sensitive to chrominance than to luminance; even if a spliced image looks natural to a human observer, splicing traces may still remain in the chrominance channels, so chrominance information was exploited for splicing detection in [34]. A run-length-based approach was first proposed in [8] and recently improved in [14, 26]. In [33], gamma transformation applied to the input image was detected to reveal splicing, and the spliced region was localized based on the probability that overlapping blocks and pixels of the image had been gamma transformed.

The support vector machine (SVM) has been widely used for classification in the ISD problem [4, 5, 8, 9, 12,13,14, 16,17,18, 26, 28, 31, 33, 34, 36, 37] due to its simplicity and efficiency. To deal with the high dimensionality of feature vectors, SVM recursive feature elimination was used in [13, 37] and principal component analysis (PCA) was used in [9, 26]. Convolutional neural networks (CNNs) have shown their efficiency in detecting tampering and localizing spliced regions in recent studies, with promising results [1,2,3,4]. To classify spliced and authentic images efficiently, feature selection and extraction are as important as the learning method itself [22,23,24,25].

It was shown in [12] that ISD techniques using Markov features in the transform domain achieve substantially better performance in detecting splicing traces than methods using other statistical features. In the first work applying Markov features to ISD [31], the statistical features included moments of characteristic functions and Markov transition probabilities, where the latter contributed the most and outperformed the former in splicing detection. In this paper, we propose a Markov-features-based algorithm for image splicing detection. In the discrete cosine transform (DCT) domain, the proposed algorithm extracts Markov features by exploiting correlations between adjacent coefficients and adjacent blocks. The proposed method then classifies spliced images using these Markov features. It not only detects spliced images more effectively but also reduces the running time dramatically in comparison with state-of-the-art methods.

The rest of this paper is organized as follows. In Section 2, we briefly review several conventional Markov-features-based splicing detection methods. In Section 3, we introduce the proposed algorithm in detail. Section 4 presents experimental results showing that our algorithm outperforms state-of-the-art methods. Finally, Section 5 concludes the paper and discusses future work.

2 Related works

In the ISD problem, the splicing operation usually changes the correlations between image pixels. According to random process theory, a Markov random process is a tool to characterize these correlations [31]. Markov-based features are therefore measures that can reflect the statistical changes caused by splicing [36]. Markov-features-based approaches have been successful in ISD problems and still hold promise for improvement. An analysis of several Markov-features-based methods is given in the remainder of this section.

Transition probability matrices (TPMs) obtained from difference arrays capture pixel or coefficient correlations to detect splicing artifacts [13]. Markov features have been extracted in various domains, such as the spatial domain [9], the DCT domain [9, 12, 13, 17, 31, 36], the discrete wavelet transform (DWT) domain [13, 17, 31, 37] and the quaternion DCT (QDCT) domain [18].

In the original method of this research direction [31], 2-D arrays were generated by applying multi-size block DCT (MBDCT) to the input image, and statistical features extracted from them were combined to form an image model for splicing detection. MBDCT was used to extract 168 moment features, while 98 Markov features were extracted in the horizontal and vertical directions. The statistical features included moments of characteristic functions of wavelet sub-bands and Markov transition probabilities of difference arrays. This method was tested on the Columbia image splicing detection evaluation dataset [27], a dataset of low-resolution grayscale images with simple tampering operations, and achieved a detection accuracy of 91.87%.

Expanded Markov features in the DCT domain were proposed to capture the correlation caused by the 8 × 8 blocking artifact, and more features were constructed in the DWT domain to characterize three kinds of dependency among wavelet coefficients across positions, scales and orientations [13]. The number of features was reduced to 100 and the performance peaked at 93.55% on the Columbia dataset [27]. An enhanced threshold method was introduced in [17] to reduce the number of features by thresholding difference arrays to an arithmetic progression with a common difference of 2. Because the Markov feature vectors in [17] were extracted from both the DCT and DWT domains, their dimensionality was nevertheless relatively high. The weakness of this method is the loss of detail in the feature vectors due to the thresholding process; consequently, the detection performance reported in [17] was relatively low, with a highest measured accuracy of 88.43% on the Columbia dataset [27]. Another recently published study [37] also extracted Markov features in many sub-bands of the DWT domain, with a best detection performance of 89.88% on [27]. In [9], Markov features were extracted in the spatial and DCT domains and PCA was then used to reduce the dimensionality of the feature vectors; this method achieved the highest detection result on [27], at 98.82%.

In [12], the maximum value among various directional difference values in the DCT domain of three color channels was used to choose Markov features. Threshold expansion reduced the information loss caused by the coefficient thresholding used to restrict the number of Markov features. To compensate for the increased number of features owing to the threshold expansion, an even-odd Markov state decomposition algorithm was proposed. Across the different kinds of features, this method worked best on CASIA TIDE v1.0 [6], with an accuracy of more than 97.86%; however, its performance on the Columbia dataset [27] and CASIA TIDE v2.0 [7] was 91.98% and 94.50%, respectively.

In [18], color information was extracted from blocked images to construct quaternions, and the QDCT coefficients of the quaternion blocks were then obtained. Expanded Markov features were generated from the TPMs in the QDCT domain to capture intra-block and inter-block correlations among QDCT block coefficients. The detection performances were approximately 95% and 92% on CASIA TIDE v1.0 and v2.0 [6, 7], respectively.

3 Markov feature extraction

In this section, we present how to extract two types of Markov features used for detecting spliced images: coefficient-wise Markov features and block-wise Markov features. The former are obtained by exploiting correlations among consecutive DCT coefficients, and the latter are computed by exploiting coefficient correlations between adjacent blocks. In the DCT domain, coefficients can reflect the changes in the local frequency of the original authentic image created by the splicing operation [31]. The following subsections provide a detailed explanation of the Markov feature extraction process.

3.1 Coefficient-wise Markov features

Consider an input color image of size H × W, which is converted to a grayscale image and then divided into non-overlapping 8 × 8 blocks. In the proposed method, the 2-D DCT is applied to each 8 × 8 block. The absolute values of all DCT coefficients are taken and rounded to the nearest integers. Let us denote the array containing these coefficients by I. The proposed method computes coefficient-wise difference arrays by subtracting consecutive DCT coefficients and then extracts coefficient-wise Markov features from these arrays.
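This preprocessing (8 × 8 blocking, 2-D DCT, absolute value, rounding) can be sketched in Python with NumPy; the 2-D DCT is built directly from the orthonormal DCT-II matrix rather than relying on an external signal-processing library, and the function names are illustrative:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix D such that D @ x gives the 1-D DCT of x."""
    k = np.arange(n)
    D = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    D[0, :] *= 1 / np.sqrt(n)
    D[1:, :] *= np.sqrt(2 / n)
    return D

def preprocess(gray):
    """Return I: per-block rounded |2-D DCT| of a grayscale image.

    The image height and width are assumed to be multiples of 8.
    """
    D = dct_matrix(8)
    H, W = gray.shape
    I = np.empty((H, W))
    for r in range(0, H, 8):
        for c in range(0, W, 8):
            block = gray[r:r + 8, c:c + 8].astype(float)
            coeff = D @ block @ D.T          # separable 2-D DCT of the block
            I[r:r + 8, c:c + 8] = np.round(np.abs(coeff))
    return I
```

For a constant 8 × 8 block of value v, only the DC coefficient is nonzero and equals 8v, which gives a quick sanity check of the implementation.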

Coefficient-wise difference arrays are computed in 4 directions, as shown in Fig. 2. Let us denote the 4 coefficient-wise difference arrays obtained in the horizontal, vertical, main diagonal, and anti-diagonal directions by Ch, Cv, Cd and Ca, respectively. For example, Ch is calculated by subtracting from each coefficient of I its nearest neighbor in the horizontal direction as follows

$$ C_{h}(u,v)=I(u,v)-I(u,v + 1) $$
(1)

where u = 1,2,…, H and v = 1,2,…, W − 1. Cv, Cd and Ca can be obtained similarly in the corresponding directions. We apply a thresholding technique to the difference arrays to reduce the computational complexity of the subsequent steps. Let \(\tilde {\mathbf {C}_{h}}\) be the array resulting from thresholding Ch as follows

$$ \tilde{C}_{h}(u,v)= \left\{\begin{array}{lll} T_{1}, & \text{if}\ C_{h}(u,v)>T_{1}, \\ -T_{1}, & \text{if}\ C_{h}(u,v)<-T_{1}, \\ C_{h}(u,v), & \text{otherwise}, \end{array}\right. $$
(2)

where T1 is a positive integer. Similarly, \(\tilde {\mathbf {C}_{v}}\), \(\tilde {\mathbf {C}_{d}}\) and \(\tilde {\mathbf {C}_{a}}\) can be derived by thresholding Cv, Cd and Ca using (2), respectively.
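A minimal NumPy sketch of (1) and (2) follows; I is assumed to hold the rounded absolute DCT coefficients, and the sign convention for the anti-diagonal direction is one of several possible readings of Fig. 2:

```python
import numpy as np

def difference_arrays(I):
    """Coefficient-wise difference arrays in the 4 directions of Fig. 2."""
    Ch = I[:, :-1] - I[:, 1:]        # horizontal:    I(u,v) - I(u,v+1)
    Cv = I[:-1, :] - I[1:, :]        # vertical:      I(u,v) - I(u+1,v)
    Cd = I[:-1, :-1] - I[1:, 1:]     # main diagonal: I(u,v) - I(u+1,v+1)
    Ca = I[1:, :-1] - I[:-1, 1:]     # anti-diagonal: I(u+1,v) - I(u,v+1)
    return Ch, Cv, Cd, Ca

def threshold(C, T):
    """Clip a difference array to the range [-T, T], as in (2)."""
    return np.clip(C, -T, T)
```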

Fig. 2
figure 2

4 directions used for calculating coefficient-wise difference arrays

Next, we compute TPMs from \(\tilde {\mathbf {C}_{h}}\), \(\tilde {\mathbf {C}_{v}}\), \(\tilde {\mathbf {C}_{d}}\) and \(\tilde {\mathbf {C}_{a}}\) to represent the characteristics of these difference arrays. Specifically, the TPMs are composed of the probabilities of elements in \(\tilde {\mathbf {C}_{h}}\), \(\tilde {\mathbf {C}_{v}}\), \(\tilde {\mathbf {C}_{d}}\) and \(\tilde {\mathbf {C}_{a}}\) taking particular values given the values of their nearest neighbors in the horizontal, vertical, diagonal and anti-diagonal directions, respectively. These probabilities capture the dependencies among neighboring DCT coefficients in the corresponding direction. Let us define \(\mathbf {{P_{h}^{C}}}\) as the horizontal TPM of \(\tilde {\mathbf {C}_{h}}\). The element at row i and column j of this matrix, \({P_{h}^{C}}(i,j)\), is calculated by the following formula

$$ {P^{C}_{h}}(i,j)=\frac{\sum\limits_{u = 1}^{H}\sum\limits_{v = 1}^{W-2}\delta(\tilde{C}_{h}(u,v)=i) \times \delta(\tilde{C}_{h}(u,v + 1)=j)}{\sum\limits_{u = 1}^{H}\sum\limits_{v = 1}^{W-2}\delta(\tilde{C}_{h}(u,v)=i)}, $$
(3)

where i, j = −T1,−T1 + 1,…,0,…, T1 − 1, T1 and

$$ \delta(a=b)= \left\{\begin{array}{ll} 1, & \text{if}\ a=b, \\ 0, & \text{otherwise}. \end{array}\right. $$
(4)

Note that, in (3), the probability is set to 0 if the denominator on the right-hand side is 0. Each pair (i, j) corresponds to one transition probability of the TPM, which is equivalent to one Markov feature. Similarly, \(\mathbf {{P_{v}^{C}}}\), \(\mathbf {{P_{d}^{C}}}\) and \(\mathbf {{P_{a}^{C}}}\) can be derived from (3) and (4). We then obtain 4(2T1 + 1)2 coefficient-wise Markov features by concatenating all elements of the 4 TPMs.
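The TPM of (3)–(4) can be sketched as below; the same routine serves the other directions after transposing or flipping the thresholded array. This is a direct, unoptimized implementation:

```python
import numpy as np

def horizontal_tpm(C, T):
    """TPM of a thresholded difference array C with values in [-T, T].

    P[i + T, j + T] is the probability that a value j follows a value i
    in the horizontal direction, as in (3).
    """
    left, right = C[:, :-1], C[:, 1:]
    P = np.zeros((2 * T + 1, 2 * T + 1))
    for i in range(-T, T + 1):
        mask = (left == i)
        count = mask.sum()
        if count == 0:
            continue                 # row stays 0 when the denominator is 0
        for j in range(-T, T + 1):
            P[i + T, j + T] = np.logical_and(mask, right == j).sum() / count
    return P
```

Concatenating the flattened TPMs of the four directional arrays yields the 4(2T1 + 1)² coefficient-wise features described above.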

3.2 Block-wise Markov features

In this subsection, we present how the block-wise Markov features are extracted from I. In 8 × 8 DCT blocks, coefficients are arranged in zigzag order, starting from the DC coefficient at the top-left and followed by the AC coefficients, with low-frequency coefficients, which are more likely to be nonzero, placed before high-frequency coefficients [32]. Moreover, we aim to generate block-wise Markov features that reflect the strong correlations between adjacent blocks. For these reasons, DCT coefficients close to the bottom row or the rightmost column of the 8 × 8 DCT blocks are not employed in computing block-wise difference arrays in the proposed method. Whereas the coefficient-wise difference arrays of the previous subsection are computed using all DCT coefficients, block-wise difference arrays are calculated using only the coefficients of the top-left n × n sub-blocks of the DCT blocks, where n < 8.

We introduce an algorithm that constructs a top-left n × n sub-block from the coefficients of an 8 × 8 DCT block. For each 8 × 8 block, the first n2 coefficients in the 8 × 8 zigzag order are taken and then rearranged in the n × n zigzag order to construct the top-left sub-block. The details of the sub-block construction process are presented in Fig. 3. After all sub-blocks are obtained, they are used to compute block-wise difference arrays that capture correlations between adjacent blocks. Figure 4 illustrates this algorithm for n = 4. The first 16 coefficients of the 8 × 8 DCT block are denoted by the letters a to p: the 10 coefficients a to j keep their positions, while the 6 coefficients k to p replace the last 6 coefficients of the zigzag sequence of the top-left 4 × 4 sub-block in corresponding order.
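The sub-block construction can be sketched as follows: the first n² coefficients of the block in 8 × 8 zigzag order are written into an n × n array in n × n zigzag order. The zigzag helper below is the standard JPEG-style anti-diagonal traversal, which we assume matches the scan order intended in Fig. 3:

```python
import numpy as np

def zigzag_indices(n):
    """(row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col: one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def top_left_subblock(block, n):
    """Rearrange the first n*n zigzag coefficients of an 8x8 DCT block
    into an n x n sub-block, as in Fig. 3."""
    zz8 = zigzag_indices(8)[:n * n]            # first n^2 coefficients, 8x8 order
    sub = np.empty((n, n))
    values = (block[i, j] for i, j in zz8)
    for (r, c), value in zip(zigzag_indices(n), values):
        sub[r, c] = value
    return sub
```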

Fig. 3
figure 3

Algorithm constructing top-left n × n sub-block by using the first n2 coefficients in the zigzag order for each 8 × 8 DCT block

Fig. 4
figure 4

A top-left 4 × 4 sub-block (right) is constructed from the first 16 coefficients in the zigzag sequence of an 8 × 8 DCT block (left)

The block-wise Markov features are extracted from the horizontal block-wise difference array Bh and the vertical block-wise difference array Bv. Figure 5 shows an example of calculating Bh for n = 4, where adjacent large dotted squares represent 8 × 8 blocks of I and the small solid squares inside them represent the top-left 4 × 4 sub-blocks. Solid arrows connect pairs of top-left 4 × 4 sub-blocks; the coefficients of the sub-block at the end of each solid arrow are subtracted from the corresponding coefficients of the sub-block at its beginning to obtain the 4 × 4 blocks on the right, indicated by dashed arrows. In the general case, the obtained blocks are combined to form Bh. The size of Bh and Bv is K × L, where K = nH/8 and L = nW/8. Let us define Bh as

$$ B_{h}(p,q)=I(u,v)-I(u,v + 8), $$
(5)

where p = 1,2,…, K; q = 1,2,…, L; \(u=\left \lfloor \frac {p-1}{n}\right \rfloor \times 8 + ((p-1) \bmod n) + 1\) and \(v=\left \lfloor \frac {q-1}{n}\right \rfloor \times 8 + ((q-1) \bmod n) + 1\), so that (p, q) indexes a position inside a top-left sub-block and (u, v) is the corresponding position in I. The vertical block-wise difference array Bv is obtained similarly. A thresholding technique is also applied to Bh and Bv. Let \(\tilde {\mathbf {B}_{h}}\) be the array resulting from thresholding Bh as follows
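If all top-left sub-blocks are tiled into a single K × L array (here called T_img, an illustrative name), (5) becomes a shifted subtraction: the same within-sub-block position in the horizontally adjacent 8 × 8 block sits n columns to the right in the tiling. A sketch, using the boundary convention that the last n columns (rows) have no right (lower) neighbor:

```python
import numpy as np

def blockwise_difference(T_img, n):
    """Horizontal and vertical block-wise difference arrays from the
    K x L tiling T_img of all top-left n x n sub-blocks, as in (5)."""
    Bh = T_img[:, :-n] - T_img[:, n:]   # same position, block to the right
    Bv = T_img[:-n, :] - T_img[n:, :]   # same position, block below
    return Bh, Bv
```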

$$ \tilde{B}_{h}(p,q)= \left\{\begin{array}{lll} T_{2}, & \text{if}\ B_{h}(p,q)>T_{2}, \\ -T_{2}, & \text{if}\ B_{h}(p,q)<-T_{2}, \\ B_{h}(p,q), & \text{otherwise}, \end{array}\right. $$
(6)

where T2 is a positive integer. Similarly, \(\tilde {\mathbf {B}_{v}}\) is obtained from Bv using (6). The horizontal TPM of \(\tilde {\mathbf {B}_{h}}\) is denoted by \(\mathbf {{P_{h}^{B}}}\). An element of this TPM, \({P_{h}^{B}}(i,j)\), is obtained as follows

$$ {P^{B}_{h}}(i,j)=\frac{\sum\limits_{p = 1}^{K}\sum\limits_{q = 1}^{L-1}\delta(\tilde{B}_{h}(p,q)=i) \times \delta(\tilde{B}_{h}(p,q + 1)=j)}{\sum\limits_{p = 1}^{K}\sum\limits_{q = 1}^{L-1}\delta(\tilde{B}_{h}(p,q)=i)}, $$
(7)

where i, j = −T2,−T2 + 1,…,0,…, T2 − 1, T2. Similarly, we can obtain \(\mathbf {{P_{v}^{B}}}\), the vertical TPM of \(\tilde {\mathbf {B}_{v}}\). This adds 2(2T2 + 1)2 block-wise Markov features, resulting in a 4(2T1 + 1)2 + 2(2T2 + 1)2-dimensional feature vector for classification. Figure 6 depicts the overall feature extraction process of the proposed method.
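For the parameter choices used in the experiments (T1 = 4 and, as an example here, T2 = 5), the final feature dimensionality is easy to verify:

```python
T1, T2 = 4, 5                                   # thresholds used in Section 4
coeff_features = 4 * (2 * T1 + 1) ** 2          # 4 directions, (2*T1+1)^2 each
block_features = 2 * (2 * T2 + 1) ** 2          # 2 directions, (2*T2+1)^2 each
total = coeff_features + block_features
print(coeff_features, block_features, total)    # 324 242 566
```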

Fig. 5
figure 5

Illustration of calculating horizontal block-wise difference array for n = 4

Fig. 6
figure 6

Markov features extraction model

4 Experiments

4.1 Dataset and evaluation criteria

There exist several benchmark datasets for evaluating the performance of ISD algorithms [6, 7, 27]. In our simulations, we used the challenging CASIA TIDE v2.0 dataset [7], which has been widely used in recent ISD research [12, 13, 18]. This dataset contains 7491 authentic and 5123 tampered color images of various sizes, from 240 × 160 to 900 × 600 pixels. In comparison with the Columbia dataset [27] and CASIA TIDE v1.0 [6], CASIA TIDE v2.0 is a larger dataset with more realistic spliced images, owing to complex post-processing operations across tampered regions. For our experiments, we randomly select equal numbers of authentic and spliced images from this dataset.

We adopt the accuracy metric [10] to quantitatively evaluate the detection performance of the proposed method. Accuracy, denoted by acc, is the proportion of correctly classified images

$$ acc=\frac{TP+TN}{S}, $$
(8)

where TP is the number of spliced images that are correctly detected as spliced images, TN is the number of authentic images that are correctly detected as authentic images and S is the total number of testing images.
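With binary labels (1 for spliced, 0 for authentic), (8) reduces to the mean of the elementwise agreement between predictions and ground truth; a small sketch:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Detection accuracy (8): (TP + TN) / S for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)   # correct decisions over all S images
```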

4.2 Classification using Support Vector Machine

In the proposed method, an SVM with a radial basis function (RBF) kernel is used to classify the images. We also use 10-fold cross-validation to avoid overfitting and to obtain a more accurate estimate of the performance [15]. Specifically, the dataset is divided into 10 equal subsets in which the numbers of authentic and spliced images are identical. In each of the 10 folds, one subset is chosen as the testing set while the 9 remaining subsets are combined into the training set. All images in the dataset are distinct; therefore, no image in the testing set has been seen during training.
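The stratified partitioning described above — 10 subsets, each with equal numbers of authentic and spliced images — can be sketched in pure NumPy. In practice a library routine such as scikit-learn's StratifiedKFold together with an RBF-kernel SVC would be used; the version below only illustrates the split itself:

```python
import numpy as np

def stratified_folds(labels, k=10, seed=0):
    """Partition sample indices into k folds with per-class balance."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        for f in range(k):
            folds[f].extend(idx[f::k])          # deal class indices round-robin
    return [np.array(sorted(f)) for f in folds]
```

Each fold in turn serves as the testing set while the remaining nine are merged into the training set.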

The feature vectors should be normalized before classification [15]. In our experiments, we scale both training and testing 1 × m feature vectors x = [x1, x2,…, xm] to zero-mean vectors \( \mathbf {x^{\prime }} = [x^{\prime }_{1}, x^{\prime }_{2}, {\dots } , x^{\prime }_{m}]\) by the following formula

$$ x^{\prime}_{i}=\frac{x_{i}-\mu(\mathbf{x})}{\sigma(\mathbf{x})}, $$
(9)

where i = 1,2,…, m; μ(x) and σ(x) are the mean and standard deviation of vector x, respectively.
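Equation (9) is a standard z-score applied per feature vector (each vector is normalized by its own mean and standard deviation, rather than per feature across the training set); in NumPy:

```python
import numpy as np

def normalize(x):
    """Z-score a single feature vector as in (9)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()
```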

4.3 Discussion on parameters selection

We give a brief exposition of how the parameters are chosen in the experiments. We use two different thresholds for the difference arrays: T1 for the coefficient-wise difference arrays and T2 for the block-wise difference arrays. As noted in Section 3.2, in 8 × 8 DCT blocks the coefficients are arranged in zigzag order, from the DC coefficient at the top-left, followed by AC coefficients, with low-frequency coefficients (which are more likely to be nonzero) placed before high-frequency coefficients (which are more likely to be zero) [32]. The elements of the coefficient-wise difference arrays are formed by subtracting all coefficients of the DCT blocks. By contrast, the elements of the block-wise difference arrays are differences of DCT coefficients of the top-left sub-blocks of adjacent 8 × 8 DCT blocks. Since the first n2 DCT coefficients in the zigzag order of each block are usually large and may vary over a wide range, the values in the block-wise difference arrays vary widely and unpredictably. Hence, we set T1 ≤ T2.

Several factors should be taken into account when selecting appropriate values for T1 and T2. If these thresholds are too small, it is difficult to capture splicing artifacts; by contrast, if T1 and T2 are too large, the dimensionality of the feature vectors becomes very large and the computational cost may be unmanageable. Consequently, the choice of T1 and T2 involves a trade-off between detection accuracy and computational complexity. Empirically, we choose T1 = 4 and T2 ∈ {4, 5} in our simulations.

In the proposed method, n = 8 is the special case in which all DCT coefficients are used to extract block-wise Markov features and DCT blocks are not affected by the sub-block construction algorithm. In Section 3.2, we explained why we use the top-left n × n sub-block of each 8 × 8 DCT block to calculate the block-wise difference arrays. Furthermore, a large value of n leads to a very large number of computations and hence a longer running time. In contrast, if n is small, for example n ≤ 2, there may not be enough important coefficients to capture the strong correlations among adjacent blocks. Therefore, in the simulations we set n = 3,4,5,6, and the experimental results in Subsection 4.4 show the optimal value of n for the proposed method.

4.4 Experimental results

To investigate the performance of the proposed method, T1 is set to 4 in all parameter selections, and we compare the detection accuracy for different values of n and T2. In each scenario, we perform the simulations for two cases, Case A and Case B. In Case A, the top-left sub-blocks are constructed using the sub-block construction algorithm and then used to calculate block-wise difference arrays. By contrast, in Case B, the original top-left sub-blocks of the 8 × 8 DCT blocks are used to compute the difference arrays. Tables 1 and 2 show the detection performance of the proposed method for T2 = 4 and T2 = 5, respectively. Case A achieves higher average detection accuracy than Case B for T2 = 4; n = 3,4,6 and for T2 = 5; n = 3,5,6, which demonstrates the effectiveness of the sub-block construction algorithm. Note that, for n = 8, Cases A and B are identical; the accuracy obtained in this case is lower than for n = 5 in both cases when T2 = 4 (Table 1) and T2 = 5 (Table 2). Overall, the proposed method performs well in terms of detection accuracy when n = 5, and it reaches the highest accuracy of 96.90% when T2 = 5 and n = 5.

Table 1 Percentages of detection accuracy of the proposed method for T2 = 4
Table 2 Percentages of detection accuracy of the proposed method for T2 = 5

In the proposed method, computing the block-wise difference arrays from n × n sub-blocks instead of 8 × 8 blocks reduces the number of calculations considerably; hence, the computational complexity, the feature extraction time and the total running time drop drastically. Figure 7 shows that the average processing time per image increases as T2 or n increases. Moreover, the processing times for Case A are slightly higher than those for Case B with the same value of T2.

Fig. 7
figure 7

Average processing time per image of the proposed method for different cases

As can be seen from Table 3, the proposed method outperforms the state-of-the-art methods [13, 18] in terms of both running time and detection accuracy. The feature vector dimensions of He et al. [13] are 2250, 4410 and 7290 for different parameters; therefore, they used SVM recursive feature elimination to reduce the number of features to 100. We compare the processing time per image, which is the sum of the feature extraction time and the feature selection time, with previous works. For T2 = 4 and n = 3, the fastest configuration of the proposed method, the running time is approximately 60% of that of [13, 18] while the detection accuracy is much higher. In addition, the detection accuracy of the proposed method for T2 = 5 and n = 5 is about 4.5% to 7% higher than that of those methods, with a shorter running time.

Table 3 Comparison of average running time per image with state-of-the-art methods

5 Conclusions and Discussion

In this paper, a novel Markov-features-based method in the DCT domain is introduced for detecting spliced images. The coefficient-wise Markov features contribute considerably to the detection accuracy of the proposed method. Further, the newly proposed sub-block construction algorithm for extracting correlations between neighboring blocks not only increases the detection accuracy notably but also decreases the computational complexity of the whole splicing detection process. Experimental results demonstrate that the proposed method reduces the total running time and improves detection accuracy in comparison with state-of-the-art methods on the same testing dataset. Although spliced images can be detected with high accuracy, one drawback of the proposed method is that spliced regions are not localized.

In future work, we plan to address the trade-off between detection accuracy and computational complexity to improve splicing detection performance [11, 19,20,21] and to apply CNNs to image splicing localization [1,2,3, 11, 19,20,21].