1 Introduction

Nowadays, digital images are fairly prevalent on the Internet. However, among these images, there are often forged ones, which might result in harming individual and public interests, such as, injuring someone’s reputation, and misleading public opinion. Therefore, as an effective technique to unveil the forgeries in digital images, image forensics has drawn increasing attention over the past more than ten years [11, 30, 33]. So far, extensive studies have been conducted in the field of image forensics, including the detection of median filtering [4, 23, 37], the estimation of JPEG compression history [9, 14, 38], the identification of source camera [18, 26], the localization of tampered regions in forgery images [7, 25, 31], and so on.

This paper focuses on the study of median filtering detection. Median filtering is an order statistic operation. Because of its capability of removing image noise meanwhile preserving image detail [17], median filtering has been widely studied and applied in image denoising. In recent years, much attention has also been paid to the detection of median filtering. This is mainly due to the following reasons. First, one goal of image forensics is to detect the use of image editing and retouching in digital photos [21, 33]. Since median filtering is one of the basic tools for image editing and retouching, there is a need to detect median filtering applied to images. Second, as a nonlinear image operation, median filtering can effectively conceal the artifacts caused by some other operations, but without much further deterioration of image quality. Therefore, median filtering has been employed as a tool to attack image forensics, such as to hide the trace of image resampling [22] or image compression [35]. In addition, it could be used to degrade the performance of steganalysis methods [19, 23]. Median filtering detection is a simple but effective way to counter these anti-forensics and anti-steganalysis. Third, in the scenario of image tampering, it may happen that a portion of an image is pasted on another image to create a tampered image. If the two original images are filtered in different ways, i.e., one is smoothed with median filter, while the other is not, then the tampered region can be localized by a block-by-block detection of median filtering in the tampered image, as done in [4, 19, 37]. That is to say, median filtering detection is also useful to expose image tampering.

Several studies have been conducted on the detection of median filtering. In [23], Kirchner and Fridrich proposed a feature to detect median filtering by measuring streaking artifactsFootnote 1 in digital images. They found that the number of pixels whose first-order differences are zero, denoted as h 0, is likely to increase relative to that of pixels whose first-order differences are one, denoted as h 1. Hence, they introduced the ratio h 0/h 1 as a detection feature. Experimental results have shown that the streaking-based feature is effective for uncompressed images, but not robust to JPEG compression. In order to reliably detect median filtering in JPEG compressed images, the authors of [23] also proposed to apply SPAM, a set of steganalysis features proposed in [29], for median filtering detection. It is shown that the application of SPAM can achieve better performance in detecting median filtering in JPEG compressed images. Almost simultaneous to the work reported in [23], Cao et al. [5] independently proposed a detection feature similar to the above streaking-based feature. A year later, Yuan [37] made several interesting observations on median filtered images. These are, firstly, the pixels in a small block tend to have the same value as the median of the block; secondly, the gray value of the center pixel of a block occurs more frequently in the block than before filtering; and thirdly, median filtering makes the quantity of gray values in a small block decrease. Based on these observations, Yuan proposed a set of features, named as median filtering forensics features (MFF), including the distribution of the block median, the occurrence of the block-center gray value, the quantity of gray levels in a block, and so on. Experiments have shown that MFF can improve the performance of detecting median filtering in low-quality images. Recently, Chen et al. [4] proposed two sets of new features extracted from the difference domain of images. The first set of features, namely global probability feature set, was derived from the cumulative histograms of the first- and second-order difference images; and the second set of features, namely local correlation feature set, was computed based on the correlation between adjacent pixels in the difference images. They combined these two sets of features as the final global and local feature set (GLF). According to the experiments reported in [4], GLF can achieve good performance in detecting median filtering in low resolution and highly compressed images. Later, Kang et al. [19] found that median-filtered residuals (MFR) have a higher capability of suppressing both image textures and block artifacts than difference images. Then, they made a contribution to median filtering detection by extracting features from MFR. This is quite different from the earlier works, which usually extract features from the difference domain of images. Another contribution of [19] is to apply one-dimensional autoregressive (1D-AR) model to describe the difference of MFRs between median-filtered and non-filtered images. More recently, Zhang et al. [39] proposed to use high-order local ternary pattern (LTP), which is an improved local binary pattern, for median filtering detection. Their results have shown that LTP is more robust to JPEG compression than SPAM and MFF. Lai et al. [24] proposed to detect the presence of median filtering based on the local binary patterns in both the spatial and DCT domains. Qiu et al. [32] reported that spatial rich model (SRM) [13], originally proposed for universal steganalysis, can effectively detect median filtering when the test images have a large size (in their tests, the size of the images varies from 480 × 640 to 4288 × 4752) and are not compressed. However, the dimension of SRM features is too high. Different from the above-mentioned methods that use handcrafted features for detection, Chen et al. [3] are the first to introduce convolutional neural network (CNN) to the field of median filtering detection. Their proposed CNN-based method can learn discriminative features automatically from image samples, and outperforms the MFF [37], GLF [4], and 1D-AR [19] methods in detecting median filtering in 32 × 32 and 64 × 64 images. From an interesting point of view, Pasquini et al. [28] performed a theoretical analysis on the deterministic property of median filter, and then proposed a novel approach to detect the presence of median filtering in 1D signal, which can guarantee 0% false negative in theory. It is also worth to note that a few studies [6, 10, 12, 36, 40] have been done on anti-forensics and countering anti-forensics of median filtering.

In this paper, inspired by the work of Kang et al. [19], we propose a novel method for median filtering detection. We regard not only the median filtered residual (MFR) but also the average filtered residual (AFR) and Gaussian filtered residual (GFR) as forensic fingerprints. Considering that each of the filtered residuals should be a two-dimensional stationary random process, we propose to use 2D-AR model instead of 1D-AR model to characterize the three kinds of the filtered residuals separately. The AR coefficients of the three filtered residuals of an image are then used together as features to distinguish between median filtered and non-median filtered images. Our experimental results indicate that the proposed method outperforms existing methods in detecting median filtering in JPEG compressed images.

The rest of the paper is organized as follows. In Section 2, we present the proposed detection method in detail. Database and experimental methodology are described in Section 3. Then, in Section 4, experimental results are reported and discussed. Finally, conclusions are summarized in Section 5.

2 Proposed detection method

The proposed detection method is a learning-based method. It can be divided into two parts: the feature extraction, and the classifier training and testing. The second part is very similar to that applied in previous works. Therefore, in this section we focus on the description of the feature extraction. Before describing it, we first briefly introduce the median filtering applied to images.

2.1 Median filtering

Median filtering is an order statistic operation, which is implemented by replacing the gray value of each pixel in an image with the median of its neighboring pixels. Given an image I to be filtered, the process of median filtering can be formulated as

$$\begin{array}{@{}rcl@{}} \hat{\mathbf{I}}(i,j) &=& \text{median }\{\mathbf{I}(i+r,j+s), \; (r,s)\in W \},\\ && \text{for}\quad (i,j)\in(1,2,\ldots,M)\times(1,2,\ldots,N), \end{array} $$
(1)

where I(i,j) denotes the pixel value of the image I at coordinate (i,j), W denotes the set of coordinates within a square window centered at (0,0), and M and N denote the image height and width, respectively. Since 3 × 3 and 5 × 5 windows are the most widely used for median filtering, this paper concentrates on the detection of 3 × 3 and 5 × 5 median filterings, as done in previous works [4, 19, 23, 37].

2.2 Feature extraction

In [19], Kang et al. proposed to extract features only from the median filtered residual (MFR). More generally, the proposed method extracts features from multiple filtered residuals, which are defined as

$$ \begin{array}{rcl} \mathbf{R}_{t}(i,j) = \mathbf{I}(i,j)-\hat{\mathbf{I}}_{t}(i,j), \end{array} $$
(2)

where I and \(\hat {\mathbf {I}}_{t}\) denote the original and filtered images, respectively, and the subscript t denotes the index of the filtered image. As a specific example, in this paper we let t ∈ {“m”, “a”, “g”}, which represent median, average and Gaussian filterings, respectively. The window size of 3 × 3 can achieve a low computational complexity, and was adopted in [19]. For simplicity and a fair comparison with [19], in this paper the window size of the three kinds of filterings is also set as 3 × 3, and the mean and variance of the Gaussian filtering are set as 0 and 0.5, respectively. Therefore, as described above, not only the MFR used in [19], but also the average filtered residual (AFR) and the Gaussian filtered residual (GFR) are applied as the forensic fingerprints of median filtering. Note that compared with the first- and second-order differences of images, the filtered residuals have better capability of removing the interference caused by image edges and textures. Besides, the three kinds of filtered residuals are different from each other, thus it is possible to extract different distinguishing features from different kinds of filtered residuals.

In [19], 1D-AR model was utilized to fit the statistical properties of MFR in the horizontal and vertical directions. However, a digital image is a 2D signal, and median filtering is usually performed by sliding a square filtering window over the entire image. This results in that each pixel in a filtered image has strong correlations with its neighboring pixels in all directions. Therefore, 1D-AR model is not good enough to fit the statistical properties of MFR. In this paper, we consider each of the filtered residuals as a 2D stationary random process. Based on this, we propose to apply 2D-AR model to characterize the filtered residuals, and to extract the coefficients of the 2D-AR model as our features.

The 2D-AR model A R(p,q) of each filtered residual R t (t ∈ {“m”, “a”, “g”}) can be given by

$$ \begin{array}{ll} \mathbf{R}_{t}(i,j) = & \displaystyle \underset{k+l\neq0}{\sum\limits_{k=0}^{p}\sum\limits_{l=0}^{q}} \mathbf{a}_{t}(k,l)\mathbf{R}_{t}(i-k,j-l)+\boldsymbol{\varepsilon}_{t}(i,j), \\ & \text{for}\quad (i,j)\in(1,2,\ldots,M)\times(1,2,\ldots,N), \end{array} $$
(3)

where a t and ε t denote the AR coefficient and prediction error matrices of R t , respectively, p and q (\(p, q\in \mathbb {N}^{+}\)) denote the orders of the 2D-AR model in the row and column directions, respectively, (k,l) indexes the elements of the matrix a t of size (p + 1) × (q + 1), and a t (0,0) is specified as 1. Note that owing to the row and column symmetry of 3 × 3 and 5 × 5 median filterings, in this paper we use the AR model A R(p,q) with p = q to characterize the filtered residuals.

It is difficult to estimate the AR coefficient matrix a t directly from (3). But fortunately, a t can be obtained via the correlation method introduced by Kashyap [20]. According to the correlation method, the estimation of a t is reduced to solving the following linear equation sets

$$ \begin{array}{ll} \boldsymbol{\rho}_{t}(m,n)= & \displaystyle \underset{k+l\neq0}{\sum\limits_{k=0}^{p}\sum\limits_{l=0}^{q}} \mathbf{a}_{t}(k,l)\boldsymbol{\rho}_{t}(m-k,n-l), \\ & \text{for}\quad (m,n)\in(0,1,\ldots,p)\times(0,1,\ldots,q),\; m+n\neq0, \end{array} $$
(4)

where ρ t denotes the autocorrelation matrix of each filtered residual, which is calculated by

$$ \begin{array}{rcl} \boldsymbol{\rho}_{t}(m,n) = \left\{ \begin{array}{ll} \frac{\displaystyle\sum\limits_{i=1}^{M-m}\sum\limits_{j=1}^{N-n}\mathbf{R}_{t}(i,j)\mathbf{R}_{t}(i+m,j+n)} {\displaystyle \left( M-m \right)\times \left( N-n\right)} & \text{if} \quad m \ge 0,n\ge 0,\\ &\\ \frac{\displaystyle\sum\limits_{i=1}^{M-m}\sum\limits_{j=1-n}^{N}\mathbf{R}_{t}(i,j)\mathbf{R}_{t}(i+m,j+n)} {\displaystyle \left( M-m \right)\times \left( N-n\right)} & \text{if} \quad m > 0,n< 0,\\ \end{array} \right. \end{array} $$
(5)

and ρ t (−m,−n) = ρ t (m,n).

Next, we investigate the statistical difference of the AR coefficients a t (k,l) among MFR, AFR and GFR. Figure 1 shows the element values (except for \(\bar {\mathbf {a}}_{t}(0,0)\)) of the average matrices \(\bar {\mathbf {a}}_{t}\) of MFR, AFR and GFR. Each \(\bar {\mathbf {a}}_{t}\) was averaged over the matrices a t of 6000 images, and in Fig. 1 was rearranged into a vector by scanning it in zigzag order. Note that considering the trade-off between feature dimension and discriminability, this paper uses the A R(p,q) model with p = q = 6 to calculate the AR coefficents a t (k,l). It is shown in Fig. 1 that the average AR coefficients \(\bar {\mathbf {a}}_{t}(k,l)\) among MFR, AFR and GFR are obviously different, whether for the original non-filtered images or for the median filtered images. Moreover, by comparing the plots in Fig. 1a with those in Fig. 1b and c, we can see that not only for MFR but also for AFR and GFR, the average AR coefficients of the non-filtered images take quite different values from those of the median filtered images. The above findings indicate that the AR coefficients have different statistical characteristics among the three different kinds of filtered residuals, and the AR coefficients of any kind of the filtered residuals have the ability to distinguish the median filtered from the non-filtered images. Hence, it is reasonable to combine the three sets of the AR coefficients extracted from MFR, AFR and GFR as distinguishing features. Figure 1 also shows that the AR coefficient values of both the median filtered and non-filtered images almost vanish when their indices are larger than about 30. This means the AR coefficients a t (k,l) have almost no distinguishability when k + l > 6. Therefore, only the AR coefficients a t (k,l) with k + l ≤ 6 and k + l ≠ 0 (i.e., except for a t (0,0)) are chosen as features.

Fig. 1
figure 1

Comparison of the average AR coefficients \(\bar {\mathbf {a}}_{t}(k,l)\) among MFR, AFR and GFR. The filtered residuals are calculated from a original non-filtered images, b 3 × 3 median filtered images, and c 5 × 5 median filtered images. The model A R(6,6) was used to calculate the AR coefficients, and each \(\bar {\mathbf {a}}_{t}\) was averaged over 6000 images and sorted in zig-zag order

The 2D-AR coefficients of the MFR a m (k,l), for k + l ≤ 6 and k + l ≠ 0, can be divided into three parts: the horizontal coefficients, the vertical coefficients and the rest coefficients (the fifteen coefficients labeled in blue-filled circles as shown in Fig. 2). The horizontal and vertical 2D-AR coefficients of the MFR are closely correlated to the 1D-AR coefficients of the MFR [22], whereas the rest 2D-AR coefficients of the MFR are the additional coefficients that 1D-AR cannot model. In order to validate the effectiveness of these additional 2D-AR coefficients, we visualize them by projecting them onto a 2D plane with linear discriminant analysis (LDA), as shown in Fig. 3. It is seen that the additional 2D-AR coefficients of the original non-filtered images, the 3 × 3 median filtered images, and the 5 × 5 median filtered images are clearly distinguishable from each other, indicating that the 2D-AR model is more suitable to describe the statistical properties of the filtered residuals than the 1D-AR model of [19].

Fig. 2
figure 2

Coordinates of the AR coefficients a m(k,l) with k + l ≤ 6 and k l ≠ 0 (marked by blue-filled circles)

Fig. 3
figure 3

Visualization of the AR coefficients a m(k,l) with k + l ≤ 6 and k l ≠ 0. The blue, red, and green points represent the AR coefficients of the original non-filtered, the 3 × 3 median filtered, and the 5 × 5 median filtered images, respectively. Here, the fifteen AR coefficients as labeled in Fig. 2 were projected onto the 2D plane with LDA, and each of the blue, red, and green point sets consists of 6000 sample points

2.3 Summary of the proposed method

Based on the above description, we summarize the proposed method as follows.

  1. 1.

    Generate the MFR, AFR and GFR for each input image according to (2).

  2. 2.

    Calculate the three AR coefficient matrixes a t (t ∈ {“m”, “a”, “g”}) for the MFR, AFR and GFR, respectively, according to (4) and (5).

  3. 3.

    For each of the three AR coefficient matrices a t (t ∈ {“m”, “a”, “g”}), choose the elements a t (k,l) with k + l ≤ 6 and k + l ≠ 0 as a set of features. Then the three sets of features are combined into the proposed feature set.

  4. 4.

    Train SVM classifier with the proposed feature sets extracted from median filtered and non-median filtered images. Finally, the trained classifier is used for median filtering detection.

Note that, as mentioned above, the order of the A R(p,q) model is set as (p,q) = (6,6) in this paper, thus there are 27 features extracted from each a t , and totally 81 features in the proposed feature set.

The time complexity of the proposed method can be considered from two aspects, namely the feature extraction and the classifier training. The feature extraction of the proposed method consists of two parts: the generation of the filtered residuals and the computation of 2D-AR coefficients. The time complexities of the two parts are dominated by image filtering and autocorrelation computation, respectively. Both image filtering and autocorrelation computation can be implemented to run efficiently. Therefore, the feature extraction of the proposed method is not very time-consuming. It is known that the time complexity of SVM classifier training is related to the dimension of features. The dimension of the proposed features is not large compared with existing features. Therefore, the classifier training of the proposed method is also not time-consuming.

3 Database and experimental methodology

In order to evaluate the performance of the proposed method, we build a database of 6000 images. These images are taken from three different image databases, namely NRCS [27], BOWS2 [15] and BOSS [1], each of which provides 2000 images. Each image in our database is centrally cropped to a size of 512 × 512 pixels, and then converted to an 8-bit grayscale image. 3 × 3 and 5 × 5 median filterings are performed on each of the cropped grayscale images to generate 2 × 6000 median-filtered images. For testing the robustness of the proposed method, the filtered and non-filtered images are further central-cropped into sizes of 256 × 256, 128 × 128, and 64 × 64, and all of the cropped grayscale images are then compressed with quality factors (QFs) 90, 70, 50, and 30, respectively.

Five existing methods, including SPAM [23], MFF [37], GLF [4], 1D-AR [19], and LTP [39], are implemented for comparison. All these compared methods applied the SVM classifier for training and detection. In order to fairly compare our proposed feature set with the above five ones, the SVM classifier is also chosen in this paper, as done in [4, 19, 23, 37, 39]. 40% of the images are used for training, and the remaining 60% are used for testing. A five-fold cross validation is performed in the training to search the two hyper-parameters of the SVM classifier over the following grid:

$$ \begin{array}{rcl} (\textit{C},\gamma)\in\left\{(2^{i},2^{j})|i\in\{-5,-4,\cdots,4\},j\in\{-13,-12,\cdots,13\}\right\}. \end{array} $$
(6)

Note that the feature dimensions of SPAM, MFF, GLF, 1D-AR, LTP, and the proposed method are 686, 44, 56, 10, 2048 and 81, respectively. The time consumption of SVM classifier training is related to feature dimension. 1D-AR has only 10 features, which makes the training of its SVM classifier very fast. MFF, GLF, and the proposed method have less than 100 features, and the trainings of their SVM classifiers can be performed in an acceptable time. Compared to the other four methods, SPAM and LTP have a large number of features, so their classifier trainings are quite time-consuming.

The performance of each classifier is mainly evaluated by the minimum average decision error under equal probability of the positive and negative samples (i.e., the median-filtered and non-median filtered images), which is expressed by

$$ P_{e}=\min\left( \frac{P_{fp}+1-P_{tp}}{2}\right), $$
(7)

where P f p and P t p denote the false positive and true positive rates, respectively. Besides, in some cases we also plot receiver operating characteristic (ROC) curves to illustrate the tradeoff between the false positive and true positive rates.

4 Experimental results

Extensive experiments have been carried out to investigate the detection performance of the proposed method on the above-described database.

4.1 Performance evaluation of different sets of the proposed features

As described in Section II, the proposed 2D-AR features consist of three sets of features extracted from MFR, AFR and GFR, respectively. In this subsection, we present experiments to evaluate the performance of each of the three sets of features, as well as the superiority of the combination of the three sets over either set alone. The experiments are conducted on the images with various image sizes and different QFs of JPEG compression. Table 1 lists the detection results obtained by using the three sets of features from MFR, AFR, and GFR (denoted as 2D-AR-MFR, 2D-AR-AFR, and 2D-AR-GFR, respectively) and the combination of the three feature sets (denoted as 2D-AR-Comb). The detection results of the 1D-AR features [19] are also reported for comparison. In Table 1 and the following tables, “MF3 (or MF5) vs ORI” denotes that the positive and negative samples used for the tests are the 3 × 3 (or 5 × 5) median-filtered images and the original non-filtered images, respectively, and “UnC” denotes that the test images are not compressed. It can be seen from Table 1 that all of the three sets of the proposed features, i.e., 2D-AR-MFR, 2D-AR-AFR, and 2D-AR-GFR, have the capability to detect median filtering, especially when the test images have a large size and are not (or slightly) compressed. Moreover, 2D-AR-Comb is superior to any of the three sets of the proposed features in all the tests. This indicates that not only the features extracted from MFR, but also those from AFR and GFR contribute to the superior performance of the proposed feature set. It is also shown in Table 1 that 2D-AR-MFR outperforms 1D-AR in all the tests, which validates that 2D-AR model is more suitable than 1D-AR model to fit the statistical property of the filtered residuals.

Table 1 Detection results of the three sets and the whole of the proposed features

4.2 Performance comparison with existing methods

A set of experiments have been performed to compare the proposed method with existing methods, including SPAM [23], MFF [37], GLF [4], 1D-AR [19], and LTP [39]. The comparative results are given in Table 2. We can see that the six compared methods generally perform better in “MF5 vs ORI” than in “MF3 vs ORI”. This is not surprising since 5 × 5 median filtering leaves more significant filtering artifacts and thus is more easily detected than 3 × 3 median filtering. We can also see that, in general, the performance of these detection methods becomes worse as the image size decreases. The same happens as the QF of the JPEG compression goes down. These are consistent with the following two facts: first, the smaller the image size is, the less the information can be used for feature extraction; and second, the compression can disturb the artifacts left by the median filtering, and the more severe the compression is, the larger the disturbance becomes. It is also shown in Table 2 that for the uncompressed images, all of the six compared methods perform quite well in the median filtering detections. LTP [39] and SPAM [23] are clearly superior to the other four compared methods, but have much larger dimensions of features, especially the dimension of the LTP features is more than 2000. The proposed 2D-AR-Comb method has a detection error of less than 1.00% when the size of the test images is larger than 64 × 64. Whereas, for the compressed images, most of the compared methods no longer perform very well, particularly when the image size is small and “MF3 vs ORI” is tested. In contrast to the other compared methods, the proposed method is relatively robust to JPEG compression. 2D-AR-Comb achieves the best performance among the six methods in all cases that the QF of the compression is not larger than 70, and also in some cases when the QF takes 90. The superiority of 2D-AR-Comb over the other five methods is evident when the test images are heavily compressed. For example, when the image size and the QF are 512 × 512 and 50, respectively, and “MF3 vs ORI” is tested, 2D-AR-Comb has a detection error of 3.86%, which is 37.5% lower than that of 1D-AR (i.e., the best result of the other five methods). The ROC curves of some representative tests are also provided in Fig. 4. It can be seen that these results represented by the ROC curves are consistent with those reported in Table 2.

Fig. 4
figure 4

ROC curves of the proposed and five existing methods for distinguishing the median filtered images from the non-filtered images. a MF3 vs ORI, and QF = “UnC”; b MF5 vs ORI, and QF = “UnC”; c MF3 vs ORI, and QF = 70; d MF5 vs ORI, and QF = 70; e MF3 vs ORI, and QF = 30; f MF5 vs ORI, and QF = 30. Here, all of the test images have a size of 512 × 512

Table 2 Comparison between the proposed and five existing methods in distinguishing the median filtered images from the original non-filtered images

4.3 Discrimination between median filtering and other manipulations

For median filtering detection, a good classifier should have the ability to distinguish the median filtered images not only from original images but also from those processed by some other image manipulations, such as, Gaussian filtering, average filtering, and image resizing. In order to assess this ability of the proposed method, a set of experiments have also been conducted, including “MF vs AVE”, “MF vs GAU”, “MF vs D-SCA” and “MF vs U-SCA”. Note that “MF vs AVE (or GAU, D-SCA, and U-SCA)” denotes that the positive and negative samples are the median filtered images and those processed by an average filtering with a window size of 3 × 3 (or a Gaussian filtering with a standard deviation of 0.5, a downscaling with a factor of 0.9, and an upscaling with a factor of 1.1). In practice, we may not know the kind of manipulation to be distinguished from median filtering, as discussed in [19]. To simulate a more practical scenario, we created a set of negative samples, denoted as “ALL”, which consists of equal amounts of the original images, the Gaussian filtered images, the average filtered images, the downscaled images, and the upscaled images. Then the experiments of “MF vs ALL” were also conducted. For brevity, we only report the results for the case that the test images are compressed with a QF value of 70. It is interesting to note from Table 3 that GLF [4] performs quite well (it obtains the second best performance in general) in 3 × 3 median filtering detection, but has no superiority over most of the compared methods in 5 × 5 median filtering detection. Note that here 3 × 3 (or 5 × 5) median filtering detection refers to the classification between 3 × 3 (or 5 × 5) median filtering and other manipulations including Gaussian filtering, average filtering and image resizing. In contrast to GLF, the proposed 2D-AR-Comb method always achieves the best results among the six compared methods in all the tests. Moreover, the improvement of the proposed method is often significant when compared with SPAM [23], MFF [37], 1D-AR [19], and LTP [39] in detecting 3 × 3 median filtering. The good performance of the proposed method can also be seen from Fig. 5, where the ROC curves of some representative tests are reported.

Fig. 5
figure 5

ROC curves of the proposed and five existing methods for distinguishing between median filtering and other manipulations. a MF3 vs AVE; b MF3 vs GAU; c MF3 vs D-SCA; d MF3 vs U-SCA; e MF3 vs ALL. Here, all of the test images have a size of 512 × 512, and were compressed with QF = 70

Table 3 Comparison between the proposed and five existing methods in distinguishing between median filtering and other manipulations

4.4 Discrimination between 3 × 3 and 5 × 5 Median filterings

To identify the parameter of an image manipulation is another important issue for image forensics. Hence, we have conducted a set of experiments to examine the ability of the proposed method in identifying the window size of median filtering (i.e., in distinguishing between 3 × 3 and 5 × 5 median filterings). In this set of experiments, the positive and negative samples are the 3 × 3 and 5 × 5 median filtered images, respectively, and all of the filtered images are compressed with QF = 70,50. The detection results are presented in Table 4 and Fig. 6. It is shown that the proposed method performs very well in these tests, and still achieves the lowest detection error among the six compared methods.

Fig. 6
figure 6

ROC curves of the proposed and five existing methods for distinguishing between 3 × 3 and 5 × 5 median filterings. a MF3 vs MF5, and QF = 70; b MF3 vs MF5, and QF = 50. Here, all of the test images have a size of 512 × 512, and were compressed with QF = 70

Table 4 Comparison between the proposed and five existing methods in distinguishing between 3 × 3 and 5 × 5 median filterings

4.5 Performance comparison on more image databases

We have also conducted a set of experiments to compare the proposed method with the existing methods on three more image databases, namely, UCID [34], RAISE2k [8], and DRESDEN [16]. The UCID database has 1338 images with size of 384 × 512; the RAISE2k database has 2000 images with size of 4928 × 3264 or 4288 × 2848; and the DRESDEN database has 1388 images with size of 3872 × 2592 or 3008 × 2000. We centrally cropped each images of these databases into the size of 256 × 256. Then, the cropped images were used as the test images. The detection error rates of the six compared methods are listed in Table 5. It is shown that the compared methods perform better on the UCID database than on the RAISE2k and the DRESDEN databases. This may be explained as follows. The images from UCID database have a much smaller size than those from the RAISE2k and the DRESDEN databases, which results in that the 3 × 3 (or 5 × 5) local neighbourhoods of the UCID images usually appear less smooth than those of the RAISE2k and the DRESDEN images. Therefore, after median filtering, the filtering artifacts left on UCID images are usually more obvious than those on the RAISE2k and the DRESDEN images, hence the median filtering is more easily detected in the UCID images than in the RAISE2k and the DRESDEN images. In addition, Table 5 also shows that the proposed method still outperforms the other compared methods in most of the test cases.

Table 5 Performance comparison between the proposed and five existing methods on UCID, RAISE2k, and DRESDEN databases

4.6 Image forgery detection via median filtering detection

Finally, we present an experiment to illustrate the application of median filtering detection in image forgery detection. One scenario of image forgery can be described as follows: Given two images, one of which is a median filtered image and the other is a non-filtered image, as shown in Fig. 7a and b, respectively. A forged image is generated by pasting a region cut from the median filtered image into the non-filtered image and then compressing it with QF=70, as shown in Fig. 7c. To identify the cut-and-paste region in Fig. 7c, we first divide the forged image in Fig. 7c into blocks of size 64 × 64. Then, the SVM classifiers that have been trained in Subsection IV-B are used to make binary decision on whether each of the blocks is a median filtered block or not. According to the outputs of the classifiers for all of the blocks, we generate six binary maps for the six compared methods, respectively, as shown in Fig. 8. For reference, the ground truth of the forged region is already given in Fig. 7d. The miss detection rates (i.e., the percentage of wrongly classifying median filtered blocks as non-median filtered ones) of SPAM, MFF, GLF, 1D-AR, LTP, and the proposed method are 26.92%,14.93%,31.42%,24.53%,25.25%, and 22.81%, respectively. The false alarm rates (i.e., the percentage of wrongly classifying non-median filtered blocks as median filtered ones) of the six methods are 14.07%,17.29%,8.12%,8.72%,9.92%, and 6.95%, respectively. On the whole, the error rates of the six methods are 20.49%,16.11%,19.77%,16.63%,17.58%, and 14.88%, respectively. It is shown that among the six compared methods, the proposed method has the second lowest miss detection rate and the lowest false alarm rate, which leads to the lowest error rate in detecting the image forgery. Note that the detection errors mainly occur at the top left corner of the forged image for all the six compared methods. The reason might be that the top left corner of the forged image is a low-contrast region so that the statistical traces left by median filtering are too weak to be detected.

Fig. 7
figure 7

Original and forged images. a First original image (3 × 3 median-filtered); b Second original image (non-filtered); c Forged image; d Ground truth of the forgery. Here, all of the original and forged images has a size of 1800 × 2700

Fig. 8
figure 8

Forgery detection results of the proposed and five existing methods. a SPAM [23]; b MFF [37]; c GLF [4]; d 1D-AR [19]; e LTP [39]; f Proposed method

5 Conclusion

Inspired by the work of Kang et al. [19], in this paper we have proposed a novel method for median filtering detection. The main contributions of this paper are as follows:

First, considering that the filtered residuals are two-dimensional stationary random processes, we have proposed to apply 2D autoregressive model instead of 1D-AR model to characterize the filtered residuals. According to our experiments, 2D-AR model is more effective than 1D-AR model to fit the filtered residuals.

Second, we found that, in addition to median filtered residual (MFR), average filtered residual (AFR) and Gaussian filtered residual (GFR) also exhibit different characteristics between median filtered images and non-median filtered images. Hence, we use not only MFR but also AFR and GFR as fingerprints for median filtering detection. Our experiments have validated that the use of more kinds of filtered residuals can evidently improve the detection performance.

Extensive experiments have been conducted to compare the proposed method with five existing methods. The results show that the proposed method performs quite well in all the tests. It is often superior to the existing methods in detecting median filtering in heavily compressed images. Although only grayscale natural images have been tested in out experiments, other types of images, such as medical images, should be modified in a similar way after median filtering, and hence leave similar statistical traces. With appropriate modification, the proposed method should be applicable to other types of images. Our future work includes investigating the discriminability of other residuals to detect median filtering, and countering the anti-forensics of median filtering detection. Moreover, it is feasible to combine the proposed feature set with other existing feature sets, such as the 1D-AR feature set [19], to achieve improved performance, which could also be an interesting future work.