1 Introduction

Image content is widely regarded as an authentic representation of events. We encounter plenty of images in day-to-day life as more and more hand-held devices, in addition to traditional digital still cameras, are equipped with image capturing and editing tools. This, together with the ready availability of image processing software, makes it easy to manipulate images. The problem has attracted the attention of researchers and has led to the development of many image forensic techniques that reveal image alterations.

Digital crimes involving forged images containing human facial regions have been increasing in recent years. Once an image is proved to be forged, the next step is to locate the forged facial region. Techniques that locate forged regions are termed forgery localization techniques. For example, in a forged group photo, the task is to locate the spliced facial region among the various facial regions in the image. Usually, forgery localization techniques detect the presence of irregularities in forged regions. Whenever image content is altered by adding or modifying an image region, the original patterns of scene information, such as the color of the scene illumination, are changed, creating an inconsistent region.

In this work, we analyze the inconsistency in scene illumination across different image regions for detecting forged regions. For this, we rely upon the scene illumination representation proposed by Riess and Angelopoulou [33]. The color of the scene illumination is recorded in the pixels, and if the scene is illuminated by multiple light sources, different image regions will exhibit different illuminant colors, introducing a pattern. The properties of this pattern will differ at a spliced (copy-pasted) image region compared to the untouched regions within the image. Since surface reflectance properties depend on the material of the object, only similar object materials can be compared when checking for inconsistency. Therefore, the illumination pattern and color in the facial skin regions are analyzed to reveal the spliced facial region.

Forgery detection and localization in spliced images by exploiting inconsistencies in illumination has been addressed earlier in [6, 8, 11, 13, 15, 16, 27, 33, 46, 47, 51, 52]. Gholap and Bora developed a technique based on the difference in the illuminant color observed from different image regions [16]. Here, an image is declared authentic if different image regions report the same illuminant color; otherwise it is declared forged. Cao et al. developed a forgery detection method that considers color histograms and illuminant color differences estimated from foreground and background image regions [6]. Wu and Fang proposed another forgery localization method [51], where the image is divided into overlapping blocks and one of the blocks is selected as the reference block. A block is declared spliced if the illuminant color difference between it and the reference block is greater than a threshold.

Fan et al. devised a forgery localization technique that automatically selects the reference illuminant color [13]. Here, the image is divided into different vertical and horizontal regions, and a region is declared spliced if the difference between its estimated illuminant color and the reference illuminant color is greater than a threshold. Illuminant estimation is carried out using 5 different algorithms, and for each algorithm the inconsistent regions are identified. Finally, the intersection of all inconsistent regions is declared as the spliced region. Vidyadharan and Thampi proposed a forgery localization technique by applying histogram distance measures to the brightness distributions obtained from facial skin regions [46].

Among the forensic techniques that consider illumination inconsistency, certain works consider forgery localization in spliced images by analyzing human facial regions [8, 11, 15, 27, 33, 47]. Riess and Angelopoulou, in their pioneering work that forensically analyzes the illuminant distribution across an image, found that a manual examination of the illuminant representation can reveal forged image regions [33]. This finding was further explored by Carvalho et al. by utilizing edge and texture features generated from the facial regions extracted from the illuminant maps [11]. They automated forgery detection by developing a machine learning technique that classifies forged and authentic images based on the discrepancies in the texture and edge features from the illuminant representation. Later, Carvalho et al. improved forgery detection and addressed forgery localization as well, by considering color and shape features in addition to texture features with the help of a classifier ensemble [8].

Meanwhile, certain techniques followed non-machine learning approaches for forgery detection and localization [15, 27, 47] in images containing spliced human facial regions. Francis et al. developed a forgery localization technique based on differences in the illuminant color estimated from the nose tips of different persons in a group photo [15]. Vidyadharan and Thampi proposed a technique where Principal Component Analysis (PCA) is carried out on facial regions extracted from the illuminant maps to locate the spliced facial region [47]. Mazumdar and Bora devised a Dichromatic Plane Histogram (DPH) based technique to detect forged images [27]. The DPH is considered an illumination signature for a face, and DPHs obtained from facial regions captured under similar illumination will be similar. The similarity between DPHs is examined using a correlation measure. If the correlation measure of any of the face pairs in an image is lower than a threshold, the image is considered forged.

In the proposed work, we investigate the discriminative power of different color, texture, and combined color-texture features in locating the spliced facial region from the illumination representation of an image. This evaluation is inspired by van de Sande et al.'s evaluation of color descriptors for scene recognition [36]. The attempt to consider combined color and texture features is motivated by the work of Khan et al. [21], where a compact texture and color descriptor is used for texture classification.

The main contributions of the work are,

  • An evaluation of 5 texture descriptors, 5 color descriptors, 3 descriptors that combine color and texture features, 5 color moment and histogram descriptors, and 5 color-shape descriptors for forgery localization from illuminant maps.

  • Evaluation of the performance of various categories of histogram distance measures.

  • A comparison showing that forgery localization based on texture descriptors achieves better detection accuracy than existing non-machine learning approaches.

Even though we evaluated the discriminative power of the various descriptors for forgery localization, the results of the study can be applied to other image processing domains that consider the similarity of segmented images. Also, the comparison of features to detect similarity in digital data can be used in different applications [34, 41, 53, 54].

The rest of the paper is organized as follows. In Section 2, we briefly discuss the representation of scene illumination and illuminant maps. The texture descriptors considered in the work are described in Section 3. The color descriptors used for locating the spliced face are discussed in Section 4. Combined color-texture descriptors are discussed in Section 5. Additional features considered are mentioned in Section 6. The evaluation framework, illustrating how the various descriptors are generated from the illumination representation and how the descriptor representing the spliced face is located, is described in Section 7. The details of the experiments, including the experimental setup, the distance metrics used for comparing the feature descriptors, the performance evaluation criteria, and the experimental analysis conducted, are given in Section 8. Finally, the conclusion of the work is given in Section 9.

2 Representing scene illumination

When an image is captured by the camera sensor, the illumination present in the environment is recorded in the pixels. The scene illumination information in an authentic, unaltered image will be consistent, whereas in a forged image the scene illumination will be inconsistent at the forged region. For studying the change in the pattern and color of illumination, we use illuminant maps, the scene illumination representation proposed by Riess and Angelopoulou [33].

For generating the illuminant map, an image is first segmented into regions of similar color. Each region is further divided into small patches. From each patch, the color of illumination, termed the illuminant color, is estimated. Finally, a majority-voted illuminant color is selected as the illuminant color of the region. Since the illuminant color is estimated locally at each region, this representation is capable of representing a multi-illuminant environment [33]. In [11], Carvalho et al. used two variants of illuminant maps, the Inverse Intensity Chromaticity (IIC) map and the Generalized Grey World (GGW) map. In the GGW map, the illuminant color of an image patch within a region is computed using the statistical approach of the Generalized Grey Edge framework proposed by Van de Weijer et al. in [44]. In the IIC map, the illuminant color of the image patch is computed using the Inverse Intensity Chromaticity space proposed by Tan et al. in [42], based on the physics-based Dichromatic Reflection Model [14]. Figure 1 shows an example of an original unmodified image, a spliced image, the corresponding illuminant maps, and the facial regions extracted from the respective illuminant maps.
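To make the construction concrete, the following minimal sketch builds a toy illuminant map using SLIC superpixels and a simple grey-world estimate per patch; the segmentation method, patch size, and quantized majority vote are illustrative assumptions, not the generalized grey-world or IIC estimators used for the actual GGW and IIC maps.

```python
import numpy as np
from skimage.segmentation import slic

def toy_illuminant_map(img, n_segments=200, patch=16):
    """Toy illuminant map: grey-world estimate per patch, majority vote per region.

    img: float RGB image in [0, 1], shape (H, W, 3). This is only a sketch;
    the maps used in the paper rely on the estimators of [33, 42, 44].
    """
    labels = slic(img, n_segments=n_segments)
    imap = np.zeros_like(img)
    for region in np.unique(labels):
        mask = labels == region
        votes = []
        ys, xs = np.nonzero(mask)
        # scan the bounding box of the region in small square patches
        for y0 in range(ys.min(), ys.max() + 1, patch):
            for x0 in range(xs.min(), xs.max() + 1, patch):
                pm = mask[y0:y0 + patch, x0:x0 + patch]
                if pm.sum() < 0.5 * patch * patch:
                    continue  # patch lies mostly outside the region
                pix = img[y0:y0 + patch, x0:x0 + patch][pm]
                est = pix.mean(axis=0)               # grey-world estimate of the patch
                est /= np.linalg.norm(est) + 1e-8    # keep only the illuminant chromaticity
                votes.append(est)
        if votes:
            votes = np.round(np.array(votes), 2)
            uniq, counts = np.unique(votes, axis=0, return_counts=True)
            imap[mask] = uniq[counts.argmax()]       # majority-voted illuminant color
    return imap
```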

Fig. 1

An example of the original image and spliced image, corresponding illuminant maps and extracted facial regions. Both the original and spliced image are taken from tifs-database [11]

In our work, we evaluated the discriminative power of color, texture, and combined color-texture features in IIC and GGW illuminant maps. The illuminant maps show a texture pattern that varies in color, depending on the scene illumination intensity and the direction of light. Thus, the texture and color properties at a spliced region may differ from those of the untouched regions in the image. Here, we explore how the differences in texture and color properties among different facial regions within an image can be used to locate forged regions.

3 Texture descriptors

In this work, we evaluate the discriminative power of 5 popular texture descriptors.

Local Binary Pattern (LBP). Ojala et al. proposed a local texture descriptor known as the Local Binary Pattern, capable of capturing texture patterns [30, 40]. Here, the pixels in a local neighborhood are compared with the central pixel. If a neighboring pixel value is greater than or equal to the central pixel value, that pixel is coded as one, and otherwise as zero. Finally, these binary values are concatenated to get the LBP code. All the LBP codes are counted and the distribution is represented as the LBP histogram (256-bin).
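As an illustration, a 256-bin LBP histogram for a grayscale face crop can be computed with scikit-image; the 8-neighbor, radius-1 configuration is an assumption made only for this sketch.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, points=8, radius=1):
    """256-bin LBP histogram of a grayscale face region (pixel values in [0, 255])."""
    codes = local_binary_pattern(gray_face, P=points, R=radius, method="default")
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    # normalize so that faces of different sizes remain comparable
    return hist / (hist.sum() + 1e-8)
```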

Completed Local Binary Pattern (CLBP). Guo et al. devised a completed LBP descriptor that makes use of the Local Difference Sign-Magnitude Transform (LDSMT) [17]. In traditional LBP, the texture pattern around a pixel is represented only by the signs of the differences between the neighboring pixels and the central pixel. In CLBP, the magnitudes of the differences and the central pixel value are also considered. The sign and magnitude CLBP representations, when joined, are represented as CLBP_S/M (59049-bin).

Local Phase Quantization (LPQ). Ojansivu and Heikkilä [31] proposed a blur-invariant local texture descriptor that also performs well on non-blurred images. The phases of low-frequency components in the Fourier domain are used to represent texture. The phase values of four low-frequency components are decorrelated and quantized to get the LPQ codeword. Finally, the codewords are represented as the LPQ histogram (256-bin).

Binarized Statistical Image Features (BSIF). BSIF is a 256-bin dense descriptor where a binary code for a pixel is generated by convolving a neighborhood region of pixels with filters that are learned by prior training by independent component analysis [20]. Training is performed using image patches randomly sampled from a small set of natural images. Thus, the filters capture the statistical properties of natural images.

Binary Gabor Pattern (BGP). Zhang et al. proposed a texture feature using Gabor filters [55]. Here, an image is convolved with even-symmetric and odd-symmetric Gabor filters at three different resolutions to obtain a 216-bin histogram termed the Binary Gabor Pattern.

4 Color descriptors

Inconsistencies in illumination are visible as noticeable color changes in the illuminant maps of forged images. Thus, to study the discriminative power of color for forgery localization, we considered various color descriptors, including color name descriptors, color histograms and color moments.

Color Names. Color name descriptors represent colors based on human perception of color as linguistic terms such as ‘Red’ and ‘Blue’. Benavente et al. devised an 11-bin color name descriptor, Automatic Color Names (ACN), based on the parametric model with fuzzy set membership for different colors [2]. Van de Weijer et al. devised an improved version of linguistic color names, termed as Color Names (CN), by learning from real-world images [50].

Discriminative Color Descriptors (DCD). Khan et al. proposed discriminative color descriptors that represent color features with clusters grouped according to their discriminative power in classifying images [22]. Color descriptors with 11, 25, or 50 clusters are available.

In addition to the above color descriptors, we have studied the performance of a few more color histograms and color moment descriptors mentioned in Table 1.

Table 1 Details of color histograms and color moments considered in the evaluation
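For illustration, one color histogram and one color moment descriptor of the kind listed in Table 1 can be sketched as below; the bin count and the particular moment set are assumptions and need not match the configurations actually evaluated.

```python
import numpy as np

def rg_histogram(img, bins=16):
    """Histogram of chromaticities r = R/(R+G+B), g = G/(R+G+B) (img: float RGB)."""
    s = img.sum(axis=2, keepdims=True) + 1e-8
    chrom = (img / s)[..., :2].reshape(-1, 2)
    hist, _ = np.histogramdd(chrom, bins=(bins, bins), range=((0, 1), (0, 1)))
    return (hist / hist.sum()).ravel()

def color_moments(img):
    """Mean, standard deviation and skewness of each color channel (9-dimensional)."""
    pix = img.reshape(-1, 3)
    mean = pix.mean(axis=0)
    std = pix.std(axis=0)
    skew = ((pix - mean) ** 3).mean(axis=0) / (std ** 3 + 1e-8)
    return np.concatenate([mean, std, skew])
```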

5 Combined color-texture descriptors

Although color and texture can be represented well as separate descriptors, certain attempts have been made to combine the two for computer vision tasks such as texture classification [21]. Color and texture features are currently combined in one of two ways, sketched after this paragraph. In the first approach, color and texture features are computed separately and the final descriptors are combined by concatenating the feature vectors. In the second approach, known as the joint approach, the texture descriptor is computed on each color channel separately and all these texture descriptors are concatenated [1].
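Both strategies can be sketched as follows, reusing the lbp_histogram function from Section 3 as the texture part; the choice of LBP and of plain RGB channels is only for illustration.

```python
import numpy as np

def early_fusion(color_desc, texture_desc):
    """First approach: compute color and texture descriptors separately, then concatenate."""
    return np.concatenate([color_desc, texture_desc])

def joint_fusion(rgb_face, texture_fn):
    """Joint approach: compute the texture descriptor on each color channel and concatenate.

    With texture_fn = lbp_histogram this yields a 3 x 256 = 768-bin descriptor.
    """
    return np.concatenate([texture_fn(rgb_face[..., c]) for c in range(3)])
```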

Color LPQ. Pedone and Heikkilä proposed an extension to LPQ that considers color features [32]. Here, the 1280-bin descriptor is computed from a multi-vector representation of color.

Color Texton Descriptors. Alvarez and Vanrell extended the traditional texton-theoretic approach by considering the color and shape of image blobs [1]. The basic image blob representation is as follows. A perceptual blob is defined as a region of similar color identified in an opponent color space. Shape attributes and color attributes of the blobs are then extracted. Shape attributes include width, length, and orientation. Color attributes include the intensity, rg, and by components, extracted as the median of the color information of all pixels belonging to the blob. Thus, there is a total of six attributes. Each attribute value is then quantized into m intervals, resulting in 6 × m terms. Finally, the descriptor is formed by concatenating the probability distributions of all six attributes. This representation is further improved by adding the perceptual relationships between attributes and the co-occurrence of the shape and color attributes of blobs. Perceptual relationships between shape attributes are added by transforming the attributes into a shape space. Two axes of the shape space represent the width and length, and the third axis represents the angle. Three quantization models, Cartesian, Cylindrical, and Circular, are used for the shape space representation. The color attributes are represented in the HSI-Carron [7] and HSV-Smith [38] spaces. These color and shape attributes can be represented either jointly at the blob level or separately at the image level, resulting in the following texton descriptors.

Co-joint Texton Descriptor (JTD): This descriptor represents the color and texture attributes as a joint probability distribution. Thus, the co-occurrence of color and shape attributes is captured.

Semi-joint Texton Descriptor (STD): This descriptor represents color and texture by concatenating the probability distribution of color and texture.

6 Additional features

For analyzing the differences in the underlying pixel statistics, we have used two more features: edge features and Grey Level Run Length Matrix (GLRLM) features [43]. Edge features are represented by the Histogram of Oriented Gradients (HOG) proposed by Dalal and Triggs [10]. The HOG and GLRLM features are 81-dimensional and 44-dimensional, respectively.
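A sketch of HOG extraction with scikit-image is shown below; the face size and the cell/block parameters are assumptions chosen so that the output is 81-dimensional, and they need not match the original implementation.

```python
from skimage.feature import hog
from skimage.transform import resize

def hog_descriptor(gray_face, size=(96, 96)):
    """HOG descriptor of a grayscale face region.

    Resizing to a fixed size keeps the dimensionality identical across faces.
    With 32x32-pixel cells and one 3x3-cell block, the result is 9 x 9 = 81-dimensional.
    """
    face = resize(gray_face, size, anti_aliasing=True)
    return hog(face, orientations=9, pixels_per_cell=(32, 32),
               cells_per_block=(3, 3), block_norm="L2-Hys")
```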

We also considered features based on the Scale Invariant Feature Transform (SIFT). SIFT, proposed by Lowe [25], represents local features of the regions around detected keypoints within an image. Since feature extraction is keypoint-based, SIFT captures the spatial information of image regions. Here, different variants of SIFT, such as HSV-SIFT [4], Opponent-SIFT [36], C-SIFT [5], rg-SIFT [36], and rgb-SIFT [36], as discussed by van de Sande et al. in [36], are considered. Details are given in Table 2.

Table 2 Details of color-shape descriptors considered for the evaluation

7 Evaluation framework

We used the evaluation framework shown in Fig. 2 for evaluating the forgery localization capability of different feature descriptors. First, the given image is represented as an illuminant map showing the variation in the illumination pattern across the image. The facial regions are extracted from this illuminant map by specifying a bounding box around each face. Then, the feature descriptor to be evaluated is generated for each face. Finally, the distances between the descriptors are compared among themselves to identify the descriptor that represents the spliced face. Descriptors representing faces captured in the same illumination environment will be similar, whereas the descriptor representing the spliced face will differ. Therefore, the distances among the descriptors of authentic faces will be smaller than their distances to the descriptor of the spliced face, and the spliced face is located accordingly. We evaluated different histogram distance measures to identify the most dissimilar feature descriptor. The steps involved in evaluating a feature descriptor with M distance measures are given in Algorithm 1.
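A minimal sketch of this selection step is given below; it flags the face whose summed distance to all other faces is largest, which is one reasonable reading of the framework rather than a verbatim reimplementation of Algorithm 1.

```python
import numpy as np

def locate_spliced_face(face_regions, describe, distance):
    """Return the index of the face whose descriptor is most dissimilar to the rest.

    face_regions: list of face crops taken from the illuminant map
    describe:     function mapping a face crop to a feature descriptor
    distance:     histogram distance function d(desc_a, desc_b)
    """
    descs = [describe(face) for face in face_regions]
    n = len(descs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = distance(descs[i], descs[j])
    # the spliced face is the one farthest, in total, from all the others
    return int(dist.sum(axis=1).argmax())
```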

Algorithm 1
Fig. 2

The framework illustrating the steps involved in the evaluation process. For clarity, scaled version of feature descriptors are shown. The input image shown is taken from tifs-database [11]

8 Experiments and results

8.1 Datasets

For the experimental evaluation, we used spliced images from three datasets: i) DSO-I, ii) SwapMe, and iii) FaceSwap. The DSO-I dataset is taken from the tifs-database [11]. The tifs-database contains 100 spliced images containing human facial regions, saved in Portable Network Graphics (PNG) format with a resolution of 2048 x 1536 pixels. We used a subset of 55 spliced images that contain more than two facial regions for evaluating the forgery localization capability of the various color and texture descriptors.

The SwapMe and FaceSwap datasets contain spliced images created by exchanging a source facial region with a destination facial region [56]. For our experiments, we selected a subset of 55 spliced images from the SwapMe dataset. From the FaceSwap dataset, we selected 33 images that were not present in SwapMe. Along with the FaceSwap images, we combined our own set of 7 spliced images to create a Combined FaceSwap dataset. All the selected spliced images in both SwapMe and Combined FaceSwap contain three or more facial regions.

We used two variants of illuminant maps, GGW and IIC. In Sections 8.2–8.13, we present the feature extraction process, the performance evaluation criteria, and the evaluation of the various descriptors for forgery localization from illuminant maps.

8.2 Feature extraction

For the illumination representation, the two variants of illuminant maps, IIC and GGW, were generated using the software of [33]. From each illuminant map, all the facial regions are extracted. Then, feature descriptors are computed for each facial region. The similarity among the feature vectors is measured using different histogram distance measures.

8.3 Distance measures considered

Feature descriptors extracted from the facial regions in the illuminant maps can be considered as distributions. Following the work of Meshgi and Ishii [28], we compared feature descriptors using various categories of histogram distance measures: heuristic distance measures, non-parametric test statistics, information-theoretic divergences, and cross-bin distance measures.

From the heuristic distance measures, we considered the L2 distance and the Pearson Correlation Coefficient (CR) [3]. From the non-parametric test statistics, we used the Kolmogorov-Smirnov distance (KS) [26], the Cramer-von Mises statistic (CM) [12], the Chi-square (CS) statistic [23, 39], and the Bhattacharyya distance (BH) [19]. The Kullback-Leibler divergence (KL) [23] belongs to the information-theoretic divergences. The Diffusion Distance (DF) [24] and the Earth Mover's Distance (EMD) consider cross-bin information capable of capturing the perceptual similarity of images [23, 35].
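For reference, three of these measures can be written directly for normalized histograms p and q; the small epsilon terms are implementation details, and the chi-square form shown is one common variant.

```python
import numpy as np

def chi_square(p, q, eps=1e-10):
    """Chi-square statistic between two normalized histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def bhattacharyya(p, q, eps=1e-10):
    """Bhattacharyya distance between two normalized histograms."""
    bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient
    return -np.log(bc + eps)

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence (not symmetric) from p to q."""
    return np.sum(p * np.log((p + eps) / (q + eps)))
```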

8.4 Performance evaluation criteria

In forgery localization, the objective is to locate forged image regions within spliced images. Hence, the performance of a method can be measured by the rate at which forged facial regions within spliced images are correctly detected. Here, the performance is evaluated using the following performance metrics,

$$ Sensitivity\quad or\quad Recall\quad or\quad TPR = \frac{{Faces}_{Located}}{{Faces}_{Spliced}} $$
(1)

where FacesLocated = No. of spliced faces located correctly,

FacesSpliced = Total no. of spliced faces, and TPR is the True Positive Rate.

$$ Specificity\quad or\quad TNR = \frac{{AuthenticFaces}_{Detected}}{{Faces}_{Authentic}} $$
(2)

where AuthenticFacesDetected = No. of authentic faces detected correctly, and

FacesAuthentic = Total no. of authentic faces.

$$ Accuracy = \frac{TP + TN}{TP+FP+FN+TN} $$
(3)
$$ Precision = \frac{TP}{TP+FP} $$
(4)
$$ F\text{-}Score = 2\cdot\frac{Precision \cdot Recall}{Precision+Recall} $$
(5)
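Expressed in terms of per-face counts, the metrics of (1)-(5) reduce to the usual confusion-matrix quantities, as in the following sketch.

```python
def localization_metrics(tp, fp, fn, tn):
    """Compute the metrics of Eqs. (1)-(5) from per-face counts.

    tp: spliced faces correctly located     fn: spliced faces missed
    tn: authentic faces correctly kept      fp: authentic faces flagged as spliced
    """
    sensitivity = tp / (tp + fn)                   # recall / TPR, Eq. (1)
    specificity = tn / (tn + fp)                   # TNR, Eq. (2)
    accuracy = (tp + tn) / (tp + fp + fn + tn)     # Eq. (3)
    precision = tp / (tp + fp)                     # Eq. (4)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (5)
    return dict(sensitivity=sensitivity, specificity=specificity,
                accuracy=accuracy, precision=precision, f_score=f_score)
```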

8.5 Experiment 1: evaluating texture descriptors

The LBP and LPQ descriptors are obtained using the software provided by the respective authors. The BSIF and BGP features are estimated using the source code provided by the corresponding authors. The CLBP features are generated using the source code provided by Guo et al. [17].

First, we considered the texture descriptors; the sensitivity obtained from IIC and GGW illuminant maps is shown in Figs. 3 and 4, respectively. It is interesting to note that the distance measure providing the best sensitivity differs across the texture features, because the nature of the feature vectors and the discriminative capability of the descriptors vary. Among the various distance measures, CS, BH, and KL yielded the best results for the LBP and LPQ descriptors on the IIC map. For BGP, the L2 and DF distance measures provided the best results, while BSIF and CLBP showed the highest sensitivity with the BH and KL distance measures, respectively.

Fig. 3

Sensitivity obtained for texture descriptors from IIC maps using different distance measures

Fig. 4

Sensitivity obtained for texture descriptors from GGW maps using different distance measures

For the GGW map, the CS and BH distance measures yielded the best results for LBP and LPQ (see Fig. 4). BGP showed good performance with the CR, DF, and EMD distance measures, and BSIF exhibited the highest sensitivity with the KL distance measure. For comparing the performance of the texture descriptors on the different datasets, DSO-I, SwapMe, and Combined FaceSwap, we selected the BH distance measure, which provided good sensitivity on both IIC and GGW maps.

The precision, sensitivity, specificity, accuracy, and F-Score obtained for the texture descriptors using the BH distance on IIC and GGW maps are shown in Tables 3 and 4, respectively. The texture descriptors performed better on GGW maps than on IIC maps across all three datasets. This suggests that the GGW maps carry more texture variation capable of discriminating spliced regions from authentic regions.

Table 3 Evaluation of texture descriptors on different datasets using IIC maps
Table 4 Evaluation of texture descriptors on different datasets using GGW maps

8.6 Experiment 2: evaluating color descriptors

For evaluating the various color descriptors for forgery localization, we used the software provided by Joost van de Weijer. The sensitivity obtained for the color descriptors on IIC and GGW illuminant maps of the spliced images in the DSO-I dataset is shown in Figs. 5 and 6, respectively.

Fig. 5

Sensitivity obtained for color descriptors from IIC maps using different distance measures

Fig. 6

Sensitivity obtained for color descriptors from GGW maps using different distance measures

As in the case of texture descriptors, for both IIC and GGW maps the highest sensitivity obtained for each color descriptor is reached with a different distance measure. This indicates that the nature of the feature vectors obtained with different descriptors differs. In the IIC map, the DF distance measure resulted in the best sensitivity with the DCD25 and DCD50 descriptors, whereas on GGW maps the Opponent histogram descriptor yielded the highest sensitivity with the KS distance measure. This shows that the IIC and GGW maps may contain different color characteristics that are captured by different color descriptors. Hence, it would be better if both illuminant maps are considered in forgery localization techniques.

The precision, recall, TNR, accuracy, and F-Score obtained for the color descriptors on the three datasets using IIC and GGW maps (with the DF distance measure) are shown in Tables 5 and 6, respectively. For the DSO-I dataset, the color descriptors performed better on IIC maps than on GGW maps. For SwapMe and Combined FaceSwap, the variation in the performance of the descriptors is less noticeable. This is because the DSO-I dataset contains images in uncompressed PNG format, and its IIC and GGW maps show visible color variations. In addition, the clarity of the facial regions in the DSO-I dataset is better than that of the facial regions in both SwapMe and Combined FaceSwap.

Table 5 Evaluation of color descriptors on different datasets using IIC maps
Table 6 Evaluation of color descriptors on different datasets using GGW maps

8.7 Experiment 3: evaluating combined color and texture descriptors

In this section, the discriminative power of descriptors that consider both color and texture features, namely Color LPQ and the two variants of color textons, JTD and STD, is evaluated. The source code for computing the Color LPQ descriptor was provided by Pedone [32]. The JTD and STD descriptors are generated using the source code provided by Alvarez and Vanrell [1]. The sensitivity obtained for the combined color and texture descriptors on IIC and GGW illuminant maps of the spliced images in the DSO-I dataset is shown in Figs. 7 and 8, respectively.

Fig. 7

Sensitivity obtained for combined color texture descriptors from IIC maps using different distance measures

Fig. 8

Sensitivity obtained for combined color texture descriptors from GGW maps using different distance measures

In the IIC map, Color LPQ and STD provided the highest sensitivity with the CR distance measure, as shown in Fig. 7. Similarly, Color LPQ showed the highest sensitivity with the CR distance measure on GGW maps. The precision, recall, TNR, and F-Score obtained for the combined color-texture descriptors on the various datasets using the CR measure on both IIC and GGW maps are shown in Tables 7 and 8, respectively.

Table 7 Evaluation of combined color-texture on different datasets using IIC maps
Table 8 Evaluation of combined color-texture on different datasets using GGW maps

8.8 Experiment 4: evaluating color histograms, moments and color shape descriptors

The color histogram, moment, and shape descriptors are generated using the source code provided by Koen van de Sande [37].

Figure 9 shows the sensitivity obtained for the color histogram and moment descriptors on the IIC map. The color moment invariants and the rgb histogram yielded the highest sensitivity of 64.58% with the CS and KL distance measures. Table 9 shows the evaluation using various performance metrics on the different datasets with the CS distance measure. Figure 10 shows the sensitivity evaluation of the color-shape descriptors. Among the different variants of color-shape descriptors, Hue-SIFT exhibited the highest sensitivity of 54.17% using the L2 distance measure. Table 10 shows the evaluation using various performance metrics on the DSO-I dataset with the L2 distance measure.

Fig. 9

Sensitivity obtained for color histogram and moment descriptors from IIC maps using different distance measures.

Table 9 Evaluation of color histogram and moment descriptors on different datasets using IIC maps
Fig. 10

Sensitivity obtained for color shape descriptors from IIC maps using different distance measures

Table 10 Evaluation of color shape descriptors on DSO-I dataset using IIC maps

8.9 Experiment 5: evaluating HOG and GLRLM descriptors

The GLRLM and HOG features are computed using the Matlab source code provided by Wei [48] and Ludwig et al. [18], respectively. The highest sensitivity obtained for HOG is 43.64% on the IIC map using the CR and DF distance measures, and 52.73% on the GGW map with the DF distance measure. On both IIC and GGW maps, the HOG descriptors performed better than the GLRLM descriptors. Figures 11 and 12 show the sensitivity obtained with the various distance measures on IIC and GGW maps, respectively. The evaluation of the HOG and GLRLM descriptors on the different datasets using IIC maps and GGW maps is shown in Tables 11 and 12, respectively.

Fig. 11

Sensitivity obtained for HOG and GLRLM descriptors from IIC maps using different distance measures

Fig. 12

Sensitivity obtained for HOG and GLRLM descriptors from GGW maps using different distance measures

Table 11 Evaluation of HOG and GLRLM descriptors on different datasets using IIC maps
Table 12 Evaluation of HOG and GLRLM descriptors on different datasets using GGW maps

8.10 Evaluation of deep features

For evaluating the performance of deep features, we used the pretrained Convolutional Neural Network (CNN) model vgg-f [9], available in the MatConvNet library [45]. The vgg-f model consists of 8 layers: 5 convolutional and 3 fully-connected layers. The input image is resized to 224 x 224 pixels. The 4096-dimensional deep features are extracted from the 7th layer. We evaluated the deep features obtained from both IIC and GGW maps with the various distance measures. Tables 13 and 14 show the performance obtained using IIC and GGW maps with the deep features extracted from the vgg-f model.
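As an illustration of this step, the sketch below extracts 4096-dimensional fc7-style features with a pretrained VGG-16 from torchvision; this model is used only as a stand-in for the MatConvNet vgg-f network of the paper, and the weight identifier assumes a recent torchvision version.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# VGG-16 pretrained on ImageNet, used here as a stand-in for vgg-f
model = models.vgg16(weights="IMAGENET1K_V1")
model.eval()
# drop the final classification layer so the output is the 4096-d fc7 activation
fc7 = torch.nn.Sequential(*list(model.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_features(face_path):
    """4096-dimensional deep feature for one face crop taken from an illuminant map."""
    x = preprocess(Image.open(face_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = fc7(model.avgpool(model.features(x)).flatten(1))
    return feat.squeeze(0).numpy()
```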

Table 13 Evaluation of deep features extracted from pre-trained model vgg on IIC maps of different datasets
Table 14 Evaluation of deep features extracted from pre-trained model vgg on GGW maps of different datasets

On IIC maps, the BH distance measure yielded the best performance on the DSO-I dataset, whereas the KL distance measure gave the best performance on the SwapMe dataset, and the L2 distance provided the best results on the Combined FaceSwap dataset. On GGW maps, the CR and KL distance measures showed good results on the DSO-I dataset; for SwapMe, the best performance is obtained with the CR and DF distance measures, and for the Combined FaceSwap dataset with the L2 distance. However, on both IIC and GGW maps, the performance of the deep features was lower than that of the LPQ texture descriptor on DSO-I, while for the SwapMe and Combined FaceSwap datasets the deep features gave comparable performance. Therefore, as in other computer vision domains where deep features have yielded better performance, an application-specific model could also improve forgery localization from illuminant maps.

8.11 Effect of JPEG compression

The robustness of the texture, color, and combined color-texture descriptors is evaluated against JPEG compression. The experiments are conducted on the DSO-I dataset using IIC maps, since the dataset contains images in uncompressed PNG format. For the evaluation, the images in the DSO-I dataset are compressed with JPEG quality factors of 60, 70, 80, and 90. Figure 13 shows the performance of the various texture features at the different compression levels (60-90). In Figs. 13, 14 and 15, the performance on the uncompressed version is marked as '100'. As depicted in Fig. 13, texture descriptors such as LBP, LPQ, and BSIF show a performance variation on the JPEG compressed images (marked 60-90) compared to the uncompressed images (marked '100'). When the images are compressed, the JPEG blocking artifacts and the encoding scheme alter the underlying pixel boundaries, thereby affecting the segmentation of the image during the generation of the illuminant map. An interesting observation is that the BGP and CLBP features perform relatively consistently across all JPEG compression levels and the uncompressed version.
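The compressed test images can be generated in a few lines, for example with Pillow; the output naming scheme below is arbitrary.

```python
from pathlib import Path
from PIL import Image

def make_jpeg_versions(png_path, qualities=(60, 70, 80, 90)):
    """Re-save an uncompressed image at several JPEG quality factors."""
    img = Image.open(png_path).convert("RGB")
    stem = Path(png_path).with_suffix("")  # drop the .png extension
    for q in qualities:
        img.save(f"{stem}_q{q}.jpg", format="JPEG", quality=q)
```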

Fig. 13

The performance of various texture descriptors at different JPEG compression levels. Quality factors considered are 60, 70, 80, 90. The uncompressed level 100 indicates the image without any compression. Experiments are carried on DSO-I dataset using IIC maps. a LBP b LPQ c BGP d BSIF e CLBP

Fig. 14

The performance of various color descriptors at different JPEG compression levels. Quality factors considered are 60, 70, 80, 90. The uncompressed level 100 indicates the image without any compression. Experiments are carried on DSO-I dataset using IIC maps. a Opponent b Hue c ACN d CN e DCD11 f DCD25 g DCD50

Fig. 15

The performance of various combined color-texture descriptors at different JPEG compression levels. Quality factors considered are 60, 70, 80, 90. The uncompressed level 100 indicates the image without any compression. Experiments are carried on DSO-I dataset using IIC maps. a ColorLPQ b JTD c STD

Figures 14 and 15 show the performance of the various color and combined color-texture features, respectively, at the different compression levels. In Fig. 14, the color descriptors Opponent, Hue, ACN, CN, and DCD11 perform consistently across the different JPEG compression levels and the uncompressed version, whereas the performance of DCD25 and DCD50 degrades under JPEG compression. In Fig. 15, the combined color-texture descriptors Color LPQ, JTD, and STD show a performance degradation with JPEG compression, and the degradation becomes larger at lower JPEG quality factors. This indicates that the combined color-texture descriptors are affected by the JPEG encoding scheme.

Fig. 16

Comparison of LPQ features with previous methods on DSO-I dataset

8.12 Comparison with other illumination inconsistency based methods

We found that the LPQ, JTD, and HOG descriptors exhibited the best performance on the DSO-I, SwapMe, and Combined FaceSwap datasets, respectively. Their performance is compared with that of two previous works, proposed by Gholap and Bora [16] and by Mazumdar and Bora [27]. Figures 16, 17 and 18 show the performance comparison on the DSO-I, SwapMe, and Combined FaceSwap datasets, respectively. Compared to both previous works, the LPQ, JTD, and HOG descriptors achieve better results on the respective datasets.

Fig. 17

Comparison of JTD features with previous methods on SwapMe dataset

Fig. 18

Comparison of HOG features with previous methods on Combined FaceSwap dataset

8.13 Discussion

In general, the experimental results reveal that texture features and combined color-texture features are better at locating forged image regions from illuminant maps than color features. The reason is that the changes in texture patterns are more prominent than the variations in color. In many of the images in the datasets, the color variations are too subtle to be captured by the color descriptors. The color variation will remain too subtle to detect unless there is a drastic difference between the illumination environments in which the two photographs were captured. For example, in a spliced image composed of image regions captured in indoor and outdoor environments, there will be prominent variations in the color distribution.

9 Conclusion

Inconsistencies in the illumination distribution can reveal forged image regions. In the proposed work, we carried out a comprehensive evaluation of the discriminative power of a number of texture, color, and combined color-texture descriptors for forgery localization. For the evaluation, forged images containing human facial regions were used, and two variants of the illumination representation, the IIC and GGW maps, were considered. The discriminative capability of the different features is assessed using 9 different histogram distance measures.

From the experiments, it is clear that texture descriptors are more capable of locating the forged region than color features and combined color-texture features. Also, among the various descriptors evaluated, we found that the LPQ descriptor showed the highest sensitivity, 70.91% on GGW maps and 65.45% on IIC maps. We also evaluated deep features based on the pre-trained CNN model vgg-f, but the evaluation showed that, on the illuminant maps of the DSO-I dataset, the performance of the texture features is better than that of the deep features from the pre-trained model.

We observed that the detection performance varied with different histogram distance measures for different descriptors, indicating differences in the nature of the color and texture patterns captured by the feature descriptors. This suggests that a combination of features, such as a multi-feature representation, may improve the detection accuracy. Hence, in future work, we plan to consider feature fusion for improving the accuracy of forgery localization.