1 Introduction

Image content is widely regarded as an authentic representation of events. We encounter plenty of images in day-to-day life as more and more hand-held devices, in addition to traditional digital still cameras, are equipped with image capturing and editing tools. This, together with the ready availability of image processing software, makes it easy to manipulate images. The problem has attracted the attention of researchers and has led to the development of many image forensic techniques that reveal image alterations.

Digital crimes involving forged images containing human facial regions have been increasing in recent years. Once an image is proved to be forged, the next step is to locate the forged facial region. Techniques that locate forged regions are termed forgery localization techniques. For example, in a forged group photo, the task is to locate the spliced facial region among the various facial regions in the image. Usually, forgery localization techniques detect the presence of irregularities in forged regions. Whenever image content is altered by adding or modifying an image region, the original patterns of scene information, such as the color of the scene illumination, are changed, creating an inconsistent region.

In this work, we analyze the inconsistency in scene illumination across different image regions for detecting forged regions. For this, we rely upon the scene illumination representation proposed by Riess and Angelopoulou [33]. The color of the scene illumination is recorded in the pixels, and if the scene is illuminated by multiple light sources, different image regions will exhibit different illuminant colors, introducing a pattern. The properties of this pattern will differ at a spliced (copy-pasted) image region compared to the untouched regions within the image. Since surface reflectance properties depend on the material of the object, only similar object materials can be compared when checking for inconsistency. Therefore, the illumination pattern and color in the facial skin regions are analyzed to reveal the spliced facial region.

Forgery detection and localization in spliced images by exploiting inconsistencies in illumination has been addressed earlier in [6, 8, 11, 13, 15, 16, 27, 33, 46, 47, 51, 52]. Gholap and Bora developed a technique based on the difference in the illuminant color observed from different image regions [16]. Here, an image is declared authentic if different image regions report the same illuminant color; otherwise it is declared forged. Cao et al. developed a forgery detection method that considers color histograms and illuminant color differences estimated from foreground and background image regions [6]. Wu and Fang proposed another forgery localization method [51], where the image is divided into overlapping blocks and one of the blocks is selected as the reference block. A block is declared spliced if the illuminant color difference between it and the reference block is greater than a threshold.

Fan et al. devised a forgery localization technique that automatically selects the reference illuminant color [13]. Here, the image is divided into different vertical and horizontal regions, and a region is declared spliced if the difference between its estimated illuminant color and the reference illuminant color is greater than a threshold. Illuminant estimation is carried out using 5 different algorithms, and for each algorithm the inconsistent regions are identified. Finally, the intersection of all inconsistent regions is declared as the spliced region. Vidyadharan and Thampi proposed a forgery localization technique by applying histogram distance measures to the brightness distributions obtained from facial skin regions [46].

Among the forensic techniques that consider illumination inconsistency, certain works consider forgery localization in spliced images by analyzing human facial regions [8, 11, 15, 27, 33, 47]. Riess and Angelopoulou, in their pioneering work that forensically analyzes the illuminant distribution across an image, found that a manual examination of the illuminant representation can reveal forged image regions [33]. This finding was further explored by Carvalho et al. by utilizing edge and texture features generated from the facial regions extracted from the illuminant maps [11]. They automated forgery detection by developing a machine learning technique that classifies forged and authentic images based on the discrepancies in the texture and edge features from the illuminant representation. Later, Carvalho et al. improved forgery detection and addressed forgery localization as well, by considering color and shape features in addition to texture features with the help of a classifier ensemble [8].

Meanwhile, certain techniques followed non-machine learning approaches for forgery detection and localization [15, 27, 47] in images containing spliced human facial regions. Francis et al. developed a forgery localization technique based on differences in the illuminant color estimated from the nose tips of different persons in a group photo [15]. Vidyadharan and Thampi proposed a technique where Principal Component Analysis (PCA) is carried out on facial regions extracted from the illuminant maps to locate the spliced facial region [47]. Mazumdar and Bora devised a Dichromatic Plane Histogram (DPH) based technique to detect forged images [27]. The DPH is considered an illumination signature for a face, and DPHs obtained from facial regions captured under similar illumination will be similar. The similarity between DPHs is examined using a correlation measure. If the correlation measure of any of the face pairs in an image is lower than a threshold, the image is considered forged.

In the proposed work, we investigate the discriminative power of different color, texture, and combined color-texture features in locating the spliced facial region from the illumination representation of an image. This evaluation is inspired by van de Sande et al.'s evaluation of color descriptors for scene recognition [36]. The attempt to consider combined color and texture features is motivated by the work of Khan et al. [21], where a compact texture and color descriptor is used for texture classification.

The main contributions of the work are,

  • An evaluation of 5 texture descriptors, 5 color descriptors, 3 descriptors that combine color and texture features, 5 color moment and histogram descriptors, and 5 color-shape descriptors for forgery localization from illuminant maps.

  • Evaluation of the performance of various categories of histogram distance measures.

  • A comparison showing that forgery localization based on texture descriptors achieves better detection accuracy than existing non-machine learning approaches.

Even though we evaluated the discriminative power of the various descriptors for forgery localization, the results of the study can be applied to other image processing domains that consider the similarity of segmented images. Also, the comparison of features to detect similarity in digital data can be used in different applications [34, 41, 53, 54].

The rest of the paper is organized as follows. In Section 2, we briefly discuss the representation of scene illumination and illuminant maps. The texture descriptors considered in the work are described in Section 3. The color descriptors used for locating the spliced face are discussed in Section 4. Combined color-texture descriptors are discussed in Section 5. Additional features considered are mentioned in Section 6. The evaluation framework, illustrating how the various descriptors are generated from the illumination representation and how the descriptor representing the spliced face is located, is described in Section 7. The details of the experiments, including the experimental setup, the distance metrics used for comparing the feature descriptors, the performance evaluation criteria, and the experimental analysis conducted, are given in Section 8. Finally, the conclusion of the work is given in Section 9.

2 Representing scene illumination

When an image is captured by the camera sensor, the illumination present in the environment is recorded in the pixels. The scene illumination information in an authentic, unaltered image will be consistent, whereas in a forged image the scene illumination will be inconsistent at the forged region. For studying the change in the pattern and color of illumination, we use illuminant maps, the scene illumination representation proposed by Riess and Angelopoulou [33].

For generating the illuminant map, an image is first segmented into regions of similar color. Each region is further divided into small patches. From each patch, the color of illumination, termed the illuminant color, is estimated. Finally, a majority-voted illuminant color is selected as the illuminant color of the region. Since the illuminant color is estimated locally at each region, this representation is capable of representing a multi-illuminant environment [33]. In [11], Carvalho et al. used two variants of illuminant maps, the Inverse Intensity Chromaticity (IIC) map and the Generalized Grey World (GGW) map. In the GGW map, the illuminant color of an image patch within a region is computed using the statistical approach of the Generalized Grey Edge framework proposed by Van de Weijer et al. in [44]. In the IIC map, the illuminant color of the image patch is computed using the Inverse Intensity Chromaticity space proposed by Tan et al. in [42], based on the physics-based Dichromatic Reflection Model [14]. Figure 1 shows an example of an original unmodified image, a spliced image, the corresponding illuminant maps, and the facial regions extracted from the respective illuminant maps.
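To make the construction concrete, the following minimal sketch builds a toy illuminant map using SLIC superpixels and a simple grey-world estimate per patch; the segmentation method, patch size, and quantized majority vote are illustrative assumptions, not the generalized grey-world or IIC estimators used for the actual GGW and IIC maps.

```python
import numpy as np
from skimage.segmentation import slic

def toy_illuminant_map(img, n_segments=200, patch=16):
    """Toy illuminant map: grey-world estimate per patch, majority vote per region.

    img: float RGB image in [0, 1], shape (H, W, 3). This is only a sketch;
    the maps used in the paper rely on the estimators of [33, 42, 44].
    """
    labels = slic(img, n_segments=n_segments)
    imap = np.zeros_like(img)
    for region in np.unique(labels):
        mask = labels == region
        votes = []
        ys, xs = np.nonzero(mask)
        # scan the bounding box of the region in small square patches
        for y0 in range(ys.min(), ys.max() + 1, patch):
            for x0 in range(xs.min(), xs.max() + 1, patch):
                pm = mask[y0:y0 + patch, x0:x0 + patch]
                if pm.sum() < 0.5 * patch * patch:
                    continue  # patch lies mostly outside the region
                pix = img[y0:y0 + patch, x0:x0 + patch][pm]
                est = pix.mean(axis=0)               # grey-world estimate of the patch
                est /= np.linalg.norm(est) + 1e-8    # keep only the illuminant chromaticity
                votes.append(est)
        if votes:
            votes = np.round(np.array(votes), 2)
            uniq, counts = np.unique(votes, axis=0, return_counts=True)
            imap[mask] = uniq[counts.argmax()]       # majority-voted illuminant color
    return imap
```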

Fig. 1

An example of the original image and spliced image, corresponding illuminant maps and extracted facial regions. Both the original and spliced image are taken from tifs-database [11]

In our work, we evaluated the discriminative power of color, texture, and combined color-texture features in IIC and GGW illuminant maps. The illuminant maps show a texture pattern that varies in color, depending on the scene illumination intensity and the direction of light. Thus, the texture and color properties at a spliced region may differ from those of the untouched regions in the image. Here, we explore how the differences in texture and color properties among different facial regions within an image can be used to locate forged regions.

3 Texture descriptors

In this work, we evaluate the discriminative power of 5 popular texture descriptors.

Local Binary Pattern (LBP). Ojala et al. proposed a local texture descriptor known as the Local Binary Pattern, capable of capturing texture patterns [30, 40]. Here, the pixels in a local neighborhood are compared with the central pixel. If a neighboring pixel value is greater than or equal to the central pixel value, that pixel is coded as one, and otherwise as zero. Finally, these binary values are concatenated to get the LBP code. All the LBP codes are counted and the distribution is represented as the LBP histogram (256-bin).
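As an illustration, a 256-bin LBP histogram for a grayscale face crop can be computed with scikit-image; the 8-neighbor, radius-1 configuration is an assumption made only for this sketch.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, points=8, radius=1):
    """256-bin LBP histogram of a grayscale face region (pixel values in [0, 255])."""
    codes = local_binary_pattern(gray_face, P=points, R=radius, method="default")
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    # normalize so that faces of different sizes remain comparable
    return hist / (hist.sum() + 1e-8)
```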

Completed Local Binary Pattern (CLBP). Guo et al. devised a completed LBP descriptor that makes use of the Local Difference Sign-Magnitude Transform (LDSMT) [17]. In traditional LBP, the texture pattern around a pixel is represented only by the signs of the differences between the neighboring pixels and the central pixel. In CLBP, the magnitudes of the differences and the central pixel value are also considered. The sign and magnitude CLBP representations, when joined, are represented as CLBP_S/M (59049-bin).

Local Phase Quantization (LPQ). Ojansivu and Heikkilä [31] proposed a blur-invariant local texture descriptor that also performs well on non-blurred images. The phases of low-frequency components in the Fourier domain are used to represent texture. The phase values of four low-frequency components are decorrelated and quantized to get the LPQ codeword. Finally, the codewords are represented as the LPQ histogram (256-bin).

Binarized Statistical Image Features (BSIF). BSIF is a 256-bin dense descriptor where a binary code for a pixel is generated by convolving a neighborhood region of pixels with filters that are learned by prior training by independent component analysis [20]. Training is performed using image patches randomly sampled from a small set of natural images. Thus, the filters capture the statistical properties of natural images.

Binary Gabor Pattern (BGP). Zhang et al. proposed a texture feature using Gabor filters [55]. Here, an image is convolved with even-symmetric and odd-symmetric Gabor filters at three different resolutions to obtain a 216-bin histogram termed the Binary Gabor Pattern.

4 Color descriptors

Inconsistencies in illumination are visible as noticeable color changes in the illuminant maps of forged images. Thus, to study the discriminative power of color for forgery localization, we considered various color descriptors, including color name descriptors, color histograms and color moments.

Color Names. Color name descriptors represent colors based on human perception of color as linguistic terms such as ‘Red’ and ‘Blue’. Benavente et al. devised an 11-bin color name descriptor, Automatic Color Names (ACN), based on the parametric model with fuzzy set membership for different colors [2]. Van de Weijer et al. devised an improved version of linguistic color names, termed as Color Names (CN), by learning from real-world images [50].

Discriminative Color Descriptors (DCD). Khan et al. proposed discriminative color descriptors that represent color features with clusters grouped according to their discriminative power in classifying images [22]. Color descriptors with 11, 25, or 50 clusters are available.

In addition to the above color descriptors, we have studied the performance of a few more color histograms and color moment descriptors mentioned in Table 1.

Table 1 Details of color histograms and color moments considered in the evaluation
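For illustration, one color histogram and one color moment descriptor of the kind listed in Table 1 can be sketched as below; the bin count and the particular moment set are assumptions and need not match the configurations actually evaluated.

```python
import numpy as np

def rg_histogram(img, bins=16):
    """Histogram of chromaticities r = R/(R+G+B), g = G/(R+G+B) (img: float RGB)."""
    s = img.sum(axis=2, keepdims=True) + 1e-8
    chrom = (img / s)[..., :2].reshape(-1, 2)
    hist, _ = np.histogramdd(chrom, bins=(bins, bins), range=((0, 1), (0, 1)))
    return (hist / hist.sum()).ravel()

def color_moments(img):
    """Mean, standard deviation and skewness of each color channel (9-dimensional)."""
    pix = img.reshape(-1, 3)
    mean = pix.mean(axis=0)
    std = pix.std(axis=0)
    skew = ((pix - mean) ** 3).mean(axis=0) / (std ** 3 + 1e-8)
    return np.concatenate([mean, std, skew])
```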

5 Combined color-texture descriptors

Although color and texture can be represented well as separate descriptors, certain attempts have been made to combine the two for computer vision tasks such as texture classification [21]. Color and texture features are currently combined in one of two ways, sketched after this paragraph. In the first approach, color and texture features are computed separately and the final descriptors are combined by concatenating the feature vectors. In the second approach, known as the joint approach, the texture descriptor is computed on each color channel separately and all these texture descriptors are concatenated [1].
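Both strategies can be sketched as follows, reusing the lbp_histogram function from Section 3 as the texture part; the choice of LBP and of plain RGB channels is only for illustration.

```python
import numpy as np

def early_fusion(color_desc, texture_desc):
    """First approach: compute color and texture descriptors separately, then concatenate."""
    return np.concatenate([color_desc, texture_desc])

def joint_fusion(rgb_face, texture_fn):
    """Joint approach: compute the texture descriptor on each color channel and concatenate.

    With texture_fn = lbp_histogram this yields a 3 x 256 = 768-bin descriptor.
    """
    return np.concatenate([texture_fn(rgb_face[..., c]) for c in range(3)])
```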

Color LPQ. Pedone and Heikkilä proposed an extension to LPQ that considers color features [32]. Here, the 1280-bin descriptor is computed from a multi-vector representation of color.

Color Texton Descriptors. Alvarez and Vanrell extended the traditional texton-theoretic approach by considering the color and shape of image blobs [1]. The basic image blob representation is as follows. A perceptual blob is defined as a region of similar color identified in an opponent color space. Shape attributes and color attributes of the blobs are then extracted. Shape attributes include width, length, and orientation. Color attributes include the intensity, rg, and by components, extracted as the median of the color information of all pixels belonging to the blob. Thus, there is a total of six attributes. Each attribute value is then quantized into m intervals, resulting in 6 × m terms. Finally, the descriptor is formed by concatenating the probability distributions of all six attributes. This representation is further improved by adding the perceptual relationships between attributes and the co-occurrence of the shape and color attributes of blobs. Perceptual relationships between shape attributes are added by transforming the attributes into a shape space. Two axes of the shape space represent the width and length, and the third axis represents the angle. Three quantization models, Cartesian, Cylindrical, and Circular, are used for the shape space representation. The color attributes are represented in the HSI-Carron [7] and HSV-Smith [38] spaces. These color and shape attributes can be represented either jointly at the blob level or separately at the image level, resulting in the following texton descriptors.

Co-joint Texton Descriptor (JTD): This descriptor represents the color and texture attributes as a joint probability distribution. Thus, the co-occurrence of color and shape attributes is captured.

Semi-joint Texton Descriptor (STD): This descriptor represents color and texture by concatenating the probability distribution of color and texture.

6 Additional features

For analyzing the differences in the underlying pixel statistics, we have used two more features: edge features and Grey Level Run Length Matrix (GLRLM) features [43]. Edge features are represented by the Histogram of Oriented Gradients (HOG) proposed by Dalal and Triggs [10]. The HOG and GLRLM features are 81-dimensional and 44-dimensional, respectively.
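A sketch of HOG extraction with scikit-image is shown below; the face size and the cell/block parameters are assumptions chosen so that the output is 81-dimensional, and they need not match the original implementation.

```python
from skimage.feature import hog
from skimage.transform import resize

def hog_descriptor(gray_face, size=(96, 96)):
    """HOG descriptor of a grayscale face region.

    Resizing to a fixed size keeps the dimensionality identical across faces.
    With 32x32-pixel cells and one 3x3-cell block, the result is 9 x 9 = 81-dimensional.
    """
    face = resize(gray_face, size, anti_aliasing=True)
    return hog(face, orientations=9, pixels_per_cell=(32, 32),
               cells_per_block=(3, 3), block_norm="L2-Hys")
```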

We also considered features based on the Scale Invariant Feature Transform (SIFT). SIFT, proposed by Lowe [25], represents local features of the regions around detected keypoints within an image. Since feature extraction is keypoint-based, SIFT captures the spatial information of image regions. Here, different variants of SIFT, such as HSV-SIFT [4], Opponent-SIFT [36], C-SIFT [5], rg-SIFT [36], and rgb-SIFT [36], as discussed by van de Sande et al. in [36], are considered. Details are given in Table 2.

Table 2 Details of color-shape descriptors considered for the evaluation

7 Evaluation framework

We used the evaluation framework shown in Fig. 2 for evaluating the forgery localization capability of different feature descriptors. First, the given image is represented as an illuminant map showing the variation in the illumination pattern across the image. The facial regions are extracted from this illuminant map by specifying a bounding box around each face. Then, the feature descriptor to be evaluated is generated for each face. Finally, the distances between the descriptors are compared among themselves to identify the descriptor that represents the spliced face. Descriptors representing faces captured in the same illumination environment will be similar, whereas the descriptor representing the spliced face will differ. Therefore, the distances among the descriptors of authentic faces will be smaller than their distances to the descriptor of the spliced face, and the spliced face is located accordingly. We evaluated different histogram distance measures to identify the most dissimilar feature descriptor. The steps involved in evaluating a feature descriptor with M distance measures are given in Algorithm 1.
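A minimal sketch of this selection step is given below; it flags the face whose summed distance to all other faces is largest, which is one reasonable reading of the framework rather than a verbatim reimplementation of Algorithm 1.

```python
import numpy as np

def locate_spliced_face(face_regions, describe, distance):
    """Return the index of the face whose descriptor is most dissimilar to the rest.

    face_regions: list of face crops taken from the illuminant map
    describe:     function mapping a face crop to a feature descriptor
    distance:     histogram distance function d(desc_a, desc_b)
    """
    descs = [describe(face) for face in face_regions]
    n = len(descs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = distance(descs[i], descs[j])
    # the spliced face is the one farthest, in total, from all the others
    return int(dist.sum(axis=1).argmax())
```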

Algorithm 1
Fig. 2

The framework illustrating the steps involved in the evaluation process. For clarity, scaled version of feature descriptors are shown. The input image shown is taken from tifs-database [11]

8 Experiments and results

8.1 Datasets

For the experimental evaluation, we used spliced images from three datasets: i) DSO-I, ii) SwapMe, and iii) FaceSwap. The DSO-I dataset is taken from the tifs-database [11]. The tifs-database contains 100 spliced images containing human facial regions, saved in Portable Network Graphics (PNG) format with a resolution of 2048 x 1536 pixels. We used a subset of 55 spliced images that contain more than two facial regions for evaluating the forgery localization capability of the various color and texture descriptors.

The SwapMe and FaceSwap datasets contain spliced images created by exchanging a source facial region with a destination facial region [56]. For our experiments, we selected a subset of 55 spliced images from the SwapMe dataset. From the FaceSwap dataset, we selected 33 images that were not present in SwapMe. Along with the FaceSwap images, we combined our own set of 7 spliced images to create a Combined FaceSwap dataset. All the selected spliced images in both SwapMe and Combined FaceSwap contain three or more facial regions.

We used two variants of illuminant maps, GGW and IIC. In Sections 8.2–8.13, we present the feature extraction process, the performance evaluation criteria, and the evaluation of the various descriptors for forgery localization from illuminant maps.

8.2 Feature extraction

For the illumination representation, the two variants of illuminant maps, IIC and GGW, were generated using the software of [33]. From each illuminant map, all the facial regions are extracted. Then, feature descriptors are computed for each facial region. The similarity among the feature vectors is measured using different histogram distance measures.

8.3 Distance measures considered

Feature descriptors extracted from the facial regions in the illuminant maps can be considered as distributions. Following the work of Meshgi and Ishii [28], we compared feature descriptors using various categories of histogram distance measures: heuristic distance measures, non-parametric test statistics, information-theoretic divergences, and cross-bin distance measures.

From the heuristic distance measures, we considered the L2 distance and the Pearson Correlation Coefficient (CR) [3]. From the non-parametric test statistics, we used the Kolmogorov-Smirnov distance (KS) [26], the Cramer-von Mises statistic (CM) [12], the Chi-square (CS) statistic [23, 39], and the Bhattacharyya distance (BH) [19]. The Kullback-Leibler divergence (KL) [23] belongs to the information-theoretic divergences. The Diffusion Distance (DF) [24] and the Earth Mover's Distance (EMD) consider cross-bin information capable of capturing the perceptual similarity of images [23, 35].
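For reference, three of these measures can be written directly for normalized histograms p and q; the small epsilon terms are implementation details, and the chi-square form shown is one common variant.

```python
import numpy as np

def chi_square(p, q, eps=1e-10):
    """Chi-square statistic between two normalized histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def bhattacharyya(p, q, eps=1e-10):
    """Bhattacharyya distance between two normalized histograms."""
    bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient
    return -np.log(bc + eps)

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence (not symmetric) from p to q."""
    return np.sum(p * np.log((p + eps) / (q + eps)))
```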

8.4 Performance evaluation criteria

In forgery localization, the objective is to locate forged image regions within spliced images. Hence, the performance of a method can be measured by the rate at which forged facial regions within spliced images are correctly detected. Here, the performance is evaluated using the following performance metrics,

$$ Sensitivity\quad or\quad Recall\quad or\quad TPR = \frac{{Faces}_{Located}}{{Faces}_{Spliced}} $$
(1)

where FacesLocated = No. of spliced faces located correctly,

FacesSpliced = Total no. of spliced faces, and TPR is the True Positive Rate.

$$ Specificity\quad or\quad TNR = \frac{{AuthenticFaces}_{Detected}}{{Faces}_{Authentic}} $$
(2)

where AuthenticFacesDetected = No. of authentic faces detected correctly, and

FacesAuthentic = Total no. of authentic faces.

$$ Accuracy = \frac{TP + TN}{TP+FP+FN+TN} $$
(3)
$$ Precision = \frac{TP}{TP+FP} $$
(4)
$$ F\text{-}Score = 2\cdot\frac{Precision \cdot Recall}{Precision+Recall} $$
(5)
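Expressed in terms of per-face counts, the metrics of (1)-(5) reduce to the usual confusion-matrix quantities, as in the following sketch.

```python
def localization_metrics(tp, fp, fn, tn):
    """Compute the metrics of Eqs. (1)-(5) from per-face counts.

    tp: spliced faces correctly located     fn: spliced faces missed
    tn: authentic faces correctly kept      fp: authentic faces flagged as spliced
    """
    sensitivity = tp / (tp + fn)                   # recall / TPR, Eq. (1)
    specificity = tn / (tn + fp)                   # TNR, Eq. (2)
    accuracy = (tp + tn) / (tp + fp + fn + tn)     # Eq. (3)
    precision = tp / (tp + fp)                     # Eq. (4)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (5)
    return dict(sensitivity=sensitivity, specificity=specificity,
                accuracy=accuracy, precision=precision, f_score=f_score)
```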

8.5 Experiment 1: evaluating texture descriptors

The LBP and LPQ descriptors are obtained using the software provided by the respective authors. The BSIF and BGP features are estimated using the source code provided by the corresponding authors. The CLBP features are generated using the source code provided by Guo et al. [17].

First, we considered the texture descriptors; the sensitivity obtained from IIC and GGW illuminant maps is shown in Figs. 3 and 4, respectively. It is interesting to note that the distance measure providing the best sensitivity differs across the texture features, because the nature of the feature vectors and the discriminative capability of the descriptors vary. Among the various distance measures, CS, BH, and KL yielded the best results for the LBP and LPQ descriptors on the IIC map. For BGP, the L2 and DF distance measures provided the best results, while BSIF and CLBP showed the highest sensitivity with the BH and KL distance measures, respectively.

Fig. 3

Sensitivity obtained for texture descriptors from IIC maps using different distance measures

Fig. 4

Sensitivity obtained for texture descriptors from GGW maps using different distance measures

For the GGW map, the CS and BH distance measures yielded the best results for LBP and LPQ (see Fig. 4). BGP showed good performance with the CR, DF, and EMD distance measures, and BSIF exhibited the highest sensitivity with the KL distance measure. For comparing the performance of the texture descriptors on the different datasets, DSO-I, SwapMe, and Combined FaceSwap, we selected the BH distance measure, which provided good sensitivity on both IIC and GGW maps.

The precision, sensitivity, specificity, accuracy, and F-Score obtained for the texture descriptors using the BH distance on IIC and GGW maps are shown in Tables 3 and 4, respectively. The texture descriptors performed better on GGW maps than on IIC maps across all three datasets. This suggests that the GGW maps carry more texture variation capable of discriminating spliced regions from authentic regions.

Table 3 Evaluation of texture descriptors on different datasets using IIC maps
Table 4 Evaluation of texture descriptors on different datasets using GGW maps

8.6 Experiment 2: evaluating color descriptors

For evaluating the various color descriptors for forgery localization, we used the software provided by Joost van de Weijer. The sensitivity obtained for the color descriptors on IIC and GGW illuminant maps of the spliced images in the DSO-I dataset is shown in Figs. 5 and 6, respectively.

Fig. 5

Sensitivity obtained for color descriptors from IIC maps using different distance measures

Fig. 6

Sensitivity obtained for color descriptors from GGW maps using different distance measures

As in the case of texture descriptors, for both IIC and GGW maps the highest sensitivity obtained for each color descriptor is reached with a different distance measure. This indicates that the nature of the feature vectors obtained with different descriptors differs. In the IIC map, the DF distance measure resulted in the best sensitivity with the DCD25 and DCD50 descriptors, whereas on GGW maps the Opponent histogram descriptor yielded the highest sensitivity with the KS distance measure. This shows that the IIC and GGW maps may contain different color characteristics that are captured by different color descriptors. Hence, it would be better if both illuminant maps are considered in forgery localization techniques.

The precision, recall, TNR, accuracy, and F-Score obtained for the color descriptors on the three datasets using IIC and GGW maps (with the DF distance measure) are shown in Tables 5 and 6, respectively. For the DSO-I dataset, the color descriptors performed better on IIC maps than on GGW maps. For SwapMe and Combined FaceSwap, the variation in the performance of the descriptors is less noticeable. This is because the DSO-I dataset contains images in uncompressed PNG format, and its IIC and GGW maps show visible color variations. In addition, the clarity of the facial regions in the DSO-I dataset is better than that of the facial regions in both SwapMe and Combined FaceSwap.

Table 5 Evaluation of color descriptors on different datasets using IIC maps
Table 6 Evaluation of color descriptors on different datasets using GGW maps

8.7 Experiment 3: evaluating combined color and texture descriptors

In this section, the discriminative power of descriptors that consider both color and texture features, namely Color LPQ and the two variants of color textons, JTD and STD, is evaluated. The source code for computing the Color LPQ descriptor was provided by Pedone [32]. The JTD and STD descriptors are generated using the source code provided by Alvarez and Vanrell [1]. The sensitivity obtained for the combined color and texture descriptors on IIC and GGW illuminant maps of the spliced images in the DSO-I dataset is shown in Figs. 7 and 8, respectively.

Fig. 7

Sensitivity obtained for combined color texture descriptors from IIC maps using different distance measures

Fig. 8

Sensitivity obtained for combined color texture descriptors from GGW maps using different distance measures

In the IIC map, Color LPQ and STD provided the highest sensitivity with the CR distance measure, as shown in Fig. 7. Similarly, Color LPQ showed the highest sensitivity with the CR distance measure on GGW maps. The precision, recall, TNR, and F-Score obtained for the combined color-texture descriptors on the various datasets using the CR measure on both IIC and GGW maps are shown in Tables 7 and 8, respectively.

Table 7 Evaluation of combined color-texture on different datasets using IIC maps
Table 8 Evaluation of combined color-texture on different datasets using GGW maps

8.8 Experiment 4: evaluating color histograms, moments and color shape descriptors

The color histogram, moment, and shape descriptors are generated using the source code provided by Koen van de Sande [37].

Figure 9 shows the sensitivity obtained for the color histogram and moment descriptors on the IIC map. The color moment invariants and the rgb histogram yielded the highest sensitivity of 64.58% with the CS and KL distance measures. Table 9 shows the evaluation using various performance metrics on the different datasets with the CS distance measure. Figure 10 shows the sensitivity evaluation of the color-shape descriptors. Among the different variants of color-shape descriptors, Hue-SIFT exhibited the highest sensitivity of 54.17% using the L2 distance measure. Table 10 shows the evaluation using various performance metrics on the DSO-I dataset with the L2 distance measure.

Fig. 9

Sensitivity obtained for color histogram and moment descriptors from IIC maps using different distance measures.

Table 9 Evaluation of color histogram and moment descriptors on different datasets using IIC maps
Fig. 10

Sensitivity obtained for color shape descriptors from IIC maps using different distance measures

Table 10 Evaluation of color shape descriptors on DSO-I dataset using IIC maps

8.9 Experiment 5: evaluating HOG and GLRLM descriptors

The GLRLM and HOG features are computed using the Matlab source code provided by Wei [48] and Ludwig et al. [18], respectively. The highest sensitivity obtained for HOG is 43.64% on the IIC map using the CR and DF distance measures, and 52.73% on the GGW map with the DF distance measure. On both IIC and GGW maps, the HOG descriptors performed better than the GLRLM descriptors. Figures 11 and 12 show the sensitivity obtained with the various distance measures on IIC and GGW maps, respectively. The evaluation of the HOG and GLRLM descriptors on the different datasets using IIC maps and GGW maps is shown in Tables 11 and 12, respectively.

Fig. 11

Sensitivity obtained for HOG and GLRLM descriptors from IIC maps using different distance measures

Fig. 12

Sensitivity obtained for HOG and GLRLM descriptors from GGW maps using different distance measures

Table 11 Evaluation of HOG and GLRLM descriptors on different datasets using IIC maps
Table 12 Evaluation of HOG and GLRLM descriptors on different datasets using GGW maps

8.10 Evaluation of deep features

For evaluating the performance of deep features, we used the pretrained Convolutional Neural Network (CNN) model vgg-f [9], available in the MatConvNet library [45]. The vgg-f model consists of 8 layers: 5 convolutional and 3 fully-connected layers. The input image is resized to 224 x 224 pixels. The 4096-dimensional deep features are extracted from the 7th layer. We evaluated the deep features obtained from both IIC and GGW maps with the various distance measures. Tables 13 and 14 show the performance obtained using IIC and GGW maps with the deep features extracted from the vgg-f model.
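As an illustration of this step, the sketch below extracts 4096-dimensional fc7-style features with a pretrained VGG-16 from torchvision; this model is used only as a stand-in for the MatConvNet vgg-f network of the paper, and the weight identifier assumes a recent torchvision version.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# VGG-16 pretrained on ImageNet, used here as a stand-in for vgg-f
model = models.vgg16(weights="IMAGENET1K_V1")
model.eval()
# drop the final classification layer so the output is the 4096-d fc7 activation
fc7 = torch.nn.Sequential(*list(model.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_features(face_path):
    """4096-dimensional deep feature for one face crop taken from an illuminant map."""
    x = preprocess(Image.open(face_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = fc7(model.avgpool(model.features(x)).flatten(1))
    return feat.squeeze(0).numpy()
```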

Table 13 Evaluation of deep features extracted from pre-trained model vgg on IIC maps of different datasets
Table 14 Evaluation of deep features extracted from pre-trained model vgg on GGW maps of different datasets

On IIC maps, the BH distance measure yielded the best performance on the DSO-I dataset, whereas the KL distance measure gave the best performance on the SwapMe dataset, and the L2 distance provided the best results on the Combined FaceSwap dataset. On GGW maps, the CR and KL distance measures showed good results on the DSO-I dataset; for SwapMe, the best performance is obtained with the CR and DF distance measures, and for the Combined FaceSwap dataset with the L2 distance. However, on both IIC and GGW maps, the performance of the deep features was lower than that of the LPQ texture descriptor on DSO-I, while for the SwapMe and Combined FaceSwap datasets the deep features gave comparable performance. Therefore, as in other computer vision domains where deep features have yielded better performance, an application-specific model could also improve forgery localization from illuminant maps.

8.11 Effect of JPEG compression

The robustness of the texture, color, and combined color-texture descriptors is evaluated against JPEG compression. The experiments are conducted on the DSO-I dataset using IIC maps, since the dataset contains images in uncompressed PNG format. For the evaluation, the images in the DSO-I dataset are compressed with JPEG quality factors of 60, 70, 80, and 90. Figure 13 shows the performance of the various texture features at the different compression levels (60-90). In Figs. 13, 14 and 15, the performance on the uncompressed version is marked as '100'. As depicted in Fig. 13, texture descriptors such as LBP, LPQ, and BSIF show a performance variation on the JPEG compressed images (marked 60-90) compared to the uncompressed images (marked '100'). When the images are compressed, the JPEG blocking artifacts and the encoding scheme alter the underlying pixel boundaries, thereby affecting the segmentation of the image during the generation of the illuminant map. An interesting observation is that the BGP and CLBP features perform relatively consistently across all JPEG compression levels and the uncompressed version.
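The compressed test images can be generated in a few lines, for example with Pillow; the output naming scheme below is arbitrary.

```python
from pathlib import Path
from PIL import Image

def make_jpeg_versions(png_path, qualities=(60, 70, 80, 90)):
    """Re-save an uncompressed image at several JPEG quality factors."""
    img = Image.open(png_path).convert("RGB")
    stem = Path(png_path).with_suffix("")  # drop the .png extension
    for q in qualities:
        img.save(f"{stem}_q{q}.jpg", format="JPEG", quality=q)
```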

Fig. 13

The performance of various texture descriptors at different JPEG compression levels. Quality factors considered are 60, 70, 80, 90. The uncompressed level 100 indicates the image without any compression. Experiments are carried on DSO-I dataset using IIC maps. a LBP b LPQ c BGP d BSIF e CLBP

Fig. 14

The performance of various color descriptors at different JPEG compression levels. Quality factors considered are 60, 70, 80, 90. The uncompressed level 100 indicates the image without any compression. Experiments are carried on DSO-I dataset using IIC maps. a Opponent b Hue c ACN d CN e DCD11 f DCD25 g DCD50

Fig. 15

The performance of various combined color-texture descriptors at different JPEG compression levels. Quality factors considered are 60, 70, 80, 90. The uncompressed level 100 indicates the image without any compression. Experiments are carried on DSO-I dataset using IIC maps. a ColorLPQ b JTD c STD

Figures 14 and 15 show the performance of the various color and combined color-texture features, respectively, at the different compression levels. In Fig. 14, the color descriptors Opponent, Hue, ACN, CN, and DCD11 perform consistently across the different JPEG compression levels and the uncompressed version, whereas the performance of DCD25 and DCD50 degrades under JPEG compression. In Fig. 15, the combined color-texture descriptors Color LPQ, JTD, and STD show a performance degradation with JPEG compression, and the degradation becomes larger at lower JPEG quality factors. This indicates that the combined color-texture descriptors are affected by the JPEG encoding scheme.

Fig. 16

Comparison of LPQ features with previous methods on DSO-I dataset

8.12 Comparison with other illumination inconsistency based methods

We found that the LPQ, JTD, and HOG descriptors exhibited the best performance on the DSO-I, SwapMe, and Combined FaceSwap datasets, respectively. Their performance is compared with that of two previous works, proposed by Gholap and Bora [16] and by Mazumdar and Bora [27]. Figures 16, 17 and 18 show the performance comparison on the DSO-I, SwapMe, and Combined FaceSwap datasets, respectively. Compared to both previous works, the LPQ, JTD, and HOG descriptors achieve better results on the respective datasets.

Fig. 17

Comparison of JTD features with previous methods on SwapMe dataset

Fig. 18

Comparison of HOG features with previous methods on Combined FaceSwap dataset

8.13 Discussion

In general, the experimental results reveal that texture features and combined color-texture features are better at locating forged image regions from illuminant maps than color features. The reason is that the changes in texture patterns are more prominent than the variations in color. In many of the images in the datasets, the color variations are too subtle to be captured by the color descriptors. The color variation will remain too subtle to detect unless there is a drastic difference between the illumination environments in which the two photographs were captured. For example, in a spliced image composed of image regions captured in indoor and outdoor environments, there will be prominent variations in the color distribution.

9 Conclusion

Inconsistencies in the illumination distribution can reveal forged image regions. In the proposed work, we carried out a comprehensive evaluation of the discriminative power of a number of texture, color, and combined color-texture descriptors for forgery localization. For the evaluation, forged images containing human facial regions were used, and two variants of the illumination representation, the IIC and GGW maps, were considered. The discriminative capability of the different features is assessed using 9 different histogram distance measures.

From the experiments, it is clear that texture descriptors are more capable of locating the forged region than color features and combined color-texture features. Also, among the various descriptors evaluated, we found that the LPQ descriptor showed the highest sensitivity, 70.91% on GGW maps and 65.45% on IIC maps. We also evaluated deep features based on the pre-trained CNN model vgg-f, but the evaluation showed that, on the illuminant maps of the DSO-I dataset, the performance of the texture features is better than that of the deep features from the pre-trained model.

We observed that the detection performance varied with different histogram distance measures for different descriptors, indicating differences in the nature of the color and texture patterns captured by the feature descriptors. This suggests that a combination of features, such as a multi-feature representation, may improve the detection accuracy. Hence, in future work, we plan to consider feature fusion for improving the accuracy of forgery localization.