1 Introduction

Image content is often regarded as an authentic record of events. Images are encountered throughout day-to-day life, as more and more hand-held devices are equipped with image capturing and editing tools in addition to traditional digital still cameras. This, combined with the ready availability of powerful image processing software, has led to the unlawful use of digital images. People have started manipulating image content to serve illegitimate intentions. The problem has attracted researchers worldwide, and a large number of image forensic techniques that reveal image alterations have been developed in recent times.

Common alterations include copy-pasting an image region from another image (image splicing) and copy-pasting an image region onto the same image (copy-move forgery). Such forgeries can have severe adverse effects on social life, depending on the nature of the crime. For example, a forged photograph claiming an illicit circumstance with an innocent victim can cause unforeseen damage to the victim's social reputation. Digital forensic investigators are constantly looking for tools to aid the investigation of digital crimes involving multimedia content as evidence. Although a large number of publications in the image forensics domain focus on image forgery detection [1, 4, 9, 11], only a few works, such as the techniques discussed in [2, 3, 5, 7, 10, 13, 14], have addressed the problem of locating a spliced human facial region.

In the proposed work, we consider image alterations in which a facial region is copied from one image and pasted onto another. Consider a group photo, in which a few persons were originally present, being altered to add one more facial region with the malicious intention of creating forged evidence for a crime. Here, the problem is to detect the spliced photo by analyzing the image properties of all the faces. A copy-paste operation usually disrupts the original pattern of image properties, so image forgery detection techniques try to exploit these disruptions in properties such as texture, scene illumination, and noise features. The proposed work presents a technique to identify spliced facial regions in an image containing a group of people by analyzing the texture of the illuminant representation of the facial regions.

The rest of the paper is organized as follows. Related works are discussed in Sect. 2. The proposed method is explained in detail in Sect. 3. Experimental results obtained are discussed in Sect. 4. Finally, the conclusion is given in Sect. 5.

2 Related Work

Scene illuminant information is exploited as a forensic cue in a number of image forgery detection techniques. The related techniques that analyze scene illuminants estimated from human facial skin pixels are discussed here [2, 3, 5, 7, 10, 13, 14].

Riess and Angelopoulou introduced the concept of an illuminant representation across an image in 2010 [10]. Inconsistencies in this representation, if any, are visible as differently colored areas. Here, a manual examination of the different facial regions is needed to detect the spliced person in an image. Carvalho et al. removed the need for manual analysis of the illuminant representation with a machine learning technique that uses texture descriptors and edge descriptors to classify an image as altered or original [3].

Francis et al. estimated the illuminant color from the skin highlight region at the nose tip and detected differences in illuminant color among the faces in an image [5]. The difference between illuminant colors obtained from regions originally present in an image will be small, whereas the difference between the illuminant colors of an original and a spliced face will be noticeable. Closely placed illuminant color coordinates therefore indicate original faces, and the face with the largest illuminant color distance indicates the spliced face.

Vidyadharan and Thampi detected the spliced face by analyzing differences in the brightness distribution of the facial skin pixels [13]. The brightness distribution is obtained as the histogram of the Value plane in the Hue-Saturation-Value color space. Faces with similar brightness distributions are identified as original, whereas the face with a distant brightness distribution is identified as fake. The closeness of brightness distributions is measured using bin-by-bin and cross-bin histogram comparison methods.

Carvalho et al. improved their 2013 machine learning based image forgery detection technique [3] in 2014 by using a larger number of features and classifier fusion [2]. In addition to forgery detection, Carvalho et al. addressed the problem of locating the spliced face in images already marked as altered, using a machine learning approach, and achieved a detection accuracy of 85% in locating the spliced face.

In 2015, Vidyadharan and Thampi proposed an unsupervised, non-machine learning method for detecting the spliced face in tampered images [14]. Principal Component Analysis (PCA) is carried out directly on the gray level values of the facial regions extracted from the illuminant representation to reveal the outlier spliced face. The method obtained accuracies of 62% and 64% with the Inverse Intensity-Chromaticity (IIC) based and Generalized Gray World (GGW) illuminant representations, respectively.

Mazumdar and Bora proposed a spliced face detection method based on the Dichromatic Reflection Model [7]. A Dichromatic Plane Histogram (DPH) is extracted from each face and used as a signature. These histograms are compared using a correlation measure to analyze the similarity of the illumination among the faces in the image. The similarity score between faces is checked against a threshold to detect the spliced face among a group of faces in an image. The method obtains an accuracy of 91.2%, but images containing different skin tones were omitted.

The proposed work is motivated by the fact that inconsistencies arise in the texture pattern of the illuminant representation when image regions captured under different illumination conditions are spliced to form a forged image. Hence, we analyze the variation of the texture pattern in the illuminant representation across the facial regions of an image to locate the spliced facial region.

3 The Proposed Method of Detecting Spliced Face

Different stages of the proposed system are shown in Fig. 1. The method proceeds by extracting texture features of all the faces from the illuminant representation, followed by a distance comparison between each pair of faces to detect the spliced face.

Fig. 1. Different phases of proposed system

The illuminant representation of facial regions captured under different lighting environments shows differences in the illumination pattern. Thus, the illuminant texture features extracted from the faces of a spliced image will be distinct. The proposed method detects the spliced face by exploiting these differences in texture pattern, as illustrated in Fig. 1. The individual steps are detailed in the following sub-sections, and the overall procedure is summarized in Algorithm 1.

Algorithm 1

Fig. 2. (a) An input image (b) Illuminant representation (c) Detected copy-pasted face shown in the red box. (Color figure online)

3.1 Illuminant Representation and Facial Region Extraction

The facial regions are identified by specifying a bounding box enclosing each facial skin region in the input image. From the image, the IIC and GGW illuminant representations are obtained using the methods employed in Carvalho et al.'s techniques [3, 10]. The IIC representation is based on the method proposed by Riess and Angelopoulou [10]. The GGW illuminant representation is obtained by applying the method proposed by van de Weijer et al. [12] to the image partitioned into regions of similar color [3]. From the illuminant representation, the facial regions are extracted and converted to gray scale.
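As an illustration, a minimal sketch of the extraction step is given below. It assumes the bounding boxes are supplied manually as (x, y, width, height) tuples and that OpenCV is used for the gray scale conversion; neither the function name nor the library choice is prescribed by the method itself.

```python
import cv2

def extract_face_regions(illuminant_map, bounding_boxes):
    """Crop each facial region from an illuminant representation (IIC or GGW)
    and convert it to gray scale for texture analysis.

    bounding_boxes: list of (x, y, w, h) tuples, assumed to be specified
    manually for each facial skin region in the image."""
    faces = []
    for (x, y, w, h) in bounding_boxes:
        crop = illuminant_map[y:y + h, x:x + w]
        faces.append(cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY))
    return faces
```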

3.2 Extraction of Texture Features

The texture of the illumination pattern of each face is described by the Local Phase Quantization (LPQ) features proposed by Ojansivu and Heikkilä [8], which use local Fourier phase information. For each pixel in the input image, a neighborhood is considered and represented in the frequency domain, where the local frequency coefficients are computed at a small set of frequencies. The sign of the real and imaginary parts of each coefficient is thresholded to obtain a binary value, and the binary values are combined into an 8-bit code. This code is calculated for every pixel in the image, and the histogram of the codes of all pixels is termed the LPQ descriptor. The LPQ descriptor is generated from the facial illuminant representation of each face in the given image; each descriptor is a 256-dimensional feature vector. For an image such as the one shown in Fig. 2, one LPQ descriptor is thus obtained per face.
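A simplified sketch of this computation is shown below. It implements the basic, non-decorrelated STFT variant of LPQ with a uniform square window; the decorrelation and the Gaussian derivative quadrature filters actually used in the experiments (Sect. 4.1) are omitted, so the details are illustrative rather than a faithful reimplementation of [8].

```python
import numpy as np
from scipy.signal import convolve2d

def lpq_descriptor(gray_face, win_size=3):
    """Simplified (non-decorrelated) LPQ histogram of a gray-scale face region."""
    img = np.asarray(gray_face, dtype=np.float64)
    r = (win_size - 1) // 2
    x = np.arange(-r, r + 1)[np.newaxis, :]       # 1 x win_size coordinate row
    a = 1.0 / win_size                            # lowest non-zero frequency

    w0 = np.ones_like(x, dtype=np.complex128)     # DC component along one axis
    w1 = np.exp(-2j * np.pi * a * x)              # frequency 'a' along one axis
    w2 = np.conj(w1)

    # Separable STFT responses at the four frequency points (a,0), (0,a), (a,a), (a,-a)
    responses = [
        convolve2d(convolve2d(img, w0.T, mode='same'), w1, mode='same'),
        convolve2d(convolve2d(img, w1.T, mode='same'), w0, mode='same'),
        convolve2d(convolve2d(img, w1.T, mode='same'), w1, mode='same'),
        convolve2d(convolve2d(img, w1.T, mode='same'), w2, mode='same'),
    ]

    # Quantize the signs of real and imaginary parts into an 8-bit code per pixel
    code = np.zeros(img.shape, dtype=np.int32)
    bit = 0
    for f in responses:
        code |= (np.real(f) >= 0).astype(np.int32) << bit
        bit += 1
        code |= (np.imag(f) >= 0).astype(np.int32) << bit
        bit += 1

    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()                      # normalized 256-bin descriptor
```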

3.3 Similarity Comparison and Detection of Copy-Pasted Face

Once the texture descriptors are extracted from the identified facial regions of an image, a similarity comparison is carried out. Since the 256-bin LPQ descriptor can be treated as a distribution, we use the Bhattacharyya distance measure for the comparison [6]. The descriptor of each face is compared with the descriptors of all the remaining faces, and the face most distant from the current face is noted. This process is repeated for all the faces, and the face voted most often as the most distant is identified as the spliced face.
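The following sketch illustrates the comparison and voting step. The Bhattacharyya distance is taken here as the negative log of the Bhattacharyya coefficient between normalized histograms; the exact variant used in [6] may differ slightly, so treat this definition as an assumption.

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two normalized histograms."""
    bc = np.sum(np.sqrt(p * q))            # Bhattacharyya coefficient
    return -np.log(max(bc, 1e-12))         # guard against log(0)

def locate_spliced_face(descriptors):
    """Each face votes for the face whose descriptor is farthest from its own;
    the face collecting the most votes is flagged as spliced."""
    n = len(descriptors)
    votes = np.zeros(n, dtype=int)
    for i in range(n):
        dists = [bhattacharyya_distance(descriptors[i], descriptors[j])
                 if j != i else -1.0 for j in range(n)]
        votes[int(np.argmax(dists))] += 1
    return int(np.argmax(votes))           # index of the suspected spliced face
```

Combined with the earlier hypothetical helpers, the overall flow would roughly be: extract_face_regions() on the illuminant map, lpq_descriptor() for each face crop, and locate_spliced_face() on the resulting list of descriptors.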

4 Experiments and Results

The proposed method is tested on images taken from the DSO-I dataset in the tifs-database [3]. The tifs-database contains images in Portable Network Graphics (PNG) format with a resolution of 2048\(\,\times \,\)1536 pixels. In images with only two faces, it is impossible to discriminate between the original and the spliced facial region by analyzing the facial regions alone. Hence, images with three or more faces are selected, resulting in a subset of 55 images, where each image contains at most one spliced facial region.

Performance is evaluated using Precision, Recall and F-Score as defined in Eqs. 1, 2 and 3 respectively,

$$\begin{aligned} \text {Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} * 100 \end{aligned}$$
(1)
$$\begin{aligned} \text {Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}} * 100 \end{aligned}$$
(2)
$$\begin{aligned} \text {F-Score} = 2 * \left( \frac{\text {Precision} * \text {Recall}}{\text {Precision} + \text {Recall}} \right) \end{aligned}$$
(3)

where N\(_\mathrm{TP}\) represents the number of spliced faces located correctly, N\(_\mathrm{FP}\) represents the number of authentic faces detected as spliced, and N\(_\mathrm{FN}\) represents the number of spliced faces detected as authentic.
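For completeness, a small helper reflecting Eqs. 1-3, with Precision and Recall expressed as percentages, might look as follows.

```python
def precision_recall_fscore(n_tp, n_fp, n_fn):
    """Precision and Recall as percentages (Eqs. 1 and 2) and the F-Score (Eq. 3)."""
    precision = 100.0 * n_tp / (n_tp + n_fp)
    recall = 100.0 * n_tp / (n_tp + n_fn)
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score
```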

Table 1. Experimental results on DSO-I dataset

4.1 Experiments on DSO-I Dataset

For generating the LPQ features, the source code provided by the authors of [8] was used. The normalized LPQ feature histogram is computed with decorrelated LPQ and a 3\(\,\times \,\)3 window size, and a Gaussian derivative quadrature filter pair is used to compute the local frequencies. Finally, the similarity among faces is examined using the Bhattacharyya histogram distance measure [6]. When tested on the 55 images from the DSO-I dataset containing a spliced face, an F-Score of 69.64% is obtained with the GGW map and 63.16% with the IIC map, as shown in Table 1.

4.2 Experiments with Spatial Pyramid Decomposition

We also evaluated the effectiveness of the proposed method with spatial pyramid decompositions of 1\(\,\times \,\)3 and 2\(\,\times \,\)2 subdivisions. In the 1\(\,\times \,\)3 spatial pyramid, the image is divided horizontally into three parts; a feature vector is extracted from each of the three parts, and the vectors are combined to obtain the complete feature vector. In the 2\(\,\times \,\)2 spatial pyramid decomposition, the image is divided into 2 horizontal and 2 vertical blocks, giving a total of 4 image parts. Feature vectors are extracted from all the sub-blocks and concatenated to obtain the complete feature vector. The weights applied to the feature vector of each sub-part are 1/3 and 1/4 for the 1\(\,\times \,\)3 and 2\(\,\times \,\)2 spatial blocks, respectively. Both the 1\(\,\times \,\)3 and 2\(\,\times \,\)2 spatial pyramid decompositions are shown in Fig. 3.

Fig. 3. The feature extraction steps involved in spatial pyramid decomposition.

When the whole image is considered without spatial pyramid decomposition, we get a single 256-bin LPQ descriptor representing the whole image. With the 1\(\,\times \,\)3 spatial decomposition, each of the horizontal sub-blocks generates a 256-bin LPQ descriptor, and the final feature vector is the concatenation of these 3 descriptors, resulting in a 768-bin feature vector. Similarly, for the 2\(\,\times \,\)2 spatial decomposition, each of the 4 blocks generates a 256-bin LPQ descriptor, and the final feature vector is the concatenation of the 4 descriptors, resulting in a 1024-bin feature vector.
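A sketch of this weighted concatenation is given below. It reuses the hypothetical lpq_descriptor() helper from Sect. 3.2 and assumes the sub-blocks are simple integer splits of the face crop.

```python
import numpy as np

def spatial_pyramid_descriptor(face, mode='1x3'):
    """Concatenate weighted LPQ histograms from the sub-blocks of a face region.
    Relies on the lpq_descriptor() sketch defined earlier."""
    h, w = face.shape
    if mode == '1x3':                       # three horizontal strips, weight 1/3
        blocks = [face[i * h // 3:(i + 1) * h // 3, :] for i in range(3)]
        weight = 1.0 / 3.0
    elif mode == '2x2':                     # four quadrants, weight 1/4
        blocks = [face[i * h // 2:(i + 1) * h // 2, j * w // 2:(j + 1) * w // 2]
                  for i in range(2) for j in range(2)]
        weight = 1.0 / 4.0
    else:                                   # whole face, single 256-bin descriptor
        return lpq_descriptor(face)
    return np.concatenate([weight * lpq_descriptor(b) for b in blocks])
```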

Table 2. Performance comparison of different spatial pyramid decomposition for IIC and GGW maps.

Table 2 shows the results obtained with the 1\(\,\times \,\)3 and 2\(\,\times \,\)2 spatial decompositions along with the results obtained when the feature descriptor of the whole image is used. Both decompositions performed better on the GGW map than on the IIC map. However, the results reveal that the LPQ descriptor achieves the best F-Score when the facial region is treated as a whole, without spatial decomposition.

Table 3. Performance evaluation for the upper, middle, and lower blocks in 1\(\,\times \,\)3 spatial pyramid decomposition.

In the context of the 1\(\,\times \,\)3 spatial decomposition, we also examined whether it is more effective to compare the facial region as a whole or as sub-parts, namely the upper one-third, the middle region, and the lower one-third. The results obtained for each region separately are shown in Table 3. The highest F-Score for the IIC map, 54.87%, is obtained for the lower region, while the highest F-Score for the GGW map, 60.72%, is obtained for the middle region.

4.3 Comparison with Related Works

The F-Score of the proposed LPQ-based method is compared with the existing non-machine learning approaches [13, 14] tested on the same set of test images. The performance comparison in Table 4 shows that the proposed LPQ-based method performs better than both the PCA-based technique [14] and the brightness distribution based method [13].

Table 4. The performance comparison with existing techniques on the dataset DSO-I.

5 Conclusion

The proposed spliced face detection technique is based on the texture differences between the spliced and original faces in the illuminant representation. A comparison with related works shows that the proposed method performs better than the existing unsupervised approaches to spliced face detection. The method is useful during a digital crime investigation when the investigator has to identify the spliced facial region in a suspect image. Another observation is that, when analyzing facial regions for texture similarity in illuminant maps, it is better to process the image region as a whole, without spatial pyramid decomposition. In future work, decision fusion techniques considering color descriptors along with texture descriptors need to be explored.