
1 Introduction

Image Forensics traditionally refers to a number of tasks on digital images that aim at producing evidence on the authenticity and integrity of data (e.g., forgery detection) and at identifying the acquisition device (camera identification) [1,2,3]. Among the approaches proposed for these tasks, some look at the structure of the file (e.g., JPEG blocking artifacts analysis [4, 5], hash functions [6], JPEG headers analysis [7], thumbnails [8] and EXIF analysis [9]); others try to identify the device that acquired the image by making use of PRNU patterns [10, 11], or focus on statistical analysis of the DCT coefficients [12,13,14]. Some in-depth studies [15, 16] showed that it is possible to coarsely solve the camera identification task using the DCT coefficients as a feature. The JPEG pipeline is therefore clearly important for retrieving information about the history of an image.
Nowadays Social Networks allow their users to upload and share large amounts of images: on Facebook alone, about 1 billion images are shared every day. What happens when a picture is shared on a social platform? How does the upload process affect the JPEG elements of the image? A Social Network is yet another piece of software that alters images for bandwidth, storage and layout reasons. These kinds of alterations, specifically scaling and re-compression, have been proved to make state-of-the-art approaches for camera identification less precise and reliable [17, 18]. Recent studies [19,20,21] have shown that, although the platform heavily modifies an image, this processing leaves a sort of fingerprint on the image itself. However, those studies analyze only a few Social Networks under specific, unrealistic scenarios, so their results are not general enough.
In order to improve the state of the art and to deeply understand how Social Network Services (SNSs) process images, a dataset of images from different camera devices was collected under controlled conditions. We selected ten SNSs and processed the collected images through each of them by means of an upload and download process. The resulting dataset of processed images was then used to identify any alterations of the JPEG elements. The main finding of our study is that the observed alterations are platform dependent (server-side) but also related to the application carrying out the upload (client-side). This evidence can be fundamental for investigation purposes, to understand not only the provenance of an image but also whether it has been uploaded from a given device (e.g., Android, iOS). The observed alterations allowed us to build an automatic classifier, based on two K-NN classifiers and a decision tree fitted on the built dataset. Starting from an input image, the proposed approach can predict the SNS that processed the image and the client application through which the image has been uploaded. The remainder of the paper is structured as follows: in Sect. 2, we describe how the dataset has been built, which social platforms have been considered and what kind of upload methods have been used; in Sect. 3, an in-depth analysis of the dataset images is reported in order to find alterations that can be coded into a fingerprint of the processing of a SNS; in Sect. 4, our approach for image ballistics on social image data is presented together with the obtained classification results. Finally, conclusions and possible future works on the topic are discussed.

2 A Dataset of Social Imagery

The alterations introduced on images by a SNS can be thought of as a unique fingerprint left by the SNS. The aim of our study is to discover those fingerprints by analyzing the behavior of the most popular SNSs that allow image sharing. Hence, 10 platforms have been selected. First of all, Facebook (www.facebook.com) and Google+ (http://plus.google.com) were taken into account as the two most popular platforms where users can share their statuses and multimedia content with a network of friends. Twitter (http://www.twitter.com) and Tumblr (http://www.tumblr.com) were considered as representative of the micro-blogging concept. We also included Flickr (https://www.flickr.com) and Instagram (https://www.instagram.com) as platforms focused on sharing high quality artistic photos, with capabilities of image editing and filtering. Imgur (http://www.imgur.com) and Tinypic (http://www.tinypic.com) were also taken into consideration: even if they are not properly SNSs, they are very popular platforms for image sharing, and users usually link images hosted on them from forums and web sites all over the Internet. Finally, WhatsApp and Telegram were selected as the two most popular mobile messaging platforms that, by allowing users to create chat groups, are another big place for image sharing on the Internet. Specifically, the last two services are often involved in forensic investigations. To discover how SNSs process images, we collected a set of photos with the camera devices listed in Table 1. The images represent three different types of scenes: outdoor scenes with buildings (artificial environment), outdoor scenes without buildings (natural environment) and indoor scenes. For each picture, we captured two versions: a High Quality (HQ) photo at the maximum resolution allowed by the device, and a Low Quality (LQ) photo (see also Table 1).
By capturing images in this way, we obtained a dataset with a good variability in terms of contents and resolutions. The collected images were uploaded to each of the considered platforms with two different methods: with a web browser, and with iOS and Android native apps. No further discrimination is needed for web browsers because we observed that alterations are not browser-dependent. Each download was performed by searching for the image file URL in the HTML code of the page showing the image itself. At the end of this phase 2400 images were properly collected. The second upload method was carried out with the iOS and Android native apps of each social platform, except for Tinypic and Imgur, which do not have an official app in the stores. Moreover, the upload was done by choosing images in two ways: by searching in the gallery for a previously acquired image (images from local gallery) and by acquiring the image with the camera embedded in the app itself (embedded camera app). After uploading all images as described above, all of them were downloaded through the “URL searching technique” previously described. 320 more images processed through 8 platforms were thus obtained. All uploads were performed with default settings. The overall dataset consists of 2720 images in JPEG format and it is available at the following web address http://iplab.dmi.unict.it/DigitalForensics/social_image_forensics/.

3 Dataset Analysis

The main aim of our work is to find a fingerprint left by SNSs on JPEG structure elements, after an upload/download process, in order to build a classifier for image ballistics. To achieve this goal, all information contained in the JPEG file specification has been analyzed: image filename, image size, meta-data and JPEG compression information. We observed that each upload/download process through the considered SNSs produces different alterations among the above-mentioned elements that could be taken into account as fingerprints of the process itself. Details of these alterations will be described in the following Subsections.

3.1 Image Filename Alterations

During an investigation on storage devices, analyzing the filename of an image and comparing it with known patterns can provide information about the platform from which the image could have been downloaded and the date when it was uploaded. For this reason, we first evaluated if and how each platform modifies the file name. We observed that all platforms except Google+ rename the uploaded files.

Table 1. Devices used to carry out image collection. For each device the corresponding Low Quality (LQ) and High Quality (HQ) resolutions are reported.
Table 2. Renaming scheme for an uploaded image with original filename IMG_2641.jpg. The new file name for each platform is reported (Image IDs are marked in bold).
Table 3. Alterations on JPEG files. The EXIF column reports how JPEG meta-data are edited: maintained, modified or deleted. The File Size column reports if a resize is applied and the corresponding conditions. The JPEG compression column reports if a new JPEG compression is carried out and the corresponding conditions (if any).

As an example, in Table 2 the new names for an uploaded file with name “IMG_2641.jpg” are reported. The column “image lookup” indicates whether the new filename contains an ID that can be used to reconstruct a URL pointing to the web location where the image file is stored.

3.2 Image Size Alterations

Stronger evidence than file naming is given by the resizing applied to the uploaded images on each platform. A fine-grained test was performed by using synthetic images derived from our dataset and resized at different scales.

On most platforms, resizing is applied if and only if the input image matches a certain condition. This condition is linked to the length in pixels of the longest side M of the original image, where \(M = max(width, height)\). If M is greater than a threshold, a resizing algorithm is applied and the resulting image has its longest side equal to the threshold. In Table 3, such conditions and the corresponding thresholds for each platform are reported. Tumblr does not rescale uploaded images, while in Flickr the threshold is set by the user. When an image is resized, its longest side is thus set to a fixed value that identifies, in some sense, the platform that performed the operation (see Table 3).
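The threshold rule above can be sketched in a few lines of Python. This is only an illustration: the threshold value used in the checks (2048) is a placeholder rather than a value from Table 3, and the assumptions that the aspect ratio is preserved and that the shorter side is rounded to the nearest pixel are ours, not findings of the analysis.

```python
def expected_size(width, height, threshold):
    """Return the (width, height) expected after upload under the rule above.

    `threshold` stands for a platform-specific value (placeholder here).
    Aspect-ratio preservation and rounding are assumptions of this sketch.
    """
    longest = max(width, height)  # M = max(width, height)
    if longest <= threshold:
        return width, height  # condition not met: no resize
    scale = threshold / longest
    if width >= height:
        # The longest side becomes exactly the threshold.
        return threshold, max(1, round(height * scale))
    return max(1, round(width * scale)), threshold
```

For investigation purposes the rule is mostly useful in reverse: a downloaded image whose longest side is exactly equal to a known threshold is compatible with the platform that uses it.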

3.3 Meta-Data Alterations

The best source of information, for investigation purposes, is the meta-data embedded in JPEG files. These meta-data are technically known as EXIF and can store information such as the device that acquired an image, the date and time of acquisition and also the GPS coordinates. For our purposes, we divided EXIF data into two categories: “camera data”, which contains all the key-value pairs that allow the device that acquired the image to be identified, and “other data” for every other piece of EXIF information.

In Table 3, the results of the analysis on EXIF data are summarized for each platform. In particular, it is reported whether “camera data” and “other data” are deleted, maintained or just edited throughout the processing. Unfortunately, most of the SNSs delete all meta-data, specifically those related to camera data.

3.4 Image JPEG Compression Alterations

The images considered in our dataset are all encoded in JPEG format, both the original versions and the downloaded ones. Thus, an analysis of how the SNS processing affects the JPEG compression has been carried out. We focused on the quantization tables used for JPEG compression, stored in the DQT (Define Quantization Table) segments of the file and referred to as DQTs in the following (extracted with djpeg, an open source tool that is part of the libjpeg project [22]).

Considering how platforms affect DQTs, it is possible to divide them into two categories:

  • Platforms that always re-compress images (Facebook, Twitter, Telegram, WhatsApp, Instagram);

  • Platforms that re-compress images at a given condition (Google+, Tumblr, Tinypic, Imgur).

The compression follows the same rules we described for resizing. In fact, a threshold-based evaluation is performed on the longest image side and, if it is bigger than the threshold, the image is compressed using a DQT that will be different from the original one. This is not true for all the considered platforms: Flickr allows the user to choose the threshold (if any), while on Imgur the threshold is fixed in terms of file size in megabytes; specifically, if the input image size is greater than 5.45 MB, then the re-compression is performed, otherwise nothing happens (see also Table 3).
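As an illustration of where this information lives in a file, the following Python sketch walks the marker segments of a JPEG byte stream and collects the quantization tables found in its DQT segments. It is not the tool used in our analysis (which relied on djpeg); the function names are hypothetical, the walk stops at the Start of Scan marker, and error handling for malformed files is minimal.

```python
import struct

SOI, EOI, DQT, SOS = 0xD8, 0xD9, 0xDB, 0xDA


def parse_dqt(payload):
    """Decode the tables of one DQT payload (several may share a segment)."""
    tables, i = {}, 0
    while i < len(payload):
        precision, table_id = payload[i] >> 4, payload[i] & 0x0F
        size = 128 if precision else 64  # 16-bit or 8-bit entries
        fmt = ">64H" if precision else ">64B"
        tables[table_id] = list(struct.unpack(fmt, payload[i + 1:i + 1 + size]))
        i += 1 + size
    return tables


def read_jpeg_segments(data):
    """Walk JPEG marker segments from SOI up to SOS (or EOI).

    Returns (markers, tables): the marker codes met, in order, and a dict
    mapping each table id to its 64 coefficients in stored order.
    """
    if data[:2] != bytes([0xFF, SOI]):
        raise ValueError("missing SOI marker: not a JPEG stream")
    markers, tables = [SOI], {}
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        markers.append(marker)
        if marker == EOI:
            break
        if pos + 4 > len(data):
            raise ValueError("truncated segment")
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == DQT:
            tables.update(parse_dqt(data[pos + 4:pos + 2 + length]))
        if marker == SOS:  # entropy-coded data follows; stop here
            break
        pos += 2 + length
    return markers, tables
```

By convention table 0 is usually the luminance table and table 1 the chrominance one, but the actual assignment is given by the frame header, so this mapping should be treated as an assumption. Comparing the tables of an original file with those of its downloaded version shows directly whether a re-compression took place.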

4 Image Ballistics of Social Data

From the results of the analysis reported in the previous Sections, regarding the alterations on JPEG elements of processed images, it is possible to state that such alterations carry pieces of information about the history of the image, but they could be insufficient, if considered alone, for investigation purposes. Hence, we encoded all the observed alterations into a set of features to be used as input for an automatic classifier. The following elements are then embedded into proper numerical features:

  • The DQT coefficients, divided into 64 coefficients for the Chrominance table and 64 for the Luminance one, which represent the JPEG compression alterations. These coefficients were investigated separately with PCA and we obtained an explained variance of 99% for the first 32 coefficients of the luminance table and the first 8 coefficients of the chrominance one;

  • Image size (width and height in pixels), which brings information about size alterations;

  • Number and typology of EXIF data (key-value couples), which describes meta-data alterations (both camera and other data);

  • Number of markers in JPEG files as defined in [23].

PRNU was not taken into consideration among our features, because, as already mentioned, the heavy processing done on images by SNSs degrades PRNU approaches for camera identification in terms of accuracy [17].

4.1 Implementing Image Ballistics: A Classification Engine

Given a JPEG image I, our objectives are to define:

  1. if there is a compatibility between the non-related JPEG elements of I (i.e. filename, EXIF data) and the processing pipeline of SNSs;

  2. if there is a compatibility between the JPEG elements of I and the processing pipeline of SNSs;

  3. which SNS is compatible with the JPEG elements of the image, with a certain degree of confidence, and what is the uploading source in terms of operating system (OS) and application.

We represent each image I as a 44-dimensional vector

$$\begin{aligned} \varvec{v} = \{w,h,|E|,m,l_j,c_k\}, \end{aligned}$$
(1)

where

  • \(w \times h\) is the size in pixels of I;

  • \(E = \left\{ key, value \right\} \) is an associative array containing the EXIF metadata, thus |E| is the number of metadata found in the structure of I;

  • m is the number of JPEG markers in I;

  • \(l_j, j=0,\ldots ,31\) are the first 32 coefficients of the luminance quantization table;

  • \(c_k, k=0,\ldots ,7\) are the first 8 coefficients of the chrominance quantization table.
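The vector of Eq. (1) can be assembled as in the following Python sketch. The function name and argument layout are hypothetical; we also assume that the two tables are given as lists of 64 coefficients in the order in which they are stored in the file, since the text only speaks of the "first" coefficients.

```python
def feature_vector(width, height, exif, n_markers, luminance, chrominance):
    """Build the 44-dimensional vector v = {w, h, |E|, m, l_j, c_k}.

    `exif` is a dict of EXIF key/value pairs; `luminance` and `chrominance`
    are the quantization tables (coefficient order is an assumption).
    """
    if len(luminance) < 32 or len(chrominance) < 8:
        raise ValueError("quantization tables are too short")
    v = [width, height, len(exif), n_markers]  # w, h, |E|, m
    v += list(luminance[:32])                  # l_0 ... l_31
    v += list(chrominance[:8])                 # c_0 ... c_7
    return v
```

The length check makes the dimensionality explicit: 4 scalar features, 32 luminance coefficients and 8 chrominance coefficients give the 44 components.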

Moreover, we define \(fn\left( I\right) \) as the filename of the image I.

At the first stage, we consider \(fn\left( I\right) \) and E. If \(fn\left( I\right) \) matches one of the renaming patterns observed in Sect. 3.1, our approach confirms the compatibility between I and the SNS with the matched pattern. E is also taken into account, looking for the “Exif.Image.UniqueCameraModel” key: if it is set, our system returns its value.
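A minimal Python sketch of this first stage is given below. The two regular expressions are invented placeholders for fictitious platforms and do not reproduce the renaming schemes of Table 2; only the EXIF key is taken from the text.

```python
import re

# Hypothetical renaming patterns (placeholders, NOT those of Table 2).
RENAMING_PATTERNS = {
    "ExampleSNS-A": re.compile(r"^exa_[0-9a-f]{12}\.jpg$"),
    "ExampleSNS-B": re.compile(r"^\d{8}_\d{6}_b\.jpe?g$"),
}
CAMERA_KEY = "Exif.Image.UniqueCameraModel"


def first_stage(filename, exif):
    """Return (compatible platforms, camera model or None)."""
    compatible = [name for name, pattern in RENAMING_PATTERNS.items()
                  if pattern.match(filename)]
    return compatible, exif.get(CAMERA_KEY)
```

A filename that matches no pattern yields an empty list, which is not evidence against SNS processing, since files can be renamed after download.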

Thus, the whole dataset representation is

$$\begin{aligned} \varvec{V} = \left\{ \varvec{v}_1, \ldots , \varvec{v}_N \right\} \end{aligned}$$

where N is the total number of images. In order to train the SNS and Upload Scenario classifiers, we augment this representation with the corresponding labels. Thus, the final representation for a generic image \(I_i\) is

$$\begin{aligned} \varvec{I}_i = \left\{ \varvec{v}_i, sns_i, uc_i, sm_i \right\} \end{aligned}$$

where \(sns_i\) is the SNS, \(uc_i\) is the client application and \(sm_i\) is the image selection method.

Our classifier performs a two-step analysis. First, we implement an Anomaly Detector to exclude the images that have not been processed by SNSs; then we run in parallel a K-NN Classifier and a Decision Tree [24] to assess, respectively, the SNS of origin and the uploading scenario (OS + application).

Given the representations \(\varvec{v}_{I_1}\) of an image \(I_1\) and \(\varvec{v}_{I_2}\) of an image \(I_2\), we define the cosine distance between \(\varvec{v}_{I_1}\) and \(\varvec{v}_{I_2}\)

$$\begin{aligned} d(\varvec{v}_{I_1},\varvec{v}_{I_2}) = \frac{\varvec{v}_{I_1} \cdot \varvec{v}_{I_2}}{|\varvec{v}_{I_1}| |\varvec{v}_{I_2}|} \end{aligned}$$
(2)

as a measure of similarity between \(I_1\) and \(I_2\). Note that, despite its name, Eq. (2) is the cosine of the angle between the two vectors, so larger values correspond to more similar images. It is therefore possible to build a distance matrix \(\varvec{D}\) of size \(N \times N\) where the element \(d_{ij}\) is equal to the distance between the images \(I_i\) and \(I_j\). We will refer to the \(r-\)th row of this matrix as \(\varvec{D}_r\) and to the \(c-\)th column as \(\varvec{D}^c\). Since all the features are non-negative, \(\forall \; I_i,I_j,\;0 \le d(\varvec{v}_i,\varvec{v}_j) \le 1\); specifically, the more similar two images are, the closer their distance is to 1. Exploiting this property, we define the Anomaly Detector as

$$\begin{aligned} a\left( \varvec{v}_i, \varvec{D} \right) = \left\{ \begin{array}{ll} \left( \varvec{v}_i,i\right) &{} \text {if}\;\sum \limits _{j=1}^K d_{ij} \ge T \\ \text {not processed} &{} \text {otherwise} \end{array} \right. \end{aligned}$$
(3)

where \(T \in [0,K]\) is defined as the Anomaly Threshold and the sum is taken over the \(K\) samples closest to \(\varvec{v}_i\). In other words, since the more two images are similar, the closer their distance is to 1, we make sure that at least \(K\) samples in our dataset are similar to the query image representation. When \(a\left( \varvec{v}_i, \varvec{D} \right) \) returns “not processed”, the representation is far from the samples, and we can state that the image has probably not been processed.
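The measure of Eq. (2) and the Anomaly Detector can be sketched in Python as follows. The sketch assumes that an image is accepted when the values of Eq. (2) for its K most similar samples sum to at least T, which is the reading consistent with values close to 1 meaning high similarity; the function names are hypothetical and the default values of K and T are those reported in Sect. 4.2.

```python
import math


def cosine(u, v):
    """Eq. (2): cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0:
        raise ValueError("Eq. (2) is undefined for an all-zero vector")
    return dot / norms


def is_processed(query, dataset, k=3, t=2.90):
    """Assumed reading of Eq. (3): accept the query when the k largest
    values of Eq. (2) against the dataset samples sum to at least t."""
    scores = sorted((cosine(query, v) for v in dataset), reverse=True)
    return sum(scores[:k]) >= t
```

With K = 3 and T = 2.90, an image is accepted only if its three most similar samples have an average value of Eq. (2) of about 0.97 or more.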

The output of a is then used as input by K-NN (4) and Decision Tree algorithms [24].

$$\begin{aligned} knn\left( \varvec{v}_i, i\right) = sns_j \; | \; d_{ij} = \max \varvec{D}^i \end{aligned}$$
(4)
$$\begin{aligned} dt\left( \varvec{v}_i, i \right) = (uc_j,sm_j) \end{aligned}$$
(5)

where \(uc_j\) and \(sm_j\) are the leaves obtained following the path with \(\varvec{v}_i\) as input. Hence, the classification scheme, shown in Fig. 1, can be formalized as follows

$$\begin{aligned} C(\varvec{v}_i,\varvec{D}) = knn\left( a\left( \varvec{v}_i, \varvec{D} \right) \right) \oplus dt\left( a\left( \varvec{v}_i, \varvec{D} \right) \right) \end{aligned}$$
(6)

The K-NN algorithm looks for the closest sample in the dataset and assigns the same SNS to the query image. A Decision Tree (Eq. 5) builds a classification model in the form of a tree structure: it breaks down a dataset into smaller and smaller subsets while, at the same time, the associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaves. The algorithm used for building the decision tree is ID3 [24], which employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses entropy to construct the decision tree by evaluating \(\varvec{v} \in \varvec{V}\).
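The two ingredients described above can be illustrated with a short Python sketch: a nearest-sample lookup in the spirit of Eq. (4), and the entropy-based information gain that ID3 uses to choose the attribute to split on. The function names are hypothetical; we assume that higher values of Eq. (2) mean closer samples, and that attributes are discrete, since the way numerical features are discretized is not detailed here.

```python
import math
from collections import Counter


def nearest_label(similarities, labels):
    """Label of the sample with the highest value of Eq. (2)."""
    best = max(range(len(labels)), key=lambda j: similarities[j])
    return labels[best]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete
    attribute whose per-sample values are given in `values`."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

At each node ID3 greedily picks the attribute with the largest information gain and recurses on the resulting subsets, which is why no backtracking is involved.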

Finally, the output of the K-NN Classifier \(sns_j\) is processed through a SNS Consistency Test. Let \(S = \{sns_1,\ldots ,sns_n\}\) be the set of SNSs that perform a re-compression under the condition \(max(w,h) > C_{sns_i}\), where \(C_{sns_i}\) is the conditional threshold for the \(i-\)th SNS, as listed in Table 3, and w and h are the width and height of the image.

Given that \(sns_j \in S\), if \(max(w,h) < C_{sns_j}\) the prediction is an anomaly. The test is then repeated for the next most probable prediction from the SNS Classifier until the corresponding condition is satisfied or the loop stalls on the same SNS prediction. In this last case, the result of the classification is “not sure”; otherwise, a SNS prediction (\(sns_j\)) is reached and returned together with the predicted upload client application (\(uc_j\)) and image selection method (\(sm_j\)).
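The SNS Consistency Test can be sketched as the following Python loop. It is a simplification under stated assumptions: the candidate SNSs are given as a list already ranked from most to least probable, the thresholds are placeholders and not the values of Table 3, and "not sure" is returned when the list is exhausted rather than when the prediction stalls.

```python
def consistency_test(ranked_sns, width, height, thresholds):
    """Return the first ranked SNS whose re-compression condition is
    compatible with the image size, or "not sure" if none is.

    `thresholds` maps each SNS in S to its conditional threshold C;
    SNSs missing from it are treated as always consistent.
    """
    longest = max(width, height)
    for sns in ranked_sns:
        c = thresholds.get(sns)
        if c is None or longest >= c:
            return sns  # no anomaly for this prediction
        # max(w, h) < C: anomaly, try the next most probable SNS
    return "not sure"
```

For example, with a placeholder threshold of 2048 for a platform "A", an 800 × 600 image predicted as "A" is rejected and the next candidate is examined.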

Figure 1 shows a schematic representation of the proposed approach.

Fig. 1. Classification scheme for Image Ballistics in the era of Social Network Services. The proposed approach encodes JPEG information from an input image into a feature vector that is then processed through machine learning techniques in order to predict the most probable SNS from which the input image was downloaded and the corresponding upload method.

4.2 Classification Results

In this Section, validation results for the proposed approach are reported to demonstrate its effectiveness. The anomaly detector was validated by taking from our dataset 240 random images that underwent alterations and 240 images that did not pass through any alteration. The anomaly detector achieved its best error rate, equal to 3.37%, with K = 3 and T = 2.90. The entire approach for image ballistics described in the previous Section was then tested through 5-fold cross-validation. The best values of K and T were found through grid-search hyper-parameter tuning. In Fig. 2, confusion matrices reporting the average values over the 5 runs are shown.
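For readers who want to reproduce the protocol, a 5-fold split can be generated as in the Python sketch below. The shuffling, the fixed seed and the function name are assumptions of this sketch; the exact partitioning used in our experiments is not specified here.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are shuffled once (seeded, an assumption of this sketch) and
    dealt into k folds; each fold is used exactly once as the test set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

A grid search then amounts to repeating these five runs for each candidate pair (K, T) and keeping the pair with the best average result.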

Fig. 2. Confusion matrices obtained from 5-fold cross-validation on our dataset. The reported values are the average accuracy values (%) over the 5 runs of the cross-validation test. (a) Confusion matrix for social platform classification, (b) confusion matrix for upload method classification.

The accuracy obtained for the SNS classification task was 96% with best K equal to 3 while the accuracy value for the upload client classification task was 97.69% with an accuracy of 91% for the prediction of image selection method, given iOS or Android native app as prior.

Different approaches with other classifiers (like linear and non linear SVM) or combination of classifiers (like hierarchical or cascade approaches) were also tested, but the overall results were slightly worse. The classification scheme reported in Fig. 1 was the best approach we obtained throughout our tests.

In our experiments, we observed that, as happens for different camera devices of the same model [16], different images from the same platform have slight differences in their DQT coefficients. This explains the effectiveness of K-NN over other methods: it makes the approach resilient to small differences while detecting the most similar SNS fingerprint. We also built a new test set composed of 20 images randomly downloaded from each considered SNS, on which we achieved an accuracy in SNS prediction of 94%, which is quite similar to the validation results.

A further consideration concerns the SNS fingerprints described in this work: all the observed alterations can change as the platforms' software is developed and new releases are deployed. For this reason, the proposed approach is designed to be re-adapted over time, simply by updating the reference dataset.

5 Conclusions and Future Works

In this work, we presented a dataset for image ballistics and proposed a classification engine to discover if an image has been processed by a Social Network Service and, if the answer is positive, by which SNS among the 10 considered platforms. The proposed approach performed the task of Image Ballistics with good accuracy, predicting the SNS that processed an image and the corresponding upload method with an accuracy of 96% and 97.69%, respectively.

We think that this work can open new perspectives on the field of Image Forensics: the approach can be upgraded by considering other formats (e.g., PNG) and new features related to image contents.