1 Introduction

The growing popularity of the Internet and social media has enabled ubiquitous and distributed sharing of digital photos among Internet users. This has been accompanied by new possibilities to easily copy, alter and distribute digital content to a large number of recipients, thanks to the availability and accessibility of image processing software [24]. A major challenge therefore arises in tracking and monitoring the evolution of original photos on the Web, for example to enforce intellectual property and copyright protection [5]. Tracking and visualizing copies can also be useful for analyzing and exploring the dissemination and use of photos in Web communities (e.g., arts, journalism) [9, 20, 22, 56]. Although several methods have been proposed in the past for image copy detection [5, 24, 29] and image collection visualization [35, 50, 51], little research has been conducted on tracing and/or visualizing the order of multiple image copies.

Early approaches for copy detection are based on watermarking, which consists of embedding signatures (or watermarks) for copyright protection [13]. Detecting copies then amounts to identifying the watermark encoded in the image. However, these methods can be vulnerable since watermarks can be removed or altered via postprocessing techniques [25, 40]. More recently, content-based copy detection (CBCD) has been proposed as an alternative to watermarking for detecting (illegal) image copies [16, 29]. The goal of CBCD is to determine, using only the image content, whether near-replicas of a given image exist on the Web or with an unauthorized third party [5, 45, 52]. To detect possible copies, CBCD systems (e.g., TinEye and Piximilar [45]) extract low-level image features and then measure similarity between images.

Copy detection systems are generally good at finding similarities between images and identifying identical content. However, they tell little about the transformations that may have been applied to an image. Indeed, there is a large number of possible image manipulations, ranging from simple geometric/photometric transforms or resizing to more complex operations such as image editing, cropping, stitching and compression [41]. Moreover, even in the case of simple geometric transformations, algorithms can fail to recover the exact transformation parameters when the image is significantly altered [44]. Recently, methods have been proposed for detecting and localizing specific image forgeries such as image splicing (IS) [10] and copy-move (CM) [12]. However, they are not able to recover the order of multiple image copies, which may contain different types of modifications. Although a single algorithm detecting all types and orders of transformations seems a very complex pursuit, the problem can be made more tractable by making assumptions about the generative process of copies. For example, grouping transformations into specific categories (e.g., photometric, geometric) [16] facilitates the search for potential manipulations applied to each copy.

In this paper, we propose a method for constructing an evolution graph for a set of image copies derived from the same original image or from a set of reference images determined a priori. Our method infers the most likely transformations used to produce the remaining image copies. For simplicity, we focus mainly on three types of image transformations in copy production: geometric, photometric, and image editing. Geometric transformations refer to changes due to affine transforms and image distortions [41]. Photometric transformations refer to image color enhancement, filtering and color-to-gray conversion [21]. Finally, editing refers to basic operations such as image cropping, copy-move, zooming, seam carving and text/object insertion [17]. The identification of relations between images is based on the following steps:

  • In the first step, we detect simple transformations such as image cropping, resizing, rotation, small illumination changes and color-to-gray conversion. These transformations enable us to build the first (strong) edges of our copy evolution graph.

  • In the second step, we infer the remaining relations between images by detecting for each copy its most likely lineage in the graph. This step is performed by first building copy groups through agglomerative clustering and then using a combined local-global similarity measure to analyze image changes.

  • Finally, all group subgraphs are linked to the main graph and each formed edge is annotated with its inferred transformation(s). Experiments conducted to validate the proposed approach have shown its ability to produce interpretable and meaningful graphs compared to manually constructed ones.

The remainder of this paper is organized as follows. Section 2 reviews related work on copy and forgery detection. Section 3 describes the proposed approach for identifying image transformations and visualizing the evolution graph of copies. Section 4 presents experimental results validating our approach. We end the paper with a conclusion and future work perspectives.

2 Related work

Since our work deals with detecting manipulations in image copies as well as copy ordering, it is important to give an overview of related work dealing with copy/forgery detection. Watermarking [31] and content-based copy detection (CBCD) [29] are the main methods used for searching image copies. For forgery detection, several methods have been proposed for specific image forgeries such as copy-move [12], image splicing/composites [5], image compression, and basic image processing operations such as illumination changes, rotation and cropping [16].

2.1 Image copy detection methods

Detecting image copies is important for several applications such as usage tracking and copyright protection. There are broadly two approaches for copy detection: 1) watermarking and 2) content-based copy detection [28, 29]. Watermarking consists of embedding signatures (or watermarks) in images in the form of identification codes carrying information about a copyright owner. Watermarks can be inserted in the spatial, frequency and wavelet domains [13]. Detecting copies then amounts to identifying the watermark encoded in images. The main limitation of these methods is their vulnerability to watermark removal or alteration using image processing techniques [25, 40]. Watermarking also poses an additional constraint since the watermark must be inserted in the original image before its publication, sometimes at the time of its recording [16]. Content-based copy detection (CBCD) is a complementary approach to watermarking for detecting illegal image copies [16, 29]. It is based on the same principles as content-based image retrieval (CBIR), where systems extract signatures from an original image and from each image in a large database, which are compared to retrieve near-replicas [45, 52]. Image features capture the visual properties of an image, either globally for the entire image (e.g., wavelets, discrete cosine transform (DCT), color, texture) or locally for a small group of pixels (e.g., shapes, contours, salient points). Global features capture the image layout and the distribution of color/texture patterns. The overall image is thus represented by a vector of color/texture components such as wavelet distributions [1] and color histograms [2, 32]. Global features also generally allow fast similarity computation [1]. However, they may fail to identify important local visual characteristics, which can lead to a large number of false positives/negatives. Local features can be extracted at salient points [3, 58] or over non-overlapping blocks and stacked into vectors [24, 29, 54]. Thus, depending on the scale of the key content or pattern, an appropriate representation should be chosen [14]. For example, [58] proposed a variant of the scale-invariant feature transform (SIFT) for fast copy/object detection in the presence of flipping transformations. DCT features have been used in [24, 29] to train classifiers for detecting copies produced by geometric manipulations and compression.

2.2 Image forgery detection methods

Image forgery detection aims at assessing the authenticity of a digital image by detecting and localizing potential manipulations. Following [16], forgery detection techniques can be broadly grouped into two main categories: 1) statistical techniques, which study anomalies introduced at the pixel level by global transformations and lossy compression schemes, and 2) physically-based techniques, which detect anomalies using models describing the interaction between physical objects, light, and the camera. Such models can also use measurements of objects in the world and their positions relative to the camera, and account for artifacts introduced by the camera lens.

One of the most studied problems in the image forgery detection literature is copy-move (CM) manipulation [59]. CM forgery consists of copying and pasting content within the same image, and potentially postprocessing it [4, 19]. It is usually done in order to hide certain details or to duplicate certain aspects of an image [5]. Proposed CM detection methods use either keypoint-based or block-based matching techniques [12]. Keypoint-based methods extract image features at salient points such as SIFT [3, 39], which are then compared to the rest of the image to detect potential CM manipulations. Block-based methods subdivide an image into rectangular regions on which features such as color/intensity distributions [33], invariant moments [34], wavelet transforms [36] or DCT coefficients [48, 54] are calculated. Most of these features require preprocessing steps (e.g., noise removal, converting color to intensity levels) before the matching step.

Like CM forgery, image splicing (IS) is another simple and commonly used image tampering scheme [5, 16]. It consists of replacing image fragments with information from other images. Since CM and IS forgeries are similar, their detection can be addressed using the same tools. The authors of [37] developed an IS detection model based on the bi-coherence magnitude and phase of the Fourier transform. In [10], camera response functions (CRF) are analyzed for efficient detection and localization of CM and IS manipulations. In [57], a planar homography constraint is used to roughly localize tampered regions in images, followed by graph-cut segmentation for extracting fake objects. Also worth mentioning are methods proposed for automatic change detection between multi-temporal images of the same scene [27, 42]. The majority of these methods have been proposed for specific domains such as video surveillance [8, 38], remote sensing [23, 26] and medical diagnosis [6].

Most of the presented methods deal specifically with copy and/or forgery detection in single images. However, they cannot be used to identify relationships between multiple copies. Indeed, insight into the type and order of creation of copies can be useful for several high-level applications such as tracking copies for copyright protection [13], analyzing information dissemination and diffusion on the Web [20] and forecasting cascades of photo usage in Web communities [11, 30]. Our method addresses this problem by proposing an algorithm for estimating the lineage of multiple image copies and their order of creation. It also recovers the most likely transformations used for their generation. Finally, a visualization graph is constructed to give visual insight into the evolution of the original image into its copies.

3 The proposed approach

Suppose that we have a set of n images \(\mathcal {I} =\{I_{1}, ..., I_{n}\}\) representing different copies of an original image I0. Our goal is to recover the order of creation of the copies in the form of a tree graph depicting the history of copy creation from I0 to its descendant(s) and the lineage of each copy. Since an image can undergo arbitrary transformations, some of which are irreversible in nature (e.g., image cropping, editing), it is hard to assess with certainty all transformations between two images in \(\mathcal {I}\). However, an approximation of these transformations is still possible through a reverse-engineering-like approach using similarity measures between images. This helps in understanding the generative process of related image copies and unravelling the most likely transformations used.

To facilitate problem formulation, and without loss of generality, we suppose that we have the following groups of transformations: C: image cropping, E: edition, G: color to gray, L: illumination change, R: rotation, S: scale change. To express that a copy Ij of an image Ii is generated using one of the transformations T ∈{C,E,G,L,R,S}, we adopt the following notation: Ij = T(Ii,ϕ), where ϕ is the set of parameters of transformation T. In the case of an image rotation, for example, we have ϕ = {𝜃} where 𝜃 is the rotation angle. Note that when there are multiple transformation candidates explaining the creation of a particular copy, we use the principle of least action by selecting the simplest transformation among the candidates (a minimal representation sketch is given after the list below). For example, if we have two possible rotations to produce image IC from images IA and IB, respectively, such that IC = R(IA,𝜃1) = R(IB,𝜃2) and 𝜃1 < 𝜃2, then we suppose that IC is produced by the rotation with the smallest angle, that is \(I_{A} \rightarrow I_{C}\). Finally, since several types of editions can be carried out at either a local or a global level, we use a subscript to indicate each type of edition that can be detected by our method:

  • Global editions: refer to transformations that globally affect the visual aspect of the image, either by photometric manipulations (e.g., compression, noise, blurring) or geometric manipulations (e.g., image distortion, perspective transformation). Since it is difficult to discriminate between these types of editions, we denote them simply by the symbol EG.

  • Local editions: refer to transformations that locally affect the visual aspect of the image (e.g., text insertion/removal, object insertion/removal, border insertion, face blurring). Our algorithm can detect three particular types of local editions, namely text, border and object insertion/removal, denoted by the symbols ET, EB and EO, respectively.
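For illustration only, the following minimal Python sketch (not part of the paper's algorithm; the `Candidate` record, the `cost` function and all names are hypothetical) shows one way to represent a candidate transformation Ij = T(Ii,ϕ) and to apply the least-action rule when several candidates explain the same copy.

```python
# Hypothetical representation of candidate transformations and least-action selection.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    parent: str                                  # identifier of the candidate parent image I_i
    kind: str                                    # 'C', 'E', 'G', 'L', 'R', 'S' (or 'EG', 'ET', 'EB', 'EO')
    params: dict = field(default_factory=dict)   # e.g. {'theta': 45} for a rotation

def cost(c: Candidate) -> float:
    """Hypothetical 'action' of a transformation: smaller means simpler."""
    if c.kind == 'R':
        return abs(c.params.get('theta', 0.0))    # prefer the smallest rotation angle
    if c.kind == 'S':
        return abs(1.0 - c.params.get('s', 1.0))  # prefer the mildest rescaling
    return 1.0                                    # default unit cost for other types

def least_action(candidates: list) -> Candidate:
    """Select the simplest transformation among the candidates explaining a copy."""
    return min(candidates, key=cost)

# Example: I_C = R(I_A, 30) and I_C = R(I_B, 90)  ->  keep the edge I_A -> I_C
print(least_action([Candidate('I_A', 'R', {'theta': 30}),
                    Candidate('I_B', 'R', {'theta': 90})]).parent)
```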

3.1 Estimating strong relations between images

Our method starts by recovering the relations with the strongest evidence among the images, and then completes the graph by inferring the rest of the relations. The relations that can be estimated with the highest confidence are: cropping (C), color to gray (G), rotation (R), illumination change (L) and scale change (S). Indeed, these transformations do not produce major modifications of the image content, which makes them relatively easy to detect through correlation analysis. More formally, we measure the normalized cross-correlation (NCC) [43] between all pairs of images. Given two gray-scale images of equal size, IA and IB, the NCC is given by the formula:

$$\begin{array}{@{}rcl@{}} NCC(I_{A},I_{B}) = \frac{{\sum}_{x,y} I_{A}(x,y)I_{B}(x,y)}{ \sqrt{{\sum}_{x,y} I_{A}(x,y)^{2} {\sum}_{x,y} I_{B}(x,y)^{2}}}, \end{array} $$
(1)

where the summations are made over all image coordinates. Note that NCC(IA,IB) ∈ [0,1], where NCC(IA,IB) = 1 if IA and IB are perfectly identical, and NCC(IA,IB) ≪ 1 otherwise. Given two arbitrary images IA and IB in our set \(\mathcal {I}\), with sizes HA × WA and HB × WB, respectively, the following relations can be asserted through NCC calculation:

  • We consider IB a cropping of IA, denoted by IB = C(IA,ϕ) with ϕ = {x,y,HB,WB}, if there exists a sub-image \(I^{\prime }_{A}\) of size HB × WB centered at location (x,y) ∈ IA such that \(NCC(I^{\prime }_{A},I_{B}) = 1\).

  • We consider IA a rotation of IB if there exists an angle 𝜃 such that NCC(R(IA,𝜃),IB) = 1, where R(IA,𝜃) stands for a rotation of image IA by an angle 𝜃. For simplicity, we take the following angle values: \(\theta =\frac { \delta \pi }{4}\) where δ ∈{1,...,8}.

  • A scale change occurs between images IA and IB if there exists a factor \(s \in \mathbb {R}^{+}\) with \(NCC(S(I_{A},s), I_{B})\simeq 1\), where S(IA,s) is a scale change of image IA with factor s > 0.

  • We consider IB as a grayscale version of IA if NCC(G(IA),IB) = 1, where G(IA) is a grayscale transformation of image IA obtained by averaging its RGB color channels.

  • We consider IB the result of an illumination change of IA, denoted by IB = L(IA,γ) where γ is the parameter of the Gamma transform, if NCC(IA,IB) = 1 and there exists a parameter γ such that the histograms of L(IA,γ) and IB are identical. Note that the negative of an image also belongs to this category of transformation.

To detect the above transformations, our algorithm tests each pair of images. We first test the occurrence of cropping at each image location using 8 rotation angles 𝜃 and 5 scales s, making up 40 tests per location. To ensure fast calculation of the NCC, we use the recursive scheme proposed in [46]. The NCC also allows detecting the color-to-gray transform when one of the compared gray-scale images is originally in color. Finally, small illumination changes are detected by testing 5 values of γ. We then take one of the following decisions for each pair of images (IA,IB):

  • A strong relation T is found if NCC(T(IA),IB) ∈ [δ2,1],

  • A weak relation T is found if NCC(T(IA),IB) ∈ [δ1,δ2],

  • No relation is found, otherwise,

where δ1,δ2 ∈ [0,1] are two thresholds such that δ1 < δ2 (typically δ2 = 0.99 and δ1 = 0.90). Note that a strong relation automatically adds a strong edge to the evolution graph, whereas a weak relation does not. Fig. 1 shows an example of strong relation detection, where image (b) results from a cropping of image (a) followed by a scale change. Images (c) and (d) show, respectively, the NCC map and the cropping location corresponding to the highest correlation value.
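The following NumPy sketch illustrates, under simplifying assumptions, the NCC of (1) and the two-threshold decision above; it assumes gray-scale images of equal size and that the tested transformation T(IA) (rotated, rescaled, gamma-corrected, etc.) has already been applied.

```python
# Minimal sketch of the NCC test of Eq. (1) and the strong/weak relation decision.
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized gray-scale images (Eq. 1)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def relation_strength(transformed_a: np.ndarray, b: np.ndarray,
                      delta1: float = 0.90, delta2: float = 0.99) -> str:
    """Classify the relation T between a pair of images as strong, weak or absent."""
    score = ncc(transformed_a, b)
    if score >= delta2:
        return 'strong'   # a strong edge is added to the evolution graph
    if score >= delta1:
        return 'weak'     # kept as a candidate annotation, but no edge yet
    return 'none'
```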

Fig. 1
figure 1

Cropping detection using NCC. a represents the original image IA, b represents a cropping of image IA. c represents the value of NCC at different locations in IA and d represents a highlight of the location of the cropping in image IA

3.2 Image grouping for copy lineage detection

Once the strongest relations are detected in \(\mathcal {I}\), we perform an agglomerative clustering of the images in \(\mathcal {I}\). In this phase, two scenarios can be considered. In the first one, no reference images are available and a fully unsupervised grouping is performed. In the second one, some reference images can be provided by an expert or found on official sites (e.g., museums, archives, art galleries). Each reference image is then used as the centroid around which a cluster is built, constituting a lineage in the final graph with the reference image as its root.

To group images into lineage clusters, we use an image similarity based on a combination of strong relation detection and histogram comparison. Given two images IA and IB, we use the Bhattacharyya coefficient to measure the similarity between their histograms. Let \({H^{j}_{A}}\) and \({H^{j}_{B}}\) be the histograms of images IA and IB at color channel j ∈{R,G,B}, and let Nbins be the number of bins in the histograms. The histogram similarity between images IA and IB is given by:

$$ S_{H}(I_{A},I_{B}) = \sum\limits_{j \in \{R,G,B\}} \frac{B({H^{j}_{A}},{H^{j}_{B}})} {3} $$
(2)

where \(B({H^{j}_{A}},{H^{j}_{B}}) = {\sum }_{i = 1}^{N_{bins}}{\sqrt {{H^{j}_{A}}(i) \cdot {H^{j}_{B}}(i)}}\). The clustering is performed according to the following distance measure, using the single-link method to compute distances between intermediate groups for cluster fusion [15]. The distance between two images is defined as follows:

$$\begin{array}{@{}rcl@{}} d(I_{A},I_{B}) = \left\{ \begin{array}{cl} 0 & \text{if} \ \exists \ \textrm{strong relation}: I_{A} \rightarrow I_{B} \\ 1-S_{H}(I_{A},I_{B}) & \text{otherwise} \end{array} \right. \end{array} $$
(3)

where \(I_{A} \rightarrow I_{B} \) means that a strong precedence relation has already been established between IA and IB, as described in the previous section. The first condition in (3) is intended to prevent clustering errors caused by transformations such as image cropping, where the histogram of the cropped image can be significantly different from that of the original image. Fig. 2 shows an example of clustering for the images of our third dataset using the proposed method. Images with similar content have clearly been grouped into homogeneous clusters that will constitute the different lineages of the final graph.
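A possible implementation sketch of this grouping step is given below, assuming RGB images as NumPy arrays, a precomputed set of strong relations and SciPy's single-link clustering; the dendrogram cut height `cut` is a hypothetical parameter not specified in the text.

```python
# Sketch of lineage grouping with the histogram similarity of Eq. (2)
# and the distance of Eq. (3), followed by single-link agglomerative clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hist_similarity(a: np.ndarray, b: np.ndarray, nbins: int = 32) -> float:
    """Average Bhattacharyya coefficient over the R, G, B channels (Eq. 2)."""
    s = 0.0
    for c in range(3):
        ha, _ = np.histogram(a[..., c], bins=nbins, range=(0, 256))
        hb, _ = np.histogram(b[..., c], bins=nbins, range=(0, 256))
        ha, hb = ha / ha.sum(), hb / hb.sum()
        s += np.sqrt(ha * hb).sum() / 3.0
    return s

def lineage_clusters(images, strong, cut: float = 0.5):
    """Group copies into lineage clusters using the distance of Eq. (3).

    `strong` is a set of index pairs (i, j) with an established strong relation;
    `cut` is a hypothetical dendrogram cut height.
    """
    n = len(images)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in strong or (j, i) in strong:
                d[i, j] = d[j, i] = 0.0
            else:
                d[i, j] = d[j, i] = 1.0 - hist_similarity(images[i], images[j])
    z = linkage(squareform(d), method='single')   # single-link agglomerative clustering
    return fcluster(z, t=cut, criterion='distance')
```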

Fig. 2
figure 2

Illustration of image grouping by histogram comparison: a copy image set, b obtained copy groups constituting lineage clusters

3.3 Detecting image editions

Image editing refers to modifying a digital image by removing unwanted elements (e.g., scratches, face blurring) or inserting/removing elements such as objects and text. Local-level editing affects the image locally (e.g., text/object insertion/removal, scratch/red-eye removal, border insertion). Global-level editing refers to transformations affecting all the pixels of the image (e.g., image distortion, compression). To discriminate between the different types of editions, we extract appropriate features from the image, followed by supervised classification.

Let \(I_{A},I_{B} \in \mathcal {I}\), where IB is the result of editing IA. To compare the two images IA and IB at a local level, we first align them using [44]. Then, a change map B is calculated for IA by comparing its local structure with that of IB using a local image similarity metric inspired by [47]:

$$\begin{array}{@{}rcl@{}} B(\mathbf{x}) = 1- \max_{\mathbf{x}^{\prime} \in N(\mathbf{x})} \left( \frac{[2\mu_{A}(\mathbf{x}) \mu_{B}(\mathbf{x}^{\prime})+c_{1}][2\sigma_{AB}(\mathbf{x},\mathbf{x}^{\prime})+c_{2}]}{[{\mu^{2}_{A}}(\mathbf{x}) + {\mu^{2}_{B}}(\mathbf{x}^{\prime})+c_{1}][{\sigma^{2}_{A}}(\mathbf{x}) + {\sigma^{2}_{B}}(\mathbf{x}^{\prime})+c_{2}]}\right) \end{array} $$
(4)

where N(x) stands for the neighbourhood of location x = (x,y). The parameters (μA(x),σA(x)) and \((\mu _{B}(\mathbf {x}^{\prime }),\sigma _{B}(\mathbf {x}^{\prime }))\) represent the local mean and standard deviation at locations x and \(\mathbf {x}^{\prime }\) in images IA and IB, respectively, and σAB denotes their local covariance. The constants c1 and c2 are used to stabilize the division when the denominator is weak. Finally, the maximum operator over a location neighborhood is introduced to make the measure more robust to misalignments and slight image shifts.

We generate a map using (4), as illustrated in Fig. 3c, which is segmented into a binary image using [7] (see Fig. 3d). If no blob is found, we consider the image edition-free. Otherwise, we extract the most important blobs from the binary map and analyze four of their geometric properties: compactness, relative size, dispersion and presence/absence of text. Compactness measures the solidity of each detected blob. Local editions such as text and object insertion/removal generally produce blobs with higher compactness than global manipulations such as distortions and compression. Relative size and dispersion indicate whether detected blobs are local or scattered over the whole image. Based on these features, we train an SVM classifier to detect whether a generated map stems from a local or a global edition. When a local edition is detected, a second round of classification starts by first analyzing the occurrence of text in each blob using the method in [49]. If no text is detected, another SVM classifier, trained on the geometrical features of each blob, classifies it as either an object EO or border EB insertion/removal. To train our classifiers, we use object examples from the Caltech dataset [18]. For examples of blobs occurring in global editions and border insertions, we collected images from the Web and added manually generated ones.
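The sketch below illustrates one way to compute a change map in the spirit of (4) together with two of the four blob features; it assumes aligned gray-scale images normalized to [0,1], approximates the neighborhood maximum of (4) with a maximum filter applied to the similarity map, and uses illustrative constants and window sizes that are not values reported here.

```python
# Sketch of a change-map computation inspired by Eq. (4) and simple blob features.
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter, label

def change_map(a: np.ndarray, b: np.ndarray, win: int = 7, nbhd: int = 5,
               c1: float = 1e-4, c2: float = 9e-4) -> np.ndarray:
    """SSIM-style local similarity between aligned images, turned into a change map."""
    a = a.astype(np.float64); b = b.astype(np.float64)
    mu_a, mu_b = uniform_filter(a, win), uniform_filter(b, win)
    var_a = uniform_filter(a * a, win) - mu_a ** 2
    var_b = uniform_filter(b * b, win) - mu_b ** 2
    cov = uniform_filter(a * b, win) - mu_a * mu_b
    sim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
          ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    # tolerate slight misalignments by keeping the best similarity in a neighborhood
    return 1.0 - maximum_filter(sim, size=nbhd)

def blob_features(binary_map: np.ndarray):
    """Per-blob compactness (bounding-box based stand-in) and relative size."""
    labels, n = label(binary_map)
    feats = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        area = len(ys)
        bbox = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
        feats.append({'compactness': area / bbox,
                      'relative_size': area / binary_map.size})
    return feats
```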

Fig. 3
figure 3

Detection of image editions by comparing local image structure: a original image, b edited image, c change map generated using (4), d binary segmentation of c and e red/blue rectangles indicating the presence/absence of text, detected with the method proposed in [49], on each detected blob

Figure 3 shows two examples of image editions. The first row shows local text insertions detected on some blobs of the change map using [49]. Column (e) shows the regions detected as text insertions (red rectangles) and others falsely detected as non-text editions (blue rectangles). The second row shows an example of image distortion that generated non-compact, scattered blobs, which our method classified as a global edition.

3.4 Copy ordering and evolution graph construction

In this step, we build an evolution graph \(\mathcal {G}=(\mathcal {V},\mathcal {E})\) for our image set \(\mathcal {I}\), with vertices \(\mathcal {V}\) representing the images in \(\mathcal {I}\) and oriented edges \(\mathcal {E}\) representing the transformations performed on images to create copies (graph descendants). Given the resulting clustering groups, a sub-graph is constructed within each group by incrementally ordering the group images based on their similarity. Algorithm 1 shows the procedure for building the sub-graph within each group of image copies. The algorithm is designed for the case of one available reference image I0, but it can be run iteratively in the case of multiple reference images (see Fig. 4 for an illustration).

Fig. 4
figure 4

Example of graph similarity measurement based on one-level and multi-level parent search: a represents the ground-truth, b and c represent two examples of constructed graphs. Red (continuous) and green (dashed) arrows are examples of one-level and multi-level search, respectively

The algorithm is composed of two main steps. In the first step (lines 6 to 13), strong relations are detected in each group Gv, v = 1,...,K, and are used to establish the first oriented edges of the graph. In the second step (lines 14 to 28), starting from the group root rv, vertices and edges are added incrementally to the graph by inferring the most likely transformations operated on the leaf images already added to the graph. To establish the next edge, a similarity measure is used which combines local and global image information. Global information consists of the similarity between image histograms SH(IA,IB), as formulated in (2). Local information is measured by averaging the value of (4) over all image locations, \(S_{L}(I_{A},I_{B}) = \frac {1}{|I_{A}|} {\sum }_{\mathbf {x} \in I_{A}} B(\mathbf {x})\), where |IA| represents the number of pixels in IA. Finally, the combined image similarity measure is calculated using the following formula:

$$\begin{array}{@{}rcl@{}} S(I_{A}, I_{B}) = \alpha S_{H}(I_{A},I_{B}) + (1-\alpha)S_{L}(I_{A},I_{B}), \end{array} $$
(5)

where the parameter α balances the contribution of local and global information (α = 0.5 is used as a default value). It is clear that S(IA,IB) ∈ [0,1], where 1 designates a perfect match between the two images IA and IB. The process of graph completion constitutes the second part of the algorithm (lines 14 to 28). Let Gc be the set of images in group Gv already added to the graph, and let \(\bar G_{c}\) be its complement, defined as \(\bar G_{c} = G_{v} \setminus G_{c}\). At each iteration, we use (5) to select the next link to add to the graph between images \(I_{i_{1}} \in G_{c}\) and \(I_{i_{2}} \in \bar G_{c}\) such that:

$$\begin{array}{@{}rcl@{}} [I_{i_{1}},I_{i_{2}}]= \arg\max_{I_{j} \in G_{c}, I_{k} \in \bar G_{c}} S(I_{j}, I_{k}). \end{array} $$
(6)

Then, an edge \(I_{i_{1}} \rightarrow I_{i_{2}}\) is added to the subgraph of Gc, and \(I_{i_{2}}\) and all its descendants are removed from \(\bar G_{c}\). The edge is annotated by first testing whether an edition has occurred between \(I_{i_{1}}\) and \(I_{i_{2}}\) using (4), and using our trained SVM classifier to identify the type of edition. We also test the occurrence of weak relations {L,S,G,R,C} when NCC ∈ [δ1,δ2], as discussed in Section 3.1. Note that since line 10 of the algorithm builds strong edges between images, adding a parent image to the set Gc entails adding all its descendants to Gc (see lines 22 to 25). This process ends for a group when \(\bar G_{c} = \emptyset \) and Gc is completely ordered; the same procedure is carried out for all formed groups. Finally, all resulting subgraphs are annotated and added to the main graph \(\mathcal {G}\).

Algorithm 1 Building the sub-graph of copies within a group (pseudocode)
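As a complement to Algorithm 1, the following sketch outlines the completion loop of (5)–(6) within one group; the similarity function `sim`, the `strong_children` structure and all names are illustrative stand-ins for the quantities used in the algorithm, not a reproduction of it.

```python
# Sketch of the graph-completion loop (Eqs. 5-6) within one lineage group.
def complete_group_graph(group, root, strong_children, sim):
    """Incrementally order a copy group into a sub-graph rooted at `root`.

    group          : list of image indices in the cluster
    root           : index of the group root (reference image or earliest copy)
    strong_children: dict mapping an index to the indices of its strong descendants
    sim            : callable sim(i, j) -> combined similarity of Eq. (5)
    """
    def descendants(i):
        out = set(strong_children.get(i, []))
        for c in list(out):
            out |= descendants(c)
        return out

    added = {root} | descendants(root)        # strong edges pull in whole sub-trees
    remaining = set(group) - added
    edges = []
    while remaining:
        # Eq. (6): pick the most similar (parent already in graph, child not yet in graph) pair
        parent, child = max(((i, j) for i in added for j in remaining),
                            key=lambda p: sim(*p))
        edges.append((parent, child))         # annotated later with detected edition / weak relation
        added |= {child} | descendants(child)
        remaining -= {child} | descendants(child)
    return edges
```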

4 Experiments

To validate the proposed approach, experiments have been conducted on four datasets. The first two datasets contain 32 image copies each, generated manually by applying a series of transformations to the original Mona Lisa painting image. Figure 5a and b show the images of these datasets. The third dataset contains 31 copies of a van Gogh painting depicting a landscape scene, generated in the same way as the first two datasets. The fourth dataset has been collected from the Web (see Fig. 10) and contains 53 copies of a photograph by the famous French artist Nadar. The public archive that owns the negative has published several scans of the image online. Other institutions that own paper prints (museums, archives, libraries and auction houses) also published digital copies online, and further copies were found on blogs, media Web sites and Wikipedia using Google search. While the first three datasets come with complete information about the transformations used to generate the copies, the fourth dataset has gaps: some roots and descendants may be missing, which is the normal case when copies are largely collected from the Web.

Fig. 5
figure 5

Copies of the Mona Lisa image constituting our first two datasets a and b

To evaluate the accuracy of the constructed graphs, we compared them to the ground truth (i.e., a reference graph) when available (datasets I and II) or to a graph created by an expert (dataset III) based on a qualitative analysis of the visual content of the images and the context of their publication on the Web (e.g., authority of the Web sites, mention of a source, presence of hyperlinks, etc.). To measure similarity between graphs, we must take into account the following specificities. Since we are only interested in the temporal order of the copies, the horizontal order of vertices within the same level is not important. However, the order of vertices in each vertical path from the root is important since it shows the lineage of each generated copy from its ancestors. Therefore, we cannot directly use existing methods for comparing tree graphs, since most of them are dedicated to binary ordered trees [53, 55]. Instead, we propose a new measure as follows:

Given an image \(I_{i} \neq I_{0} \in \mathcal {I}\), let P(Ii) and \(\hat {P}(I_{i})\) (resp. A(Ii) and \(\hat {A}(I_{i})\)) be its parent vertex (resp. set of ancestor vertices) in the reference and the reconstructed graphs, respectively. In a one-level search, we consider that the parent of Ii is correctly identified if \(\hat {P}(I_{i}) = P(I_{i})\), for which we assign a score si = 1, and si = 0 otherwise. In a multi-level search, the parent of Ii is correctly identified if \(\hat {P}(I_{i}) = P(I_{i})\), for which we assign a score si = 1, and partially identified if \(P(I_{i}) \in \hat {A}(I_{i})-\{I_{0}\}\), for which we assign a score si = 1/2. Finally, the total score s over \(\mathcal {I}\) is given by the following formula:

$$\begin{array}{@{}rcl@{}} s = \frac{1}{n} \sum\limits_{I_{i} \in \mathcal{I}} s_{i} \end{array} $$
(7)

A perfect match will obviously have a 100% score. To better illustrate our measure, Fig. 4 shows an example with two candidate graph reconstructions, (b) and (c), of the reference graph (a), which contains 9 edges. Clearly the structure of graph (c) is more similar to the reference (a) than that of (b). However, the number of identified parents in (c) is smaller than in (b). Indeed, using a one-level search, the score of graph (b) is \(5/9\simeq 0.56\), as the number of identified parents is 5 (\(I_{0} \rightarrow I_{1}\), \(I_{0} \rightarrow I_{6}\), \(I_{1} \rightarrow I_{2}\), \(I_{4} \rightarrow I_{9}\), and \(I_{5} \rightarrow I_{7}\)), and the score of graph (c) is \(1/9 \simeq 0.11\), as the number of identified parents is 1 (\(I_{0} \rightarrow I_{1}\)). Using a multi-level search, the number of partially identified parents is one for both graphs, making their total scores \(5.5/9\simeq 0.61\) and \(1.5/9\simeq 0.17\), respectively.
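For reference, a minimal sketch of this score is given below, assuming each graph is encoded as a mapping from every non-root image to its parent; the function and variable names are illustrative.

```python
# Sketch of the evaluation score of Eq. (7) with one-level and multi-level parent search.
def ancestors(parent_of, node):
    """Set of ancestors of `node` (excluding the node itself)."""
    out, p = set(), parent_of.get(node)
    while p is not None:
        out.add(p)
        p = parent_of.get(p)
    return out

def graph_score(reference, reconstructed, root, multi_level=True):
    """Fraction of correctly (or partially) identified parents over all copies."""
    nodes = [v for v in reference if v != root]
    s = 0.0
    for v in nodes:
        if reconstructed.get(v) == reference[v]:
            s += 1.0                                            # parent correctly identified
        elif multi_level and reference[v] in ancestors(reconstructed, v) - {root}:
            s += 0.5                                            # parent found among the ancestors
    return s / len(nodes)

# Example from Fig. 4(b): with a one-level search, 5 of the 9 parents are recovered.
```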

Table 1 shows the values of our evaluation metric for each graph constructed from the four datasets. We can see that our algorithm has succeeded in identifying most of the transformations used to create the copies. The algorithm achieved its best performance on the first three datasets, which were generated manually. The obtained graphs are shown in Figs. 6, 7, 8 and 9, respectively. For the fourth dataset, we ran our algorithm on two versions of the dataset. In the first version, no reference image (root) is provided to the algorithm and graph reconstruction is performed in a fully unsupervised fashion. In the second version, reference images identified by an expert were provided to the algorithm (see Fig. 12). For the first version, we can see that even without references, the algorithm succeeded in identifying the majority of image transformations. By including the reference images in the second version, the accuracy of the algorithm increased by almost 10% (see Fig. 10).

Table 1 Values of our evaluation metric on each reconstructed graph from the four datasets
Fig. 6
figure 6

Constructed copy graph of the first dataset in Fig. 5a

Fig. 7
figure 7

Constructed copy graph of the second dataset in Fig. 5b

Fig. 8
figure 8

Copies of a Van Gogh painting constituting the third dataset

Fig. 9
figure 9

Constructed copy graph of the third dataset of Fig. 8 without using reference images

Fig. 10
figure 10

Copies of a Nadar photograph constituting the fourth dataset

The above examples demonstrate the capability of our method to recover the order of image copies and estimate the most likely transformations used for their generation. However, the algorithm has limitations in identifying image modifications when severe global/local changes are applied to the image. This can be seen in some images of our datasets where transformations such as editions and illumination changes have drastically changed the image content. For instance, in images 28 and 31 of Fig. 7, the illumination has been altered so much that an edition was detected by our algorithm. In the same vein, image 47 of Fig. 11 has been so heavily altered that an edition was detected, and the reconstructed lineages are not accurate (see Fig. 12).

Fig. 11
figure 11

Constructed copy graph of the fourth dataset of Fig. 10 without using reference images

Fig. 12
figure 12

Constructed copy graph of the fourth dataset of Fig. 10 using reference images

5 Conclusions

We have proposed a method for building evolution graphs for image copies. The method relies only on the visual content of images, where several features are used to infer the types of transformations operated on images to produce the copies. Experimental results have shown that the graphs constructed by our algorithm are very close to manually obtained ones. Future work will focus on enhancing the algorithm by enriching the set of possible transformations and exploring the use of metadata (e.g., URL location, file header, etc.) to improve the precision of the obtained graphs.

Future perspectives for extending this research include setting up a protocol to identify potential roots in an unsupervised context. This protocol would take into account the following parameters: analysis of Web page URLs and matching with categories of authority (museum, archive, library, auction house, government database), text analysis of the Web page that could reveal the citation of a source website (or a hyperlink), and the presence of a watermark that signals a reference to a source or copyright owner. The development of new copyright protection and copy tracking services relying on blockchain technologies will also provide new ways of identifying original images and roots through their inscription in a distributed directory.