
1 Introduction

Scientific analysis of artworks is a complex task that requires knowledge of different fields, such as materials science, chemistry or physics [12]. Computer science plays an active role too; in particular, image processing has proved very helpful for researchers and restorers [20]. Notable examples include the identification of cracks in paintings [6, 15], damage detection [4], image enhancement [13], authentication [16] and even artwork synthesis by deep learning [22].

In this paper we focus on a unique kind of artwork: historical violins. Their main difference with respect to more “traditional” pieces of art, such as paintings or statues, is that historical musical instruments are both preserved in museums and played (even today), leading to a greater risk of damage and mechanical wear. Moreover, the multiple restorations carried out over the centuries to keep the instruments in use have created a very complex, stratified surface that is hard to interpret. Several analytical techniques are commonly employed to analyze these musical instruments, such as stereomicroscopy, colorimetry, X-ray fluorescence (XRF) or Fourier transform infrared (FTIR) spectroscopy [3, 11, 18]. Among them, UV-induced fluorescence (UVF or UVIFL) photography is particularly effective for a preliminary examination, since it can highlight details of a violin surface that are not perceivable under visible light [2].

We previously analyzed UVIFL images of seven important violins made by Antonio Stradivari between the 17th and 18th centuries, to map the distribution of varnishes and materials over their entire surface [7]. Here we address a more specific task, namely a multi-temporal study of the areas most subject to wear. Our goal is to provide researchers and restorers with a tool for so-called preventive conservation, i.e. the constant monitoring of the state of conservation of an artwork to minimize interventions on it [1].

This kind of monitoring on a historical violin presents various complexities. First, unlike the search for areas with established wear, which have precise chromatic characteristics [8], here we want to identify as early as possible the beginning of a new alteration, which can occur on any part of the surface, whether intact or damaged. Thus, we have no prior knowledge of the type of variation and no reference ground truth. Second, the acquisition is affected by various kinds of noise that cannot be completely avoided, such as spurious reflections due to the rounded morphology of the instrument and the high reflectance of the varnishes, or slight variations in the position of the object or of the lamps between sessions (violins cannot be rigidly fixed, to avoid damaging them). These systematic errors do not significantly affect a global analysis of the entire surface, but they become critical when we focus on specific areas to detect very small variations. The ideal solution would be constant monitoring with various analytical and spectroscopic techniques to obtain an accurate mapping of the regions of interest. Unfortunately, this approach is very time-consuming, limiting the number of instruments that can be checked at the same time. A complete verification with multiple techniques, to confirm the presence of an alteration, should therefore be performed only on the areas most likely to be altered.

Our idea is to exploit UVIFL images to identify meaningful regions of interest and then decide where and when to apply further analyses. For this purpose, we need a segmentation that is very robust to environmental noise: when comparing such “stable” segmentations of the same instrument taken at different times, only meaningful differences remain (minimization of false positives). To reach this goal, we use a genetic algorithm to evolve our previous method based on HSV histogram quantization [7] in this direction.

The resulting method was tested on UVIFL images of two important violins, “Vesuvio” (1727) by Antonio Stradivari and “Carlo IX” (c. 1566) by Andrea Amati (previously analyzed over a six-month period using multiple non-invasive analytical techniques [10]), and on images of a sample instrument artificially altered in the laboratory. We created a publicly available dataset with the collected data (Footnote 1).

The paper is structured as follows: Sect. 2 describes the basic principles of UVIFL photography and our dataset; Sect. 3 summarizes the main characteristics of the previous segmentation method and then describes the proposed evolution; Sect. 4 presents the results; finally, Sect. 5 draws conclusions and outlines the next steps.

2 UVIFL Photography and Dataset Specification

UVIFL photography is a non-invasive analytical technique based on the property of some materials which, when excited by ultraviolet light, emit radiation at longer wavelengths than those of the exciting source. When these materials are illuminated with light in the UV-A range (315–400 nm), such as that of a Wood’s lamp, they emit characteristic fluorescence colors in the visible range (400–700 nm) [21]. Varnishes and substances used for restorations are generally sensitive to UV-A light, so UVIFL photography is very effective in Cultural Heritage studies for highlighting meaningful features of the surface of an artwork [12]. In particular, for historical musical instruments, UVIFL images are used to decide where to apply more precise but slower diagnostic techniques, such as XRF or FTIR [19].

Fig. 1. Monitored areas [10] of back plates of Andrea Amati “Carlo IX” (c. 1566) and Antonio Stradivari “Vesuvio” (1727).

As mentioned in the introduction, our test set consists of images from a previous study [10]. This monitoring program investigated the back plates of the two violins (Fig. 1), focusing on the top (treble side, C1 and V1) and bottom (bass side, C2 and V2) areas. These two regions are the most subject to alterations due to sweat and mechanical wear, since they are in direct contact with the musician during playing. The two violins were selected based on their availability and frequency of use during the monitored period: one rarely played (“Carlo IX”) and the other frequently played (“Vesuvio”). The acquisition protocol followed the specification defined in our previous works [7, 9]. More precisely, UVIFL photos were taken with a Nikon D4 full-frame digital camera with a 50 mm f/1.4 Nikkor lens, 30 s exposure time, aperture f/8 and ISO 400. Two Wood’s lamp tubes (Philips TL-D 36 W BBL IPP low-pressure Hg tubes, 40 W, emission peak \(\sim\)365 nm) provided uniform UV-A lighting. Images were acquired at regular intervals over six months, for a total of three sessions. In each session we took pictures of the entire back plates as well as macros of the two areas of interest (three per area), in order to detect small alterations on the surface.

To enlarge the dataset, we also took pictures of a sample violin (SV01) artificially altered in the laboratory. We focused on the bottom part of the back plate, starting from an already damaged region and gradually extending the wear toward the intact varnish. To simulate the effect of sweat and mechanical wear produced by a musician, we scrubbed the surface with a cloth moistened with alcohol. The process was repeated 20 times to reproduce a long-term evolution. We slightly moved the violin and the lamps between sessions to simulate random environmental variations.

3 Segmentation Algorithm

3.1 Original Implementation

Our previous classification approaches [7, 9] focused on highlighting the distribution of the main fluorescence colors over the entire surface of an instrument, with the goal of making the standard examination of UVIFL imagery faster and more efficient. Since the surface of each violin can be considered unique, due to the combination of different varnishes and different restorations, we cannot have a reference ground truth for every possible condition. For this reason, to group similar fluorescence colors in a way that is coherent for each instrument, we based our classification method on the physical principles of UV fluorescence. We designed a histogram quantization method that operates in HSV color space, where each channel has a different weight according to its experimentally tested behavior. More precisely, the Hue channel was divided into 12 bins and both the Saturation and Value channels into 3 bins, with all ranges equally spaced, for a total of 12 × 3 × 3 = 108 possible classes. This configuration was chosen as a compromise between the need to discriminate different fluorescence colors and the need to group similar regions together. A pixel p belongs to a class C if its hue (\(H_p\)), saturation (\(S_p\)) and value (\(V_p\)) fall inside the corresponding ranges of C (Eq. 1).

$$\begin{aligned} \left\{ H_p \in C_{H}, S_p \in C_{S}, V_p \in C_{V} \right\} \rightarrow \left\{ p\in C \right\} \end{aligned}$$
(1)
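A minimal NumPy sketch of the fixed 12 × 3 × 3 quantization and of the class-membership rule in Eq. 1 is given below. It assumes an HSV image already converted to floats with H in degrees [0, 360) and S, V on a 0–100 scale; these ranges and the names `quantize_hsv` and `class_histogram` are illustrative assumptions, not the original implementation.

```python
import numpy as np

H_BINS, S_BINS, V_BINS = 12, 3, 3  # 12 x 3 x 3 = 108 classes

def quantize_hsv(hsv):
    """Return an integer class label (0..107) for each pixel.

    Assumption: float array of shape (rows, cols, 3) with H in degrees [0, 360)
    and S, V in [0, 100]; every channel range is split into equal bins.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # Per-channel bin index; clipping puts S = 100 or V = 100 into the last bin.
    hi = np.clip((h / 360.0 * H_BINS).astype(int), 0, H_BINS - 1)
    si = np.clip((s / 100.0 * S_BINS).astype(int), 0, S_BINS - 1)
    vi = np.clip((v / 100.0 * V_BINS).astype(int), 0, V_BINS - 1)
    # Eq. 1: p belongs to class C iff H_p, S_p and V_p all fall in C's ranges,
    # so each class is identified by the triple (hi, si, vi).
    return (hi * S_BINS + si) * V_BINS + vi

def class_histogram(labels, n_classes=H_BINS * S_BINS * V_BINS):
    """Fraction of pixels assigned to each class."""
    counts = np.bincount(labels.ravel(), minlength=n_classes)
    return counts / counts.sum()
```

The class histogram is the quantity later compared across sessions, so two segmentations can be contrasted without requiring pixel-level alignment.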

The algorithm proved robust to environmental changes such as small variations in the lamps’ angle of incidence or in the violin position between acquisition sessions. However, this property holds only when the entire surface of the violin is considered. High-resolution images of details, which are fundamental for detecting small initial alterations, are inevitably more sensitive to environmental errors.

3.2 Genetic Algorithm Implementation

Our main problem in finding an optimal solution is the limited amount of data: acquiring multi-temporal images of historical violins requires several months to be significant, and continuous access for tests is granted only for a few instruments. We have the artificially created sequence, but we want to verify whether it is possible to train our method using only the “real” data. Since we have too few images to properly apply a deep-learning approach, we chose genetic algorithms (GA), which can work efficiently even with a limited amount of data and are widely used in the literature for image classification and segmentation [5, 14]. The main steps of a GA are summarized in Algorithm 1.

Algorithm 1. Main steps of a GA.

From our previous method we identified the parameters to evolve in order to produce a more robust segmentation, and those to keep because they are more strictly related to the properties of UVIFL photography:

  • Hue ranges are still equally spaced, but the number of bins (\(H_{ranges}\)) can change;

  • Saturation bins are still 3, but they are no longer equally spaced (\(S_{low}\) and \(S_{high}\) are respectively the lower and upper thresholds between bins);

  • Value bins are still 3, but they are no longer equally spaced (\(V_{low}\) and \(V_{high}\) are respectively the lower and upper thresholds between bins).
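The evolved quantization can be sketched by replacing the fixed S and V bins with the gene thresholds. As before, this assumes H in degrees and S, V on a 0–100 scale (an assumption consistent with the magnitude of the thresholds reported below); the function name is hypothetical. `np.digitize` with thresholds `[low, high]` maps a value below `low` to bin 0, a value in `[low, high)` to bin 1, and a value at or above `high` to bin 2.

```python
import numpy as np

def quantize_genes(hsv, h_ranges, s_low, s_high, v_low, v_high):
    """Class label per pixel for a given genome; labels range over h_ranges * 9 classes.

    Assumption: H in degrees [0, 360), S and V in [0, 100].
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # Hue: still equally spaced, but with a variable number of bins.
    hi = np.clip((h / 360.0 * h_ranges).astype(int), 0, h_ranges - 1)
    # Saturation and Value: 3 bins each, separated by the evolved thresholds.
    si = np.digitize(s, [s_low, s_high])
    vi = np.digitize(v, [v_low, v_high])
    return (hi * 3 + si) * 3 + vi
```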

The five parameters (\(H_{ranges}\), \(S_{low}\), \(S_{high}\), \(V_{low}\) and \(V_{high}\)) become the genes of our GA. As usual, gene values in the initial population (\(P_0\)) were chosen randomly, but we set the following constraints to avoid degenerate cases (such as a single large range): \(4 \le H_{ranges} \le 30\); \(S_{low} < S_{high}\); \(V_{low} < V_{high}\).
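One possible way to draw the initial population \(P_0\) under these constraints is sketched below. It assumes integer genes and S, V thresholds restricted to [0, 100]; the sampling scheme and the helper names are illustrative assumptions.

```python
import random

def is_valid(ind):
    """Constraints from the text, plus an assumed 0-100 scale for S and V."""
    h_ranges, s_low, s_high, v_low, v_high = ind
    return (4 <= h_ranges <= 30
            and 0 <= s_low < s_high <= 100
            and 0 <= v_low < v_high <= 100)

def random_individual(rng):
    """Draw one genome [H_ranges, S_low, S_high, V_low, V_high]."""
    h_ranges = rng.randint(4, 30)                      # both ends inclusive
    s_low, s_high = sorted(rng.sample(range(101), 2))  # distinct values, so s_low < s_high
    v_low, v_high = sorted(rng.sample(range(101), 2))
    return [h_ranges, s_low, s_high, v_low, v_high]

def initial_population(size=20, seed=None):
    rng = random.Random(seed)
    return [random_individual(rng) for _ in range(size)]
```

Sampling two distinct values and sorting them enforces the ordering constraints by construction, so no individual has to be rejected and redrawn.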

Choosing a good fitness function (f) is the most critical part of a GA. In our case we want to check whether the current set of parameters produces a similar segmentation for all the images in the training set (i.e. minimizes the effect of environmental noise). Thus, we performed a histogram comparison among the obtained segmentations, using as similarity index the alternative formulation of Pearson’s \(\chi ^2\) test (Eq. 2) proposed by Puzicha et al. [17]. The closer the distance d is to zero, the more similar the two histograms (\(H_1\) and \(H_2\)) are. Consequently, for each individual in the population, the lower d, the higher f.

$$\begin{aligned} d(H_1,H_2) = 2 \sum _i \frac{\left( H_1(i)-H_2(i)\right) ^2}{H_1(i)+H_2(i)} \end{aligned}$$
(2)
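Eq. 2 and a fitness function built on it can be sketched as follows. Bins where \(H_1(i)+H_2(i)=0\) are skipped to avoid a division by zero. The text only states that a lower d yields a higher f, so the mapping f = 1 / (1 + mean pairwise d) used here is an assumption. For reference, OpenCV provides the same distance through `cv2.compareHist` with `cv2.HISTCMP_CHISQR_ALT`.

```python
import itertools
import numpy as np

def chi2_alt(h1, h2):
    """Alternative chi-square distance of Eq. 2: 2 * sum (H1-H2)^2 / (H1+H2)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    den = h1 + h2
    mask = den > 0  # empty bins contribute nothing; skip them to avoid 0/0
    return 2.0 * np.sum((h1[mask] - h2[mask]) ** 2 / den[mask])

def fitness(histograms):
    """Assumed fitness: higher when all segmentations are mutually similar.

    Requires at least two histograms (one per training image).
    """
    dists = [chi2_alt(a, b) for a, b in itertools.combinations(histograms, 2)]
    return 1.0 / (1.0 + np.mean(dists))
```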

During the selection phase, the fittest individuals (\(P_f\)), half of the population in this case, are chosen as parents for breeding. Each pair of selected parents generates a new individual by mixing their genes through crossover. Since in our case the genes are not all independent of each other, the crossover points cannot be chosen randomly, as usually happens in a GA. We fixed two crossover points, one after \(H_{ranges}\) and one after \(S_{high}\). In this way the new offspring always receives coherent pairs of genes, (\(S_{low}\), \(S_{high}\)) and (\(V_{low}\), \(V_{high}\)), from the parents.
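A sketch of the fixed-point crossover follows. The two cut points split the genome into the segments [\(H_{ranges}\)], [\(S_{low}\), \(S_{high}\)] and [\(V_{low}\), \(V_{high}\)], so each threshold pair is inherited as a unit. Alternating the segments between the parents, as in standard two-point crossover, is an assumption, since the text does not say how segments are assigned.

```python
# Genome layout: [H_ranges, S_low, S_high, V_low, V_high]
CUT_POINTS = (1, 3)  # after H_ranges (index 0) and after S_high (index 2)

def crossover(p1, p2):
    """Two-point crossover at fixed cut points; returns two complementary children."""
    a, b = CUT_POINTS
    child1 = p1[:a] + p2[a:b] + p1[b:]
    child2 = p2[:a] + p1[a:b] + p2[b:]
    return child1, child2
```

Because each threshold pair comes intact from a valid parent, the children automatically satisfy \(S_{low} < S_{high}\) and \(V_{low} < V_{high}\).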

Finally, mutation is applied to the offspring (\(O_n\)) to guarantee diversity in the population. Since there are only five genes, we changed at most one (randomly chosen) gene in each new individual, applying a random variation in the range [−3, +3]. The new generation (\(P_n\)) is then created by merging the fittest individuals of the previous generation (we applied elitism, so only the best 20% survived) with the generated offspring.
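Mutation and elitist replacement can be sketched as below: the fittest half breeds, the best 20% survives unchanged, and offspring fill the rest of the population. Drawing the variation from the integers in [−3, +3] (so a draw of 0 leaves the individual untouched, matching “at most one” gene) and discarding mutations that break the constraints are assumptions; `fitness_fn` is a hypothetical placeholder for any callable that scores an individual.

```python
import random

def is_valid(ind):
    """Constraints from the text, plus an assumed 0-100 scale for S and V."""
    h, s_low, s_high, v_low, v_high = ind
    return 4 <= h <= 30 and 0 <= s_low < s_high <= 100 and 0 <= v_low < v_high <= 100

def mutate(ind, rng):
    """Change one randomly chosen gene by an integer in [-3, +3]."""
    child = list(ind)
    i = rng.randrange(len(child))
    child[i] += rng.randint(-3, 3)  # a draw of 0 leaves the individual unchanged
    # Assumption: a mutation that violates the constraints is discarded.
    return child if is_valid(child) else list(ind)

def next_generation(population, fitness_fn, rng, elite_frac=0.2):
    """Build P_n from the elite of P_{n-1} plus mutated offspring."""
    n = len(population)
    ranked = sorted(population, key=fitness_fn, reverse=True)
    parents = ranked[: n // 2]                       # fittest half breeds
    elite = ranked[: max(1, round(elite_frac * n))]  # best 20% survive unchanged
    offspring = []
    while len(elite) + len(offspring) < n:
        p1, p2 = rng.sample(parents, 2)
        # Fixed crossover points after H_ranges and after S_high.
        child = p1[:1] + p2[1:3] + p1[3:]
        offspring.append(mutate(child, rng))
    return elite + offspring
```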

For training we used the multi-temporal images of “Carlo IX”. We know from the previous monitoring [10] that this violin had no alterations during the six-month period, so variations in its UVIFL images are due only to changes in environmental conditions, which we want to manage. Training images were divided into three groups according to the framed region: the entire back plate, the top left (C1) area and the bottom right (C2) area. We excluded from training one image per session of area C2, to be used later for testing. We ran the procedure separately on the three subsets for 10 generations with a population of 20 individuals. Since we had no reference ground truth to converge towards, the stopping conditions were (i) reaching the maximum number of generations or (ii) no change in the population for more than two iterations. At the end of the process we obtained three solutions, slightly different from one another, each optimal only for its specific case. This was expected given the small amount of input data. Thus, we took as a valid global solution their “intersection”: not the solution with the best fitness, but a valid solution that appeared in all three cases with minimum difference in parameters:
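The two stopping conditions can be expressed as a small driver around any generation step. The function `run_ga` and its `step` argument are hypothetical; “no change in the population” is interpreted here as the same multiset of individuals, and the loop stops once this has happened on more than two consecutive iterations.

```python
def run_ga(population, step, max_generations=10, patience=2):
    """Apply `step` until max_generations is reached or the population has
    not changed for more than `patience` consecutive iterations.

    Returns the final population and the number of generations run.
    """
    unchanged = 0
    generations = 0
    for _ in range(max_generations):
        new_population = step(population)
        generations += 1
        # Compare as multisets: the order of individuals is irrelevant.
        if sorted(new_population) == sorted(population):
            unchanged += 1
        else:
            unchanged = 0
        population = new_population
        if unchanged > patience:
            break
    return population, generations
```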

  • \(H_{ranges} = 14; S_{low} = 45; S_{high} = 55; V_{low} = 16; V_{high} = 91\).

It can be noticed that \(H_{ranges}\) is close to the original value (12), whereas the thresholds for Saturation and Value changed much more, and in opposite directions: Saturation works better with a narrow central bin, Value with a wide one.

4 Results

We compared the previous and current segmentations on all the remaining multi-temporal image sequences.

Fig. 2. Segmentation of areas with no alteration (C2 on the left and V1 on the right): original images with corresponding dataset IDs (first row); previous segmentation [7] (second row); current one (third row).

First, we analyzed areas C2 and V1, where no meaningful alteration occurred during the six-month monitoring period [10], so any variations in the images are due only to environmental noise. In both cases the new approach outperforms the previous one, producing a nearly identical segmentation for all three sessions (Fig. 2). This is particularly evident when comparing the previous and current outcomes in session 3 for area V1. The new method was also able to partially handle a large spurious reflection in area C2, session 3. Images with such large reflections are generally discarded during acquisition, since the noise is evident even to the naked eye; we used this one only for testing purposes.

As similarity metric we used the \(\chi ^2\) distance (Eq. 2), comparing the segmentation for session 1 (our reference initial state) with those for sessions 2 and 3 respectively (Table 1). As expected, \(\chi ^2\) values decrease with the new approach and remain stable between sessions, whereas with the old one they change significantly. The slight increase in session 3 for area C2 is due only to the presence of the large reflection.

Table 1. Similarity measure (\(\chi ^2\)) among sessions for “Carlo IX” and “Vesuvio”

Fig. 3. Segmentation of area V2: sample images of the three sessions (first row), red circles highlight regions with slight alterations between sessions; previous segmentation [7] (second row); current one (third row). (Color figure online)

More interesting is the case of area V2 (Fig. 3), which suffered slight alterations between the three sessions (concentrated in the region inside the red rectangle). Comparing the two segmentations, we can notice that alterations are more visible with the previous method (second row) than with the new one (third row). However, spurious and genuine variations have the same “weight” with the old method, while with the new one we achieved a more uniform segmentation that is less prone to errors. The quality of the result can be appreciated by computing the weighted difference (Eq. 3) between the first and the third session in the two cases.

$$\begin{aligned} diff(s1,s3) = \begin{cases} 255 & \text{if } \left| s1-s3 \right| \ge th \\ 0 & \text{if } \left| s1-s3 \right| < th \end{cases} \end{aligned}$$
(3)
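Eq. 3 can be sketched with NumPy as below, assuming the two segmentations are aligned, single-channel 8-bit images of the same size; how color-coded class maps were reduced to one channel is not specified in the text, and the function name is hypothetical.

```python
import numpy as np

def weighted_difference(s1, s3, th=240):
    """Binary change map of Eq. 3: 255 where |s1 - s3| >= th, else 0."""
    # Cast to a signed type first: subtracting uint8 arrays directly would wrap around.
    d = np.abs(s1.astype(np.int16) - s3.astype(np.int16))
    return np.where(d >= th, 255, 0).astype(np.uint8)
```

A high threshold such as th = 240 keeps only the largest differences, which also suppresses the small residual misalignments between pictures.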
Fig. 4. Weighted difference between segmentations of Sessions 1 and 3 for V2: previous segmentation [7] (left); current one (right). True Positives (real alterations) in green and False Positives (noise) in red. (Color figure online)

Fig. 5. Segmentation of SV01: sample images of the various sessions (first row), not altered areas highlighted in green, altered areas in red; previous segmentation [7] (second row); current one (third row). (Color figure online)

We considered only the largest differences (\(th = 240\)), for better visualization and to exclude the effect of small misalignments among pictures. True Positives (TP) are highlighted in green and False Positives (FP) in red (Fig. 4). The outcomes clearly show that the old approach is very sensitive, producing a large number of FP randomly spread over the surface, while the new one is more focused, with a significant reduction in the number of FP. This is coherent with our design principles. We want a segmentation able to highlight reasonable regions of interest on which to perform further analyses; thus, we are not interested in a perfect detection of all TP pixels (it is enough to roughly highlight the correct areas), but it is crucial to minimize the FP to avoid unnecessary examinations. It is also worth noting that the value of \(\chi ^2\) is low for V2 as well (Table 1), since the altered area is very small compared with the rest of the surface, which did not change among sessions.
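Assuming a boolean reference mask of the truly altered region is available, the TP/FP coloring used in Fig. 4 can be sketched as follows. The mask `altered` and the function name are hypothetical, and colors are given in RGB order.

```python
import numpy as np

def tp_fp_overlay(diff, altered):
    """Color a binary change map: green for true positives, red for false positives.

    diff:    uint8 map from Eq. 3 (255 = change detected).
    altered: boolean mask of the truly altered region (assumed available).
    Returns the RGB overlay and the TP and FP pixel counts.
    """
    changed = diff == 255
    tp = changed & altered
    fp = changed & ~altered
    overlay = np.zeros(diff.shape + (3,), dtype=np.uint8)
    overlay[tp] = (0, 255, 0)   # green: real alterations
    overlay[fp] = (255, 0, 0)   # red: noise
    return overlay, int(tp.sum()), int(fp.sum())
```

Counting FP pixels in this way gives a simple number to track the design goal stated above, namely minimizing false positives rather than recovering every altered pixel.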

Finally, we compared the two methods on the artificially created sequence SV01 (Fig. 5) to test their performance over a long time period. In this case we had full control over the setup: we gradually wore out only the bottom part of the violin (first row, red rectangle), leaving the upper region unaltered (green rectangle). The violin and lamps were slightly moved among sessions, so there are various kinds of noise in all images. In this case too, the current segmentation proved more robust than the previous one: variations are present only in the red region, while the green one is stable among sessions (third row). By contrast, environmental noise in the green region has a high impact on the older segmentation (second row). Here the similarity check is not meaningful, since the images change significantly among sessions.

5 Conclusions

In this paper we presented a robust segmentation method for comparing multi-temporal UVIFL images of historical violins. The tests performed showed promising results: the proposed approach was able to handle environmental noise efficiently without losing meaningful alterations. The multi-temporal UVIFL images used for the experiments were collected in a publicly available dataset, the first of its kind to the best of the authors’ knowledge.

As a next step, we plan to enlarge our dataset with new sample images created in the laboratory to simulate, as faithfully as possible, various alteration conditions. A larger dataset will allow us to better assess and refine the proposed segmentation method. We are also considering integration with other image processing techniques to improve the early detection of alterations.