1 Introduction

Over the last decades, Raman spectroscopy has grown into a frequently used technique in archaeometry and art analysis. This evolution was facilitated by instrumental improvements, such as the coupling of Raman spectrometers with a confocal microscope, the introduction of fibre optics and the miniaturisation of the systems, which led to mobile instrumentation for direct, in situ analysis. As the range of applications broadened, the need for high-quality spectra of known reference materials, for comparison purposes, became clear. Laboratories often developed their own collections of reference Raman spectra, while some groups published dedicated spectral collections. The first Raman spectral collections dedicated to cultural heritage research focussed mainly on inorganic pigments [1,2,3,4,5] and also on natural binding media and varnishes [6,7,8]. More extended databases are available for, among others, minerals [9] and biomolecules [10]. Moreover, dedicated papers focus on discriminating between the Raman spectra of a relatively small group of related materials, such as natural silicate glasses [11], Mexican artists’ materials [12, 13], green minerals [14, 15] or black minerals [16]. Several groups have worked on the analysis of synthetic organic pigments (SOPs) and published their results [17,18,19,20,21]. Besides the dedicated research articles on SOPs, a spectral database is openly available [22], which facilitates direct comparison and solid identification of such components. This work not only contributes to the analysis of modern artworks [23], but also has implications for forensic research [24,25,26].

In general, the availability of a large collection of reference Raman spectra is a positive aspect for the cultural heritage research community. However, published spectra are not always readily available in an accessible digital format, and measurement conditions (e.g. selected laser wavelength, calibration, spectrometer response) can influence the appearance of the Raman spectrum. Moreover, manual comparison of a spectrum against a (large) database of reference spectra is time-consuming, and therefore several approaches for automated searching algorithms have been proposed [25, 26]. These spectral searching algorithms have often proven successful in identifying materials within specific groups. Different (multivariate) chemometric approaches have been proposed, of which principal components analysis (PCA) seems to be a frequently used method [27,28,29,30,31,32,33]. Other approaches include linear discriminant analysis [34, 35] or more advanced approaches such as fuzzy logic [36, 37].

Unfortunately, multivariate approaches typically have some disadvantages. In general, they use the entire spectrum for comparison against the spectra in a reference database. As a consequence, variations in the spectral background (e.g. fluorescence or ambient light) may interfere. It is also of the utmost importance that the reference spectra as well as the spectrum of the unknown are well calibrated [38]. Furthermore, intensity fluctuations or differences in spectral resolution between the reference spectra and the unknown may hamper the identification. Identification is not always straightforward, especially if the reference database was recorded with a different excitation laser or, more generally, with different optics or experimental conditions than the spectrum of the unknown. Chemometric techniques can also be used for data reduction. PCA, for instance, projects all spectra into a multi-dimensional space and recalculates the values to obtain a limited number of orthogonal components. In this case, each time a new reference spectrum is added to the database, the entire dataset has to be re-processed for the new data reduction.

To overcome these issues, we propose a simple spectral searching algorithm that extracts the Raman band positions from the spectrum of the unknown and compares them against those of the reference spectra. The aim of the algorithm is to let the user easily identify the spectra in a large database that correspond to the spectrum of the unknown. This approach is evaluated on mock-up samples and is also applied during the in situ analysis of street art.

2 Experimental

Raman spectra were recorded using two different portable Raman spectrometers: a Bravo handheld spectrometer (Bruker, Ettlingen, Germany) and a 1064 nm i-Raman EX instrument (BWTek, Newark, USA). The first instrument is a small, battery-operated, fully automated Raman spectrometer that, when started, analyses the sample by irradiating it successively with two lasers of different wavelengths (785 and 852 nm), which allows a relatively broad spectral range (between 300 and 3200 cm−1) to be covered. Moreover, during the automated protocol, the spectrometer records a series of spectra at slightly different laser wavelengths and performs an automated sequentially shifted wavelength baseline correction, which suppresses interference from fluorescence or other features unrelated to the actual Raman signal. The instrument is typically positioned in contact with the paint layer and held by hand during the analysis (typically less than a minute). After analysis, the spectra are converted to the Galactic/Thermo .SPC format using the manufacturer’s OPUS software. The software automatically determines settings such as measurement time and number of accumulations, while the laser power on the sample remains constant (fixed by the manufacturer). Although the user has limited possibilities to modify the settings, the spectrometer is easy to operate and generally yields good-quality spectra.

To evaluate the developed simple spectral searching algorithm, spectra of the mock-up samples were also recorded with an instrument with a longer laser wavelength: the i-Raman EX spectrometer (BWTek, Newark, USA) is a dispersive spectrometer that uses a Nd:YAG laser (1064 nm) for excitation. It covers a spectral range of 100–2500 cm−1 and has a spectral resolution of ca. 10 cm−1. This instrument allows the laser power, measurement time and number of accumulations to be adjusted. Typically, spectra were recorded using 10 accumulations of 10 s. The fibre optics probe head was positioned in front of the samples using a manual stage, and the distance to the sample was optimised by performing short measurements.

The in-house written simple spectral searching algorithm was developed using MATLAB (The MathWorks Inc., Natick, MA, USA) version 9.5.0.1033004 (R2018b) and was run under Microsoft Windows 10. Band detection is based on the findpeaks function from the Signal Processing toolbox (Version 8.1). This algorithm is available as electronic supplementary material.

To identify the spectra of the unknown samples, the developed algorithm was used to compare the spectra with the spectra of modern synthetic pigments from the well-known collection of reference Raman spectra from KIK/IRPA (Brussels, Belgium). This collection consists of 325 .SPC files that are available in two versions: original spectra and baseline-corrected spectra. For our experiments we worked with the original spectra that were recorded with a Renishaw Raman spectrometer, equipped with a 785 nm excitation laser. The spectra are available under the Creative Commons Attribution-NonCommercial-NoDerivatives licence [17].

To test the developed algorithm on commercial paint samples, simple mock-up samples (Fig. 1a) were made by applying Rembrandt Talens acrylic paint (Apeldoorn, The Netherlands), with the help of clean brushes. The main pigments that are used for the production of this commercial paint are mentioned on the paint tubes and are used for comparison (Table 1, first column).

Fig. 1

a Mock-up samples of 23 pure paint samples. The overlay numbers refer to the paint numbers as assigned by the manufacturer. b Graffiti Street (Werregarenstraat) in Ghent (Belgium), where in situ spectra were recorded from street art using the Bruker Bravo spectrometer

Table 1 Summary of the results of the simple searching algorithm applied on the Raman spectra as recorded from the mock-up samples

3 Results and discussion

3.1 Developing a simple spectral searching algorithm

In this paper, a new, simple spectral searching algorithm is implemented that is based on the comparison of Raman band positions. Figure 2 provides a schematic overview of the dataflow in the proposed algorithm. The original spectrum of the unknown is loaded into the MATLAB workspace, and an automated polynomial baseline correction is performed [39]. The user can choose to manually truncate the spectrum to a specific spectral range and, if desired, modify the sensitivity of the spectral band detection (which is related to the relative intensity or signal-to-noise ratio of the spectrum). The positions of the N most intense Raman bands are then stored in the so-called peaktable.
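The peak-picking step described above can be sketched as follows. This is a simplified Python illustration and not the authors' MATLAB code: it substitutes a plain local-maximum search for the findpeaks function, assumes that the spectrum has already been baseline corrected and truncated, and the names `pick_bands`, `n_bands` and `min_height` are hypothetical.

```python
def pick_bands(wavenumbers, intensities, n_bands, min_height=0.0):
    """Return the positions of the n_bands most intense local maxima, sorted by position."""
    candidates = []
    for i in range(1, len(intensities) - 1):
        is_peak = intensities[i - 1] < intensities[i] >= intensities[i + 1]
        # min_height plays the role of a (hypothetical) sensitivity setting.
        if is_peak and intensities[i] >= min_height:
            candidates.append((intensities[i], wavenumbers[i]))
    # Keep the most intense candidates, then order them by band position.
    strongest = sorted(candidates, reverse=True)[:n_bands]
    return sorted(position for _, position in strongest)


# Toy baseline-corrected spectrum (invented values, not measured data).
wavenumbers = [300, 350, 400, 450, 500, 550, 600, 650, 700]
intensities = [0.0, 0.9, 0.1, 0.4, 0.0, 0.2, 0.7, 0.1, 0.0]
peaktable = pick_bands(wavenumbers, intensities, n_bands=2)
print(peaktable)  # [350, 600]
```

Raising `min_height` in this sketch lowers the detection sensitivity, so that weak maxima caused by noise are less likely to enter the peaktable.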

Fig. 2

Schematic overview of the dataflow in the proposed simple spectral searching algorithm

A library can be generated from a collection of spectra: all the Raman spectra in a file folder are sequentially loaded into the workspace and arranged into one library. The software is able to read Galactic .SPC files or .txt files from the RRUFF project [9]. Moreover, a function is implemented that allows the user to add spectral information from the literature to the spectral library. This information can, for instance, be a list of Raman band positions as reported in a research paper. The resulting spectrum consists of a flat line with sharp, full-intensity ‘Raman bands’ at the specified band positions. After the acquisition of the spectra, the library is transformed into a peaktable library. First, all reference spectra in the library are truncated to the same spectral range as was selected for the peaktable of the unknown (only the Raman bands of references and unknown within a common range are considered), and all reference spectra are automatically baseline corrected before the Raman bands are detected. As a consequence, the peaktable library needs to be regenerated each time an unknown spectrum is investigated. The findpeaks function is run with the highest sensitivity (the minimum prominence [44] is set to 0), and subsequently the algorithm selects the Nmax = N × 1.8 (rounded to the nearest integer) most intense Raman bands. The factor of 1.8 is arbitrary: it usually keeps computational time and memory use manageable, while ensuring that the algorithm is not limited to the N most intense bands. Thus, it is possible to cover, to some extent, differences in relative band intensities between the references and the unknowns: the algorithm can select the N band positions (out of Nmax) that best correspond to the N bands of the unknown. For each spectrum in the library, the algorithm generates an (N-by-C) array that contains all possible unique combinations of N band positions out of a pool of Nmax band positions. As a consequence, C equals \(C = \binom{N_{\max}}{N} = \frac{N_{\max}!}{N!\,(N_{\max} - N)!}\). Thus, for each spectrum in the reference library, the algorithm selects a series of N band positions out of the Nmax most intense Raman bands. This way, the algorithm is not limited to the N most intense bands for each reference, but has some built-in flexibility to deal with slightly different band intensities that might be caused by different excitation lasers, different detector sensitivities, crystal orientation, bad signal-to-noise ratios or even impurities. It should be remarked that the algorithm requires a minimal separation of 30 cm−1 between two detected bands. In this way, a difference in spectral resolution between the reference spectra and the spectrum of the unknown is taken into account: confusion between two nearby bands and a shoulder is avoided.
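To give a feeling for how quickly the number of combinations C grows, the short Python sketch below evaluates Nmax and C for the three values of N used in Fig. 4. It is an illustration only: the function names `pool_size` and `candidate_sets` are hypothetical, and the sketch assumes that the pool of reference bands is already sorted by band position.

```python
from itertools import combinations
from math import comb


def pool_size(n_bands, factor=1.8):
    """Nmax: size of the pool of reference bands, rounded to the nearest integer."""
    return round(n_bands * factor)


def candidate_sets(reference_bands, n_bands):
    """All unique selections of n_bands positions from the pool, kept in pool order."""
    return list(combinations(reference_bands, n_bands))


for n in (6, 9, 14):  # the values of N shown in Fig. 4
    n_max = pool_size(n)
    print(n, n_max, comb(n_max, n))
# 6 11 462
# 9 16 11440
# 14 25 4457400

print(candidate_sets([400, 800, 1200], 2))
# [(400, 800), (400, 1200), (800, 1200)]
```

Since this count applies to every reference spectrum in the library, going from N = 6 to N = 14 multiplies the number of distance evaluations per reference by roughly four orders of magnitude.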

In the final stage, for each reference spectrum, the software calculates the Euclidean distance between the band positions in the peaktable of the unknown and each of the C selected combinations of N Raman bands. For each reference spectrum the best possible combination is retained, and the corresponding Euclidean distance d, as calculated in N-dimensional space, is reported. If desired, other (dis-)similarity measures can easily be implemented. In addition to making all numeric values available in the workspace, the algorithm provides graphical output (Fig. 3). It produces a graph with the distance d plotted for all reference spectra. A further plot is generated with the spectrum of the unknown overlaid on the reference spectrum. Finally, the selected Raman bands of both unknown and reference are marked in the report. It was not possible to determine an absolute threshold value that marks a trustworthy identification, as this is highly dependent on both the composition of the reference database and the Raman spectrum of the unknown at hand. In different situations, a broad range of distances can be found for correct identifications. Therefore, a tool was provided to facilitate easy evaluation of the results: the user can scroll through the reference spectra, and the overlay plots are updated accordingly. Apart from the speed of the computer, the speed of the identification algorithm depends on the number of bands considered and the size of the reference database. Typically, a result was obtained within a few minutes. The MATLAB code for the functions of this algorithm is available as supplementary material (S1).
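A minimal Python sketch of this matching stage is given below. It is not the authors' MATLAB implementation: the function names and the band positions are invented for illustration, and the sketch assumes that bands are paired in order of increasing wavenumber and that every reference pool contains at least N bands.

```python
from itertools import combinations
from math import dist


def best_distance(unknown_bands, reference_pool):
    """Smallest Euclidean distance between the unknown's N band positions and
    any N-band selection from the reference pool (both sorted by wavenumber)."""
    n = len(unknown_bands)
    return min(dist(unknown_bands, combo) for combo in combinations(reference_pool, n))


def rank_references(unknown_bands, library):
    """Return (name, d) pairs ordered from the shortest to the longest distance."""
    scores = [(name, best_distance(unknown_bands, pool)) for name, pool in library.items()]
    return sorted(scores, key=lambda item: item[1])


# Invented band positions (cm^-1) for illustration only; not taken from any database.
unknown = [403, 1000, 1602]
library = {
    "reference A": [400, 650, 1000, 1300, 1600],
    "reference B": [450, 700, 1100, 1350, 1550],
}
for name, d in rank_references(unknown, library):
    print(name, round(d, 2))
# reference A 3.61
# reference B 122.12
```

In this sketch, replacing `dist` by another function of two equally long sequences would be enough to try a different (dis-)similarity measure.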

Fig. 3

Example of the output of the simple spectral searching algorithm. The unknown spectrum of a mock-up sample from Rembrandt Talens was recorded with a BWTek i-Raman EX (1064 nm excitation) spectrometer. The correct pigment (PY74) was identified from the KIK/IRPA spectral collection [17]

3.2 Evaluating the simple spectral searching algorithm based on the Raman analysis of mock-up samples

The spectral searching algorithm is evaluated on a series of Raman spectra recorded from 23 mock-up samples of pure paint, applied on prepared canvas stretched on cardboard (Fig. 1a). These samples comprise cases where the pigment is present in the reference collection, cases where the pigment is not present, and mixtures of pigments. Spectra were recorded with the Bruker Bravo instrument and the i-Raman EX spectrometer and compared against the 325 spectra in the KIK/IRPA database [17]. As we aim to identify spectra that are recorded under entirely different circumstances (e.g. different excitation lasers, spectral preprocessing, different spectrometer response curves) than the reference spectra, our algorithm does not take the (relative) Raman intensities into account, only the band positions. In Fig. 3, the spectrum plotted in blue is the spectrum of the mock-up sample and the red spectrum is the corresponding reference spectrum of the yellow pigment PY74. Because the instrument, calibration and acquisition parameters (spectral resolution and excitation wavelength) differ from those used for the database, the two spectra show differences.

In general, the analysis of the mock-up samples with the simple spectral searching algorithm yielded the results summarised in Table 1. We truncated all recorded spectra to the range between 300 and 1800 cm−1 to avoid interference caused by TiO2, which is commonly used as a white pigment in contemporary art and may dominate the low-wavenumber spectral range. TiO2 is not included in the spectral database of contemporary organic pigments.

Table 1 gives an overview of the results for the mock-up samples. For each spectrum (S2), the number of Raman bands N incorporated in the calculations is given, as well as the identification (the name of the reference with the shortest Euclidean distance d). We also report the rank of the correct identification, i.e. the position, in the list of 325 reference spectra ordered from the shortest to the longest Euclidean distance, of the reference spectrum that corresponds to the pigment composition as stated by the manufacturer. Three groups of mock-up samples were used for the evaluation of the approach: (1) paint samples where the pigment is present in the reference database, (2) paint samples where the pigment is not present in the reference database, and (3) paint samples that consist of a mixture of pigments. All spectra were recorded with two different spectrometers. Spectral matching was performed by comparing the Euclidean distances d. It should be noted, however, that these are influenced by several parameters, including the spectral resolution and calibration of the spectrometers. In general, the Euclidean distances seem to be slightly higher for the i-Raman EX spectrometer than for the Bravo instrument. When the searching algorithm correctly classifies the paint sample, the Euclidean distance is typically less than 12 for the Bravo (except for sample 318), while for the i-Raman EX instrument values of up to 20 are reached. For the misclassified spectra, the distance values are typically higher than 30 for the Bravo and even higher (37) for the i-Raman EX spectrometer. In general, the number of misclassifications is slightly higher with the i-Raman EX instrument than with the Bravo spectrometer. The misclassified spectra no. 272, 267, 366 and 318 show significant contributions of CaCO3, which is used either as a filler in the paint or as a whitener on the support to which the paint samples were applied. This is reflected in an intense Raman band at 1086 cm−1, often accompanied by a band at 712 cm−1. As these bands are relatively intense and are not present in the reference spectra, their presence can hamper the correct assignment. Therefore, we deleted these bands from the corresponding peaktables, and the results after this removal are reported in the table (marked with a *). These omissions significantly improve the results of the simple algorithm, but also illustrate that this approach can be sensitive to the presence of other Raman scatterers. When examining the results for samples that contain several pigments, we note that it was sometimes possible to identify the main component (e.g. the analysis of samples no. 564 and 619 with the Bravo instrument). However, the algorithm is not designed to identify all components present in a mixture. In these cases, manual examination of the overlaid spectra reveals some Raman bands in the unknown spectrum that are not present in the reference material. It is worth noting that the identification strongly depends on the spectral quality of the recorded Raman spectrum. Indeed, if weak bands and a high background are present, the peak-picking algorithm may erroneously consider the background noise as Raman bands, resulting in wrong assignments.
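The removal of the CaCO3 bands from a peaktable could look like the following Python sketch. The function name, the example peaktable and the tolerance of 10 cm−1 are assumptions made for illustration; the text does not state how close a detected band had to be to 712 or 1086 cm−1 to be deleted.

```python
CALCITE_BANDS = (712.0, 1086.0)  # CaCO3 band positions quoted in the text (cm^-1)


def drop_interfering_bands(peaktable, interfering=CALCITE_BANDS, tolerance=10.0):
    """Remove detected bands lying within `tolerance` cm^-1 of a known interfering band."""
    return [
        band
        for band in peaktable
        if all(abs(band - known) > tolerance for known in interfering)
    ]


# Invented peaktable of an unknown; 714 and 1085 mimic the CaCO3 contribution.
peaktable = [452, 714, 1085, 1330, 1598]
print(drop_interfering_bands(peaktable))  # [452, 1330, 1598]
```

Note that the cleaned peaktable contains fewer bands, so the number of bands N used in the subsequent search is reduced accordingly.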

It should also be noted that the Raman spectra of synthetic organic pigments generally contain many relatively sharp Raman bands. Several classes can be distinguished, but for pigments with similar molecular structures, the Raman spectra may be highly similar [18, 28]. Thus, to support the further evaluation of the results, the algorithm provides a graphical representation in which the spectra of the unknown component and of the reference are overlaid (Fig. 3). However, in the specific case of polymorphs of the same pigment (e.g. the copper phthalocyanine pigment PB15), a more advanced approach is needed to differentiate between these highly similar spectra [33, 34].

The algorithm automatically selects a number of intense bands in the spectrum of the unknown (based on their intensity and the amount of noise in the spectrum), but the sensitivity can be modified manually. Then, the algorithm evaluates the many possible combinations of detected bands in each reference spectrum and selects the best fit. The need for this is illustrated in Fig. 4: in the right column the algorithm selects the best possible combination, whereas in the left column only the most intense bands are selected. If only the most intense bands are used and the relative Raman band intensities differ between the reference and the spectrum of the unknown pigment, the algorithm may not select the most appropriate bands. These differences in intensities can have various causes, such as crystal orientation, different excitation lasers, background correction, detector sensitivity, or even thermal decomposition (which might also affect the Raman band positions). This drawback may be overcome by selecting the best fit from a large set of combinations of band positions. On the other hand, this approach may seriously increase calculation time and cause memory problems when a large number of Raman bands is involved.
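The effect of differing relative intensities can be illustrated with a small Python sketch that compares the two strategies on invented numbers. Everything in it is hypothetical (band positions, intensities and names); it only mirrors the idea of comparing the N most intense reference bands with the best N-band selection out of a pool of Nmax bands.

```python
from itertools import combinations
from math import dist

# Invented (band position in cm^-1, relative intensity) pairs for one reference spectrum.
reference = [(400, 1.0), (650, 0.9), (1000, 0.3), (1300, 0.8), (1600, 0.5)]
# Invented peaktable of the unknown: here the 1000 and 1600 bands are among the strongest.
unknown = [402, 1001, 1598]
n = len(unknown)
n_max = round(1.8 * n)  # pool size, 5 for N = 3


def strongest_positions(bands, count):
    """Positions of the `count` most intense bands, sorted by wavenumber."""
    ranked = sorted(bands, key=lambda band: band[1], reverse=True)
    return sorted(position for position, _ in ranked[:count])


# Strategy 1: compare against the N most intense reference bands only.
d_top = dist(unknown, strongest_positions(reference, n))
# Strategy 2: best N-band selection out of the Nmax most intense reference bands.
pool = strongest_positions(reference, n_max)
d_best = min(dist(unknown, combo) for combo in combinations(pool, n))

print(round(d_top, 1), round(d_best, 1))  # 460.4 3.0
```

With the first strategy the reference would be rejected although its band positions agree well with those of the unknown; the second strategy recovers the match at the cost of evaluating all combinations.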

Fig. 4

Results of the analysis of a yellow mock-up sample, with different settings of the peak-picking sensitivity (a,d: N = 6 bands; b,e: N = 9 bands; c,f: N = 14 bands). For the results a-c only the N most intense Raman bands were selected, while for the results d-f all possible combinations of N bands out of the Nmax most intense bands were considered. Correspondences between the bands are marked with a dashed line. All bands are also plotted in the corresponding scatter plot. The diagonal line is only presented for clarity

3.3 Simple spectral searching algorithm applied on in situ analysis of street art

After having been tested on mock-up samples, the approach was applied during the in situ analysis of street art. In the pedestrian zone of downtown Ghent, there is a small alley (Werregarenstraat, Ghent, Belgium) that is dedicated to graffiti and commonly called ‘Graffiti Street’. In this alley, street artists are allowed to express themselves on the walls of the adjacent buildings. These walls are covered with graffiti and are regularly overpainted by other artists. As a consequence, the appearance of the street changes very frequently. We recorded Raman spectra of different scenes in this street (Fig. 1b) to evaluate the simple spectral searching algorithm. In this study, the spray-painted surfaces were analysed by in situ Raman spectroscopy, a non-destructive approach that is frequently used in conservation science alongside methods that require a sample, such as benchtop Raman spectroscopy, Fourier-transform infrared spectroscopy (FTIR) or scanning electron microscopy with energy-dispersive X-ray spectroscopy (SEM–EDX). The current study does not aim to fully identify all components of the spray paint, such as the binding medium, which can be investigated using gas chromatography, but only to test the usefulness of the developed searching algorithm for the identification of synthetic organic pigments.

When using spectra collected in a real case study, such as those from street art, it is of the utmost importance to be very critical towards the outcome of the simple searching algorithm. Typically, the spectra recorded in Graffiti Street are noisier than the spectra of the mock-ups. Moreover, mixtures are sometimes present, or spectra may be overwhelmed by contributions of inorganic components, such as CaCO3 or TiO2. Usually, the contribution of the binding media is much weaker than the Raman signal of the pigments. Although serious fluorescence can be present, the Bravo instrument’s shifted wavelength baseline correction algorithm should correct for this [40,41,42]. To judge the correctness of the identification, the user can inspect the calculated Euclidean distance d: the smaller the distance, the better the band positions of the unknown and the reference spectrum correspond. Moreover, the spectrum of the unknown can be compared (manually) with the reference spectra and examined for missing Raman bands or prominent changes in relative intensities, as these can also indicate the presence of pigment mixtures.

Figure 5 shows some spectra recorded in Graffiti Street with the Bruker Bravo instrument. As can be seen from Fig. 5a, c, it is possible to achieve a good identification, which is also supported by the short Euclidean distances d. On the other hand, for the red paint from a graffiti painting (Fig. 5d) the identification as PR2 is wrong. At first sight, the band positions in the region between 1000 and 1800 cm−1 seem to correspond, and the differences in relative intensities could possibly be attributed to a different spectral resolution. However, when looking at the low-wavenumber range, it is clear that this spectrum does not correspond. When more bands are included (Fig. 5e), the spectrum is correctly assigned to pigment PR112. It is remarkable that both pigments (PR2 and PR112) belong to the naphthol AS pigments and differ only by an additional Cl- and CH3- group [43]. In the examples in Fig. 5b, f, the pigments seem to be assigned correctly (i.e. assigned to the same pigments as would have been done by manual comparison against the library). However, in the low-wavenumber range some intense and relatively broad bands are observed that can be assigned to titanium white (TiO2), currently the most common white pigment on the market. Despite this interference, it was possible to assign the pigments, although the Euclidean distance is significantly higher.

Fig. 5

Examples of spectra as recorded with the Bravo instrument, from Graffiti Street (blue), along with the assigned reference spectrum (red). The triangular markers in red and blue indicate the included band positions of the unknown and reference spectrum, respectively

4 Conclusions

In this paper, we propose a novel, simple spectral searching algorithm that was used successfully for the analysis of pigments in contemporary street art. The algorithm is based on the extraction of the most prominent Raman bands from the spectrum of the unknown and their subsequent comparison against the Raman bands extracted from a spectral library of synthetic organic pigments. A simple Euclidean distance is used to determine the spectrum with the highest similarity. As the algorithm compares band positions and does not take intensities into account, its main advantage is that it experiences minimal interference from background radiation, and that it can handle spectra recorded with different excitation wavelengths or at a different spectral resolution than the reference database. The algorithm was tested extensively on mock-up samples and on Raman spectra of street art recorded in situ with portable spectrometers, and successes as well as failures are described.