Introduction

Chemical analysis of inks and paints is mostly restricted to pigment identification [1]. In the field of cultural heritage, such analyses address problems of authenticity, restoration, conservation, and dating of historical manuscripts, documents, or ceramics [2]. A similar analytical approach is used in forensic science to reveal document fraud. Here, the focus is on detecting and confirming alterations to documents of significant financial value, such as wills and contracts. Comparison of the inks is used to question either the authenticity of a document or the time at which its various sections were written [3, 4]. Consequently, there is substantial demand for analytical methods that allow inks to be examined and reliably discriminated.

The differentiation and correct classification of inks are complicated because their chemical composition is usually unknown. In general, apart from pigments and dyes, inks contain a vehicle (which determines flow and drying characteristics), solvents, additives (e.g., driers and plasticizers), and contaminants. Together, these components form a chemical fingerprint of the ink that provides a basis for discrimination by chemical analysis [5, 6]. As early as 60 years ago, the first ink analyses were performed by thin-layer chromatography [7, 8], which is still commonly applied because of its simplicity and cost-effectiveness. Most techniques used for ink analysis are based on chromatography (e.g., high-performance liquid chromatography (HPLC) and gas chromatography). Despite time-consuming and often costly sample preparation, including dissolution and enrichment, these methods are popular because they offer high discrimination performance [9].

Spectroscopic techniques allow fast and direct measurements on the paper with no or minimal sample preparation. Furthermore, little or no sample destruction occurs, which permits multiple measurements even on tiny sample pieces. Therefore, in combination with advanced multivariate statistical tools, minimally invasive spectroscopic techniques such as Raman spectroscopy or laser-induced breakdown spectroscopy (LIBS) are well suited to the investigation of small fragments [10–15].

Combining LIBS and Raman spectroscopy gives comprehensive insight into the composition of a sample because the chemical information obtained by the two techniques is orthogonal: Raman spectroscopy reveals molecular structures [16], while LIBS reveals the elemental composition [17–19]. Although both techniques have been used independently for the classification of inks, this is the first time that their benefits have been combined for improved ink discrimination. Zięba-Palus et al. presented a combined X-ray fluorescence/Raman spectroscopy instrument for the analysis of paint chips and demonstrated the benefit of combined elemental and molecular analysis; however, no combined data evaluation was performed [20].

The high degree of similarity in the composition of inks demands data mining tools that reveal even minimal differences in chemical composition. Furthermore, the growing amount of data, caused by high-resolution Echelle gratings and the increasing number of pixels in charge-coupled devices (CCDs), requires significant data reduction to shorten the data evaluation time. Chemometric methods such as principal component analysis (PCA) are therefore of rising importance for spectra processing. Unfortunately, they are still rarely used in the field of ink discrimination [4, 21, 22].

Chemometrics, and multivariate data analysis in particular, have been used successfully in only a limited number of cases to aid the forensic examination of inks, based on inductively coupled plasma mass spectrometry, HPLC, and UV, Vis, or micro-ATR spectroscopy of extracted inks [4, 22–24]. In all these cases, classification rules were established based on PCA. Here, in addition to PCA, we demonstrate the potential of alternative chemometric tools, namely soft independent modelling of class analogy (SIMCA), partial least-squares discriminant analysis (PLS-DA) [25], and support vector machines (SVM), for the classification of blue and black inks without using any prior background information on the samples [26]. The aim of this work is to investigate the discrimination power of a multi-spectroscopic approach combined with advanced chemometric data evaluation for the classification of ink samples of forensic interest. We present separate chemometric evaluations of the LIBS and the Raman data and compare the results with the evaluation of fused LIBS and Raman data.

Materials and methods

Samples

The sample set consisted of ten blue and black ink samples provided by the state criminal investigation department (LKA, Landeskriminalamt, Berlin, Germany). In this study, the inks were labeled alphabetically from “A” to “K”, excluding “I”, because the LKA retained the original brand names. The inks were deposited on commercially available standard white paper. The influence of the paper on the spectra was investigated using blank paper and a set of different paper samples coated with the same ink.

Raman spectroscopy

Raman spectra were recorded with a LabRAM HR800 spectrometer (Jobin Yvon, Bensheim, Germany) coupled to a microscope (BX41, Olympus, Hamburg, Germany) equipped with a ×100 objective. Rayleigh-scattered light was rejected with notch filters. The excitation wavelengths (488, 633, and 785 nm) were provided by an argon ion laser, a HeNe laser (Melles Griot, Aalsbergen, The Netherlands), and a diode laser (Toptica, Graefelfing, Germany), respectively. All spectra were recorded under similar conditions with respect to exposure time (10 × 10 s) and laser power, which was adjusted to 1 mW at the sample, corresponding to an irradiance of approximately 1 × 10⁵ W/cm² (spot size, 1 μm). The average spectral resolution was 3 cm−1 (monitored range, 300–1,800 cm−1; excitation wavelength, 488 nm). Two approaches to calibrating the Raman shift axis were compared: (a) recording the spectrum of 4-acetamidophenol as a spectroscopic standard and (b) monitoring the emission lines of a neon lamp shining onto the sample during the measurement. Calibration with the neon emission lines gave better results because wavelength shifts could be detected online during the sample measurement. Consequently, no software correction of wavelength shifts was needed, as a shift between calibration and sample measurement was precluded.
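
The irradiance quoted above follows from dividing the laser power by the spot area, and the Raman shift axis follows from converting calibrated wavelengths into wavenumber differences relative to the excitation line. The following Python sketch illustrates both steps under stated assumptions: a flat-top circular 1 μm spot, a quadratic pixel-to-wavelength calibration, and synthetic reference-line positions that stand in for the tabulated neon lines (the actual neon lines and polynomial order used in this work are not specified). All function names are hypothetical.

```python
import math

import numpy as np


def irradiance_w_per_cm2(power_w: float, spot_diameter_um: float) -> float:
    """Mean irradiance of a flat-top circular spot: power / (pi * r^2)."""
    radius_cm = spot_diameter_um * 1e-4 / 2.0  # 1 um = 1e-4 cm
    return power_w / (math.pi * radius_cm ** 2)


def raman_shift_cm1(wavelength_nm, excitation_nm: float = 488.0):
    """Raman shift in cm^-1: 1e7/lambda_exc - 1e7/lambda (wavelengths in nm)."""
    return 1e7 / excitation_nm - 1e7 / np.asarray(wavelength_nm, dtype=float)


def calibrate_pixels(pixel_positions, reference_nm, degree: int = 2):
    """Least-squares polynomial mapping detector pixel -> wavelength (nm)."""
    return np.poly1d(np.polyfit(pixel_positions, reference_nm, degree))


# 1 mW focused into a 1 um spot
print(f"{irradiance_w_per_cm2(1e-3, 1.0):.2e}")  # prints 1.27e+05

# Synthetic calibration: hypothetical pixel positions of reference lines whose
# wavelengths come from an assumed "true" dispersion curve; in practice these
# would be tabulated neon emission wavelengths.
true_dispersion = np.poly1d([2e-6, 0.04, 490.0])
pixels = np.array([120.0, 410.0, 700.0, 980.0])
reference_nm = true_dispersion(pixels)
fit = calibrate_pixels(pixels, reference_nm)
axis_cm1 = raman_shift_cm1(fit(np.arange(1024)))  # Raman shift of every pixel
```

With 1 mW in a 1 μm spot, the sketch gives about 1.3 × 10⁵ W/cm², consistent with the approximate value stated above. Because the neon lines are recorded simultaneously with the sample, such a fit would be repeated for every measurement rather than carried over from a separate calibration run.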

LIBS

The setup employed in the present study has been described in detail previously [27]. In brief, a frequency-doubled pulsed Nd:YAG laser (Surelite II, Continuum, Germany) is focused onto the sample, ablating a few hundred nanograms of material. The plasma emission is collected with an Echelle spectrometer (Aryelle Butterfly, LTB Lasertechnik Berlin, Germany) in the range of 290–930 nm. A mechanical chopper cuts off the initial plasma continuum. To minimize the ablation of paper and maximize that of ink, the samples were translated during the measurement so that every laser pulse hit an unspoiled spot. For each single measurement, the emission of 50 plasmas was integrated on the chip. A low irradiance of 50 GW/cm² was used to ensure shallow craters and thus to reduce the amount of ablated paper.

Data treatment

Background correction was applied to the Raman and LIBS spectra before they were imported into The Unscrambler (versions 9.8 and 10.1) for statistical analysis. All data were first normalized to unit vectors. The LIBS and Raman data were evaluated separately. In addition, LIBS and Raman spectra were fused randomly (one randomly picked Raman spectrum was attached to one randomly picked LIBS spectrum) in order to preserve the variance of both data sets. To obtain a common x-axis, the LIBS spectra were appended after the Raman spectra, which cover 300–1,796 cm−1, and assigned to the pseudo-axis range 1,797–2,432 cm−1. The three resulting data matrices (Raman, LIBS, and LIBS–Raman) were evaluated by PCA, SIMCA, PLS-DA, and SVM. For SVM, classification SVM type 2 (a nu-SVM) was used because this type minimizes the error function. The nu value, which is an upper bound on the fraction of misclassified (margin-error) training samples and a lower bound on the fraction of support vectors, was set to 0.5, and a radial basis function kernel was applied.
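
The normalization and random fusion described above can be sketched as follows. This is a minimal illustration, not the processing performed in The Unscrambler. It assumes that Raman and LIBS spectra are paired within the same ink, and it assumes one variable per pseudo-axis unit (1,497 Raman and 636 LIBS variables); random numbers stand in for real spectra, and the function names are hypothetical.

```python
import numpy as np


def unit_vector_normalize(spectra):
    """Scale every spectrum (row) to Euclidean length 1."""
    spectra = np.asarray(spectra, dtype=float)
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    return spectra / np.where(norms == 0, 1.0, norms)


def fuse_randomly(raman, libs, raman_labels, libs_labels, seed=0):
    """Attach each Raman spectrum to a randomly drawn LIBS spectrum of the
    same ink (drawn without replacement) by concatenating the two rows."""
    rng = np.random.default_rng(seed)
    raman_labels = np.asarray(raman_labels)
    libs_labels = np.asarray(libs_labels)
    fused, labels = [], []
    for ink in np.unique(raman_labels):
        r_idx = rng.permutation(np.flatnonzero(raman_labels == ink))
        l_idx = rng.permutation(np.flatnonzero(libs_labels == ink))
        for i, j in zip(r_idx, l_idx):  # stops at the smaller of the two counts
            fused.append(np.concatenate([raman[i], libs[j]]))
            labels.append(ink)
    return np.vstack(fused), np.array(labels)


# Synthetic stand-ins: 10 inks x 10 spectra per technique
rng = np.random.default_rng(1)
inks = np.repeat(list("ABCDEFGHJK"), 10)
raman = unit_vector_normalize(rng.random((100, 1497)))  # 300-1,796 cm-1
libs = unit_vector_normalize(rng.random((100, 636)))    # pseudo-axis 1,797-2,432
X, y = fuse_randomly(raman, libs, inks, inks)
axis = np.concatenate([np.arange(300, 1797), np.arange(1797, 2433)])
```

Because each block is normalized before concatenation, both the Raman and the LIBS part of every fused row keep unit length, and random pairing keeps the spread of the individual measurements instead of averaging it away.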

The reliability of the classification rules was assessed by cross-validation. For PCA, segmented cross-validation was performed on all 100 objects, with each segment systematically holding out the ten objects of one ink. For PLS-DA, full cross-validation on all objects was performed. For SIMCA, each class model was based only on the ten objects of the respective ink and validated by full cross-validation. For SVM, systematic cross-validation was performed.
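
The difference between the ink-wise (systematic) segments and full cross-validation can be illustrated with a PCA reconstruction error computed on held-out spectra. The sketch below is a simplified row-wise scheme under stated assumptions (mean-centring, SVD-based PCA, synthetic rank-2 data); it is not the exact algorithm implemented in The Unscrambler, and the function names are hypothetical. Passing one label per ink gives ink-wise segments; passing a unique label per spectrum gives full (leave-one-out) cross-validation.

```python
import numpy as np


def pca_fit(X, n_components):
    """Mean-centre X and return (mean, loadings P of shape (n_vars, A))."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def cv_residual_variance(X, groups, n_components):
    """Mean squared reconstruction error of held-out rows. Each segment holds
    out all rows sharing one group label and is projected onto a PCA model
    fitted to the remaining rows."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    sq_err = 0.0
    for g in np.unique(groups):
        test = groups == g
        mean, P = pca_fit(X[~test], n_components)
        Xc = X[test] - mean
        residual = Xc - Xc @ P @ P.T  # part not explained by the A components
        sq_err += np.sum(residual ** 2)
    return sq_err / X.size


# Synthetic example: 10 inks x 10 spectra, rank-2 structure plus small noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 40)) + 0.01 * rng.normal(size=(100, 40))
inks = np.repeat(np.arange(10), 10)
ink_wise = [cv_residual_variance(X, inks, a) for a in range(4)]
full_cv = [cv_residual_variance(X, np.arange(100), a) for a in range(4)]
```

The validated residual variance drops sharply once the number of components reaches the true rank of the data; holding out whole inks is the stricter test, because the model never sees any measurement of the ink being predicted.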

Results and discussion

Spectroscopy

Raman spectra arise from inelastic light scattering by the vibrational modes of molecular bonds. Raman spectroscopy, however, is often hampered by fluorescence, which may exceed the Raman signal by several orders of magnitude [16]. Although Fourier-transform Raman spectroscopy with near-infrared excitation is considered ideal for most samples [28], the black and blue ink samples tested here were found to absorb the 1,064 nm radiation. Even at low laser powers, this led to sample heating before any Raman scattering could be detected. Therefore, several shorter excitation wavelengths (488, 633, and 785 nm) were tested. Excitation at 488 nm proved to be the optimal choice because the accompanying fluorescence was minimal. Figure 1a shows the Raman spectra of three different inks; their similar Raman features indicate the need for chemometric data evaluation.
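
Residual fluorescence appears as a broad, slowly varying background beneath the Raman bands and is removed by the background correction mentioned under Data treatment. The method used there is not specified; as one common option, the sketch below implements an iterative modified polynomial fit, in which points above the fitted polynomial are repeatedly clipped so that the fit settles beneath the bands. The polynomial degree, iteration limit, and synthetic spectrum are assumptions chosen for illustration.

```python
import numpy as np


def modpoly_baseline(x, y, degree=5, n_iter=100, tol=1e-6):
    """Iterative modified polynomial baseline. After each least-squares fit,
    points lying above the polynomial are clipped down to it, so the fit
    gradually settles onto the broad fluorescence background under the bands."""
    xs = np.asarray(x, dtype=float)
    xs = (xs - xs.mean()) / xs.std()  # rescale x for a well-conditioned fit
    work = np.asarray(y, dtype=float).copy()
    baseline = work
    for _ in range(n_iter):
        baseline = np.polyval(np.polyfit(xs, work, degree), xs)
        clipped = np.minimum(work, baseline)
        if np.linalg.norm(clipped - work) <= tol * np.linalg.norm(work):
            break
        work = clipped
    return baseline


# Synthetic spectrum: quadratic "fluorescence" background plus one Raman band
x = np.linspace(300, 1800, 1501)
background = 1000 + 0.5 * (x - 300) + 1e-4 * (x - 300) ** 2
band = 200 * np.exp(-(((x - 1350) / 8) ** 2))
spectrum = background + band
corrected = spectrum - modpoly_baseline(x, spectrum)
```

A low polynomial degree suffices when the fluorescence varies slowly; too high a degree starts to follow the Raman bands themselves and removes real signal.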

Fig. 1

A Raman spectra of three different ink samples; B detail of the LIBS spectra of the ten ink samples (Cu I 327.396 nm)

LIBS, on the other hand, is an elemental spectroscopy based on the detection of emission from excited atoms and ions. Figure S1 (cf. Electronic Supplementary Material) shows the averaged spectra of the ten ink samples and illustrates the similarity of their elemental compositions. The elements K, Ca, Na, Li, and Al were present in all inks, differing only in their amounts, whereas Cu and Ti were not found in every ink. Figure 1b displays the Cu I 327.396 nm line in the LIBS spectra of the ten inks; the differing line intensities reflect different Cu concentrations, originating from the copper-containing phthalocyanine.
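
Comparing the Cu content of the inks requires a line intensity rather than a raw spectrum. The sketch below shows one simple way to obtain it: the area of the line above a straight baseline drawn between the edges of a fixed wavelength window. The window half-width and the synthetic line shape are assumptions for illustration; the work itself used full spectra in the chemometric evaluation, and the function name is hypothetical.

```python
import numpy as np

CU_I_NM = 327.396  # Cu I line shown in Fig. 1b


def line_intensity(wavelength_nm, spectrum, center_nm, half_width_nm=0.05):
    """Area of an emission line above a straight baseline drawn between the
    spectrum values at the two window edges (trapezoidal integration)."""
    wl = np.asarray(wavelength_nm, dtype=float)
    s = np.asarray(spectrum, dtype=float)
    mask = np.abs(wl - center_nm) <= half_width_nm
    w, y = wl[mask], s[mask]
    baseline = np.interp(w, [w[0], w[-1]], [y[0], y[-1]])
    net = y - baseline
    return float(np.sum(0.5 * (net[1:] + net[:-1]) * np.diff(w)))


# Synthetic line: Gaussian (sigma = 0.01 nm) on a sloping continuum
wl = np.linspace(327.2, 327.6, 401)
spec = 50 + 20 * (wl - 327.2) + 1000 * np.exp(-0.5 * ((wl - CU_I_NM) / 0.01) ** 2)
area = line_intensity(wl, spec, CU_I_NM)
```

The straight baseline removes any linear continuum under the line exactly; for the Gaussian used here the result is close to the analytical area 1000 × 0.01 × √(2π) ≈ 25.07.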

Statistical analysis

First, the data matrices obtained from the Raman and/or LIBS line intensities of the ink samples were subjected to PCA to obtain an overview of the data. PCA is the basic tool for data analysis; it simultaneously provides a visual representation of the relationships between samples and variables as well as insight into the homogeneity and heterogeneity of the samples. Here, all individual measurements on inks are referred to as “samples”, while the spectral intensities at each particular wavelength (or frequency) are referred to as “variables”.

Raman spectroscopy (PCA, SIMCA, PLS-DA)

The Raman spectra in the 300–1,800 cm−1 range were subjected to PCA with nine principal components (PCs). The first five PCs retained 46%, 27%, 12%, 8%, and 2% of the information, i.e., 96% of the total variance of the data was explained (Fig. 2a). The influence plot in Fig. 2b shows no data points with high leverage and/or high residual X-variance, indicating the absence of multivariate outliers. The loadings in Fig. 2c identify the spectral range from 1,100 to 1,650 cm−1 as containing most of the distinguishing features. In this range, typical colorants such as Victoria blue, rhodamine, methyl violet, and copper phthalocyanine contribute Raman bands [5, 6, 29]. The score plots in Fig. 2d–f illustrate the variance between the individual measurements of each ink and provide a measure of the clustering of the inks. Inks E, D, G, and J are well separated in the plots of PC2, PC3, and PC4 vs. PC1. However, the ink pairs A–F, B–C, and H–K form clusters that are not satisfactorily separated.
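
The quantities shown in Fig. 2a–d (explained variance, leverage and residual X-variance for the influence plot, loadings, and scores) can all be derived from a singular value decomposition of the mean-centred data matrix, with spectra as rows and variables as columns. The sketch below is a minimal NumPy illustration on synthetic data, not the implementation used in The Unscrambler; the function name is hypothetical. Leverage is computed as 1/n plus the sum over components of each squared score divided by the sum of squared scores of that component.

```python
import numpy as np


def pca_diagnostics(X, n_components):
    """PCA by SVD of the mean-centred data matrix (rows = spectra, columns =
    variables). Returns explained-variance fractions, scores, loadings, and,
    per sample, leverage and residual X-variance (the influence-plot axes)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    # Leverage: 1/n + sum_a t_ia^2 / (t_a' t_a)
    leverage = 1.0 / n + np.sum(scores ** 2 / np.sum(scores ** 2, axis=0), axis=1)
    residual = Xc - scores @ loadings.T
    residual_var = np.mean(residual ** 2, axis=1)
    return explained[:n_components], scores, loadings, leverage, residual_var


# Synthetic data with three underlying components plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 50)) + 1e-3 * rng.normal(size=(100, 50))
explained, scores, loadings, leverage, resid = pca_diagnostics(X, 3)
```

`np.cumsum(explained)` gives the cumulative explained variance plotted in panels such as Fig. 2a. Samples with both high leverage and high residual variance would be candidates for multivariate outliers.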

Fig. 2

PCA of Raman spectra. Data pretreatment: background correction and unit-vector normalization. A Explained X-variance, B influence plot showing residual X-variance vs. leverage, C loadings of PC1–PC5 (numbers refer to PCs), D–F score plots of PC1 vs. PC2, PC1 vs. PC3, and PC1 vs. PC4. Ink samples are denoted A to K

In the second step, pattern recognition analysis by SIMCA was carried out to establish classification rules for ink separation. SIMCA was applied to the same data matrix previously used for PCA. A separate PCA model was built for each of the ten inks, and class membership was judged against the critical distance at the 5% significance level. According to the resulting classification table (cf. Electronic Supplementary Material Table S1), SIMCA correctly classified inks B, D, E, and J. Misclassifications occurred pairwise for the ink samples B–C, H–K, and A–F. The questioned ink pairs were further classified by binary PLS-DA. PLS-DA models the X and Y matrices simultaneously to find the latent variables in X that best predict the latent variables in Y. Here, the X matrix for the sample pairs is identical to the data matrix used for PCA, whereas the Y matrix was generated for each pair by assigning the value 0 to one ink and 1 to the other.
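
The SIMCA principle, one independent PCA model per ink with acceptance based on a critical distance, can be sketched as follows. This is a simplified illustration, not the procedure used in The Unscrambler: as an assumption, the critical distance is taken as the empirical (1 − α) quantile of the training residual distances instead of an F-test-based limit, and the class name is hypothetical. The output also shows why SIMCA can assign a sample to several classes, or to none.

```python
import numpy as np


class SimcaSketch:
    """Simplified SIMCA: one PCA model per class. A sample is accepted by a
    class if its residual distance to that class model is below a critical
    distance, here the empirical (1 - alpha) quantile of the training
    residual distances (an assumption, not an F-test limit)."""

    def __init__(self, n_components=2, alpha=0.05):
        self.n_components = n_components
        self.alpha = alpha
        self.models = {}

    @staticmethod
    def _residual_distance(X, mean, P):
        Xc = X - mean
        E = Xc - Xc @ P @ P.T  # residual after projection onto the class PCs
        return np.sqrt(np.mean(E ** 2, axis=1))

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        for c in np.unique(labels):
            Xi = X[labels == c]
            mean = Xi.mean(axis=0)
            _, _, vt = np.linalg.svd(Xi - mean, full_matrices=False)
            P = vt[: self.n_components].T
            d = self._residual_distance(Xi, mean, P)
            self.models[c] = (mean, P, np.quantile(d, 1.0 - self.alpha))
        return self

    def predict_memberships(self, X):
        """List of accepting classes per sample (may be empty or multiple)."""
        X = np.asarray(X, dtype=float)
        accepted = [[] for _ in range(len(X))]
        for c, (mean, P, limit) in self.models.items():
            d = self._residual_distance(X, mean, P)
            for i in np.flatnonzero(d <= limit):
                accepted[i].append(c)
        return accepted


# Two synthetic, well-separated "inks" with ten spectra each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 50)), rng.normal(10, 1, (10, 50))])
labels = np.array(["A"] * 10 + ["B"] * 10)
model = SimcaSketch(n_components=2).fit(X, labels)
```

For inks as similar as the pairs A–F, B–C, and H–K, the class regions overlap, so a sample can be accepted by both models; this is the situation that the subsequent binary PLS-DA resolves.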

The coefficients of determination (R²) and the root mean square errors of calibration and prediction (RMSEC, RMSEP) were used to judge the success and accuracy of the models (Electronic Supplementary Material Table S2). RMSEC and RMSEP are the square roots of the sum of squared residuals (of calibration and of prediction, respectively) divided by the number of samples. The predicted values for the three questioned ink pairs are shown in Fig. 3. Although the PLS-DA models separated all three ink pairs, only for inks H and K did the predicted values, within their standard deviations, agree with the theoretically expected values of “0” and “1”.
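
A binary PLS-DA of one ink pair, together with RMSEC, RMSEP, and R² as defined above, can be sketched with a NIPALS PLS1 regression on a 0/1 response. This is a minimal illustration under stated assumptions: synthetic data, mean-centring only, and full cross-validation implemented as leave-one-out. It is not the code of The Unscrambler, and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on mean-centred data; returns (x_mean, y_mean, coef)
    such that y_hat = (X - x_mean) @ coef + y_mean."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p, q = E.T @ t / tt, (f @ t) / tt  # X loadings, y loading
        E, f = E - np.outer(t, p), f - q * t  # deflation
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, Q)


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - y_hat) ** 2)))


def r2(y, y_hat):
    y = np.asarray(y, dtype=float)
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def plsda_pair(X, y, n_components=2):
    """Binary PLS-DA: RMSEC from the calibration fit, RMSEP from
    leave-one-out (full) cross-validation."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    y_cal = pls1_predict(pls1_fit(X, y, n_components), X)
    y_cv = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        y_cv[i] = pls1_predict(pls1_fit(X[keep], y[keep], n_components), X[i:i + 1])[0]
    return {"RMSEC": rmse(y, y_cal), "RMSEP": rmse(y, y_cv),
            "R2_cal": r2(y, y_cal), "R2_val": r2(y, y_cv), "y_cv": y_cv}


# Synthetic ink pair: ten spectra each, differing in the first five variables
rng = np.random.default_rng(0)
X = rng.normal(0, 0.1, (20, 30))
X[10:, :5] += 1.0
y = np.array([0.0] * 10 + [1.0] * 10)
res = plsda_pair(X, y, n_components=1)
```

A cross-validated prediction close to 0 or 1, as in the synthetic example, corresponds to the well-separated case observed for inks H and K; values scattering around 0.5 would indicate a pair that the model cannot resolve.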

Fig. 3

Raman spectra: separation of sample pairs (indicated by capital letters) based on PLS models

LIBS (PCA, SIMCA)

LIBS spectra were analyzed in the same way as the Raman spectra. The first five PCs modeled 97% of the spectral data. The loadings in Fig. S2 (cf. Electronic Supplementary Material) show that lines across the full spectral range of 300–900 nm contribute to the separation power. However, the loadings vary strongly from line to line and from PC to PC; for example, titanium lines around 500 nm appear in PC3 but are almost absent in PC4. In general, PCA of the LIBS spectra yielded lower discrimination power than PCA of the Raman spectra. However, in contrast to the Raman evaluation, some ink samples (e.g., B and C) could be distinguished successfully, as shown in the score plots in Fig. S2b, c (cf. Electronic Supplementary Material).

SIMCA classification at the 5% significance level (cf. Electronic Supplementary Material Table S3) identified inks B, C, and D, but gave no clear classification for the ink groups G–H–A–F, J–E, and K–H–G. The omnipresence of the same elemental lines in most inks explains why, unlike the molecular composition, the LIBS spectra alone do not allow proper classification or separation of the inks [5, 6, 29].

Because LIBS spectra alone were found to be insufficient for a proper classification of inks, the evaluation of the combined Raman and LIBS spectra was carried out to improve the discrimination power for the sample set investigated here.

To compare separate and combined data evaluation, the fused Raman–LIBS data were subjected to multivariate data analysis. PCs whose loadings have a higher LIBS fraction produced a separation more similar to that of the separate LIBS evaluation. Different PCs thus describe the variance of either the LIBS or the Raman spectra and correspondingly reflect the separation power of LIBS or of Raman spectroscopy; taken together, they combine the benefits of both techniques. In general, the fused, separately normalized LIBS–Raman spectra are dominated by the Raman part (cf. loading plot in Fig. 4a), which showed the higher discrimination power above. The score plot of PC1 vs. PC2 shows that inks D, E, and G form uniform clusters, whereas H–K, B–C, and A–F–J form heterogeneous clusters (Fig. 4b). PC1 vs. PC3 separates inks J and G (Fig. 4c), and finally, PC4 contributes to the separation of inks B and C (Fig. 4d).
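
Whether a PC of the fused data is governed by LIBS or by Raman information can be quantified by splitting each loading vector at the junction of the two blocks and comparing the squared loading weight in each part. The sketch below is an illustrative diagnostic under stated assumptions (synthetic data in which the Raman block carries the dominant variance, 40 Raman and 20 LIBS variables); the function name is hypothetical and this calculation is not reported in this work.

```python
import numpy as np


def block_loading_fractions(loadings, n_raman_vars):
    """For each PC (column of `loadings`), the fraction of its squared loading
    weight on the Raman block (first n_raman_vars variables) and on the
    appended LIBS block (remaining variables)."""
    L2 = np.asarray(loadings, dtype=float) ** 2
    total = L2.sum(axis=0)
    raman = L2[:n_raman_vars].sum(axis=0) / total
    return raman, 1.0 - raman


# Synthetic fused data: a strong shared signal in the Raman block, noise in LIBS
rng = np.random.default_rng(0)
raman = rng.normal(size=(100, 1)) * np.ones((1, 40)) + 0.05 * rng.normal(size=(100, 40))
libs = 0.2 * rng.normal(size=(100, 20))
X = np.hstack([raman, libs])
_, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
raman_frac, libs_frac = block_loading_fractions(vt[:3].T, n_raman_vars=40)
```

In this synthetic case, PC1 is almost entirely Raman-driven, whereas the following PCs pick up the LIBS variance, mirroring the observation that different PCs reflect the separation power of different techniques.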

Fig. 4

PCA of fused LIBS–Raman spectra. Data pretreatment: unit-vector normalization. A Loadings of PC1–PC6, which explain 38%, 33%, 10%, 9%, 6%, and 2% of the X-variance (numbers refer to PCs). B–D Score plots of PC1 vs. PC2, PC1 vs. PC3, and PC1 vs. PC4

With SIMCA, the contribution of the LIBS spectra improved the classification: at the 5% significance level, only the ink pairs A–F and H–K required further evaluation (cf. Electronic Supplementary Material Table S4).

As for the Raman spectra, PLS-DA of the fused LIBS–Raman spectra was performed for the questioned pairs. RMSE values and coefficients of determination are summarized in Table S5 (cf. Electronic Supplementary Material), and the values predicted for ink pairs A–F and H–K by the PLS-DA models are illustrated in Fig. S3 (cf. Electronic Supplementary Material). The data fusion improves the classification: in contrast to the Raman spectra alone, all samples of inks H and K were now correctly classified, and only two predicted values deviated from the theoretical values for ink pair A–F.

SVM

The pattern recognition method SVM is frequently used for spectral regression, especially when nonlinear relationships have to be modeled [30]. SVMs offer the advantage of unique classification results: in contrast to SIMCA, each sample is assigned to exactly one class, so multiple class assignments are excluded. Owing to massive data reduction, SVM required only a short calculation time of usually a few seconds, compared with several minutes for PCA and even longer for SIMCA (depending on the size of the data set), in agreement with previously published results [31]. The SVM results are summarized as matching (confusion) matrices in Table S6 (cf. Electronic Supplementary Material). For the separate evaluation of LIBS and Raman data, SVM yielded correct classification rates of 87% and 92%, respectively. Merging the LIBS and Raman data sets further improved this rate to 97%. However, whereas PLS-DA of the fused LIBS–Raman data yielded not only improved separation power but also increased validation accuracy, the latter was not the case for SVM: the cross-validation accuracies for the Raman, LIBS, and combined Raman–LIBS data amounted to 90%, 81%, and 88%, respectively.
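
The correct classification rates quoted above are the fraction of samples on the diagonal of a matching (confusion) matrix. The sketch below computes both from true and predicted labels; it does not reproduce the SVM itself, which was run in The Unscrambler, and the function names and labels are illustrative. For readers who want an open-source analogue of the classifier settings given under Data treatment, scikit-learn's `sklearn.svm.NuSVC(nu=0.5, kernel="rbf")` is a comparable nu-SVM, although it is not the software used in this work.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, classes):
    """Matching matrix: rows = true class, columns = predicted class."""
    index = {c: k for k, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[index[t], index[p]] += 1
    return cm


def correct_classification_rate(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    return float(np.trace(cm) / cm.sum())


# Toy example: five samples, one ink-A spectrum predicted as ink B
cm = confusion_matrix(list("AABBC"), list("ABBBC"), classes=list("ABC"))
rate = correct_classification_rate(cm)  # 4 of 5 correct -> 0.8
```

Because an SVM assigns each sample to exactly one class, every row of its matching matrix sums to the number of samples of that ink, which makes the diagonal fraction a direct measure of classification success.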

Summary

LIBS and Raman spectroscopic techniques offer a clear advantage over chromatographic techniques: the possibility of direct measurements on documents without sample preparation steps. However, in certain cases, no distinct classification or identification could be achieved if Raman or LIBS spectra were used separately.

Therefore, we combined data from Raman spectroscopy, which provides information on the molecular composition of the sample, with data from LIBS, which provides information on its elemental composition, to improve the classification. Our results demonstrate that combining these two complementary spectroscopic techniques indeed increases the number of correct classifications among the investigated ink samples. The advantage of merged over separate LIBS and Raman data was demonstrated with several independent chemometric methods. The methods PCA, SIMCA, and PLS-DA or SVM yielded comparable results for all data sets analyzed. PCA alone provided information on data clustering, significant spectral regions, and outliers. A hierarchical approach using SIMCA and PLS-DA allowed the step-wise classification of the data and the separation of inks that were not resolved by PCA. This approach could also be extended to the classification and individualization of a questioned ink against a database.

The main advantage of SVM is its short calculation time of usually a few seconds, compared with several minutes for PCA and even longer for SIMCA (depending on the size of the data set). We therefore suggest SVM as a screening tool that provides a quick overview of data sets.

In summary, our results demonstrate the benefit of combined data treatment in hyphenated spectroscopy.