Introduction

Oil spills affect the environmental conditions of ecosystems, seriously threatening the overall life cycle of every species. To enforce liability actions for oil lumps, both identification of the source of the spilled hydrocarbon and monitoring of its fate are required. However, the large variability in the composition of crude oils and the complex ageing processes oil undergoes in nature, especially at sea, make identification of the most probable source of a spillage a difficult problem. The US Coast Guard Research and Development Center pioneered the first systematic analytical procedure in response to increased environmental awareness and the resulting regulations [1, 2]. In the last four decades, abundant literature on the fate, effects, and sources of spilled oils and petroleum products has been published [3–10]. The Nordic countries collectively developed the so-called “Nordtest protocol”, which used chemical fingerprinting techniques introduced in 1983 [11]. Since 1991 the Nordtest method has constituted a de facto international standard for oil spill identification [12, 13].

Further efforts were made on a European scale (the “Eurocrude” project, European crude oil identification system) to establish an oil-spill identification procedure based on proprietary software [14]. Its analytical procedure, similar to the Nordtest method, uses a sample to identify the pollution culprit. In 2002, Nordtest revised its methodology [12, 13] to provide a European standard for oil spill identification [15]. These analytical methods are very similar to their US-ASTM counterparts [16, 17] and to the US-EPA GC-based techniques, which are intended to measure industrial chemicals [18, 19]. Modifications of these methods have been incorporated in multiple oil spill characterization procedures used over the past 40 years [7].

In essence, all analytical procedures rely on direct comparison of the spillage to several suspected sources [12, 13, 15] or to large databases of oil samples, which in turn are created by measuring hundreds of chromatographic and/or spectrometric variables. They focus on the molecular selectivity yielded by gas chromatography [7], either with flame-ionization detection (GC–FID) [20–22] or mass spectrometric detection [23–27]. Then, identification is based on tiered visual comparisons of chromatograms, n-alkane distributions, bar charts of PAH concentration, and double plots of diagnostic ratios among the chromatograms of the suspected sources and the study sample [12, 13, 15]. The main disadvantages are that the procedures are time-consuming (ca 90 min per injection) and require highly skilled personnel, and that both the interpretation of the results and the conclusions based on visual comparison suffer from some subjectivity. The difficulties encountered in the matching process are reflected in the four results that can be reported when a sample is compared with one or several suspected source oils: “match”, “non-match”, “probable match”, and “inconclusive”.

Multivariate chemometric methods have rarely been applied to oil fingerprinting. Pasadakis et al. [28] discriminated among samples of petroleum spills and identified the refinery fractions that existed in the spills by using principal-component analysis (PCA) and k-means clustering. Lavine et al. [29] implemented a genetic algorithm for pattern recognition of fuel oil GC data. Doble et al. [30] used artificial neural networks to recognize premium and regular gasolines from GC–MS data. Urdal et al. [31] used four statistical methods to separate weathered crude oils on the basis of their geographic origin. Gaines et al. [32] used PCA to reduce the number of GC–MS peak ratios to differentiate among diesel fuel samples. In addition, several pattern-recognition methods were used to evaluate the origin of oils. Fonseca et al. [33] used Kohonen self-organizing maps (SOMs) to classify samples of crude oils on the basis of GC–MS descriptors in terms of geographical origin. Borges et al. [34] used unsupervised SOMs with a consensus criterion to classify weathered crude oil samples. Fernández-Varela et al. used MOLMAP [6] and Procrustes rotation (a variable selection technique) combined with PCA, cluster analysis, and self-organizing maps [35] to study PAHs (polycyclic aromatic hydrocarbons) associated with oil spills. Grueiro-Noche et al. [36] used three-way analysis (catenated-PCA, matrix-augmented principal-components analysis, parallel factor analysis, and Procrustes rotation) to assess the major sources of PAHs in seawater. Multiway methods were also used by Christensen et al. [37] and Arancibia et al. [38] to screen oil samples by excitation–emission fluorescence and phosphorescence, respectively. Zorzetti and Harynuk [39] estimated exposure time on the basis of the composition of a weathered sample of gasoline as observed by two-dimensional gas chromatography (GC × GC), and partial least-squares (PLS), nonlinear PLS (PolyPLS), and locally weighted regression (LWR). 
Finally, Lobão et al. [40] identified the source of a marine oil spill by using geochemical and chemometric techniques, for example PCA and hierarchical cluster analysis (HCA).

In this work, multivariate supervised pattern-recognition is proposed as an alternative to visual comparison of chromatograms in order to assign an unknown sample to its most probable source. The multivariate approach consists of a variable-selection step using the selectivity ratio index (SRI) followed by partial least-squares discriminant analysis (PLS-DA). A “matching decision diagram” is derived that is used to designate the samples as “match”, “non-match”, “probable match”, and “inconclusive”. The procedure was used to ascertain whether a set of unknown oil lumps appearing on the NW Galician coastline (NW Spain) originated from a suspected source oil, in particular from the tanker Prestige–Nassau. The 153 analytical variables were the relative areas of 121 compounds measured by GC–FID and GC–MS, with 32 diagnostic ratios calculated among selected molecules. Classification by use of PLS-DA-based methodology was validated by comparison with standard procedures [12, 13, 15].

Materials and methods

Apparatus

Gas chromatography–flame-ionization detection

An HP 6890 instrument (Agilent Technologies, Palo Alto, CA, USA) with a split/splitless injector, a flame-ionization detector, and an HP-5 fused-silica high-resolution capillary column (J&W Scientific, Folsom, CA, USA; 30 m long × 0.25 mm i.d., 0.25 μm film thickness) was used. The analysis followed standard procedures [12, 13] and in-house statistical studies were conducted to validate the method [35]. Operating conditions were: starting oven temperature, 40 °C, held isothermally for 5 min, then raised to 310 °C at 6 °C min−1 and held isothermally for 30 min. Carrier gas: helium, 2 mL min−1 constant flow. Injector and detector temperatures were 275 °C and 325 °C, respectively. The detector was fueled with 360 mL min−1 air flow and 30 mL min−1 hydrogen flow. Injection was performed in the splitless mode; the injected sample volume was 1 μL. In total, 32 compounds (from n-C10 to n-C40, plus pristane and phytane) and four different ratios (n-C17/pristane, n-C18/phytane, n-C17/n-C18, pristane/phytane) were considered.

Gas chromatography–mass spectrometry

An HP 6890 instrument (Agilent Technologies) with a pulsed splitless injector, an HP 5973 mass spectrometry detector, and an HP-5MS fused silica capillary column (J&W Scientific; 60 m long × 0.25 mm i.d., 0.25 μm film thickness) was used. The m/z range for MS analysis was 40–440 amu. The analytical procedure was implemented by following SINTEF’s recommendations [13] after a detailed study of the robustness of the analytical conditions [41]. Operating conditions were: starting oven temperature, 40 °C, held isothermally for 1 min, then raised to 300 °C at 6 °C min−1 and held isothermally for 30 min. Carrier gas: helium, 1 mL min−1 constant flow. Injector and transfer line temperatures were 300 °C and 280 °C, respectively. Ionization energy: 70 eV; ion-source temperature, 230 °C. Injection was performed in the pulsed splitless mode; the injected sample volume was 1 μL. SIM (selected ion monitoring) mode was used throughout. The US-EPA target polycyclic aromatic hydrocarbons (PAHs) and, in particular, the petroleum-specific alkylated (C1–C4) homologues of selected PAHs (the decahydronaphthalene, naphthalene, benzothiophene, fluorene, dibenzothiophene, phenanthrene, fluoranthene, and chrysene series) were analyzed [12, 13, 15]. In total, 50 PAHs were analyzed. These are detailed in Table S1 of the Electronic Supplementary Material.

In total, 39 biomarkers were considered: 19 hopanes, 13 steranes and diasteranes, and seven triaromatic steroids (Table S2, Electronic Supplementary Material). In total, 28 diagnostic ratios were calculated, as is conventional [13, 15]; these are reported in Table S3 of the Electronic Supplementary Material. Note that because we are dealing with a regulated issue, the analytical procedures must adhere to official guidelines [11, 15] and, thus, only minor adjustments to achieve optimization are allowed. We could not, therefore, perform every analysis on a single GC–MS instrument, although this is technically possible.

Samples

Two types of sample were considered:

  1. Controlled spillages of six types of oil in special containers whose weathering processes were monitored over time. These were four crude oils (Ashtart, Brent, Maya, and Sahara Blend), a “marine fuel oil” (IFO), and the original fuel oil from the tanker Prestige. The original products without weathering and the weathered samples were analyzed. More details on controlled spillages can be found elsewhere [6, 35]. In total, 102 samples were obtained; 75 % of these (13 samples from each of the six oils) were used as a training set and the other samples (four from each oil) were used as the validation set. The most weathered samples were all included in the training set so that all the validation samples were bracketed by those in the training set.

  2. Samples taken from 45 oil lumps beached on the Galician shoreline. These will be referred to as “beaches” and were studied by the chromatographic tiered standard procedures. They were taken after a major accident in the “Galician international corridor for hazardous goods”. This is among the most important routes for oil tankers worldwide. Most carriers transporting oil cargo from Central America, the Persian Gulf, and Africa to refineries in Northern Europe navigate throughout the Fisterra Cap (Galicia, NW Spain), only 25 miles off the shoreline. Accidental oil spills from tankers, accidental ballast release, and residues from ship bilge cleanup cause chronic pollution by hydrocarbons which damages the marine environment. The assigned source of the samples is shown in Table 1 as “Prestige” (positive match to the Prestige source), “No Prestige” (non-match to that source), “probable match”, and “inconclusive” (no decision can be drawn), in accordance with normal terminology.

    Table 1 Description of the samples taken from the Galician shoreline and their classifications by use of PLS-DA with and without variable selection

The chromatograms for all samples were reviewed manually to ensure correct integration for the target compounds. The integrated areas were then exported, by use of the chromatographs’ proprietary software, to a binary format, which can be read by any common spreadsheet application. The statistical studies were made using the PLS_Toolbox 6.0, under Matlab R2010a. The SRI subroutine was written in-house.

Partial least-squares discriminant analysis

Partial least-squares (PLS) regression is a multivariate latent-variable method that relates one dependent variable or predictand (y) to a set of independent variables, or predictors, X. Partial least-squares discriminant analysis (PLS-DA) is the application of PLS to classification problems in which y is a vector that codifies the class of each sample. The class label of an unknown sample is decided on the basis of the y value predicted by the PLS model. Ideally, the predicted y should be close to the coded class values (either 0 or 1 in this work). In practice, it is a real number and different approaches can be used to convert the predicted y into a class label [42–45].
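As a rough illustration of how a PLS model yields a real-valued prediction that is then turned into a class label, the following NumPy sketch fits a single-response PLS model with the NIPALS algorithm. It is not the PLS_Toolbox implementation used in this work: the function names (`pls1_fit`, `pls1_predict`), the toy data, and the 0.5 threshold are illustrative assumptions, and no autoscaling is applied.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model (NIPALS); returns means and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centred working copies
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X-loadings
        c = (yr @ t) / tt                    # y-loading (scalar)
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # regression coefficients
    return x_mean, y_mean, b

def pls1_predict(model, X):
    """Predict y for new samples from a fitted model tuple."""
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean

# Toy data (hypothetical): class 1 vs class 0; only the first variable discriminates.
rng = np.random.default_rng(0)
y = np.array([1.0] * 6 + [0.0] * 6)
X = rng.normal(scale=0.3, size=(12, 4))
X[:, 0] += 3.0 * y

model = pls1_fit(X, y, n_factors=1)
y_hat = pls1_predict(model, X)               # real numbers, not exact 0/1 codes
labels = (y_hat > 0.5).astype(int)           # assumed threshold, for illustration only
```

The predicted values are continuous, which is why a separate rule is needed to convert them into class labels.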

In this work, PLS-DA was used as a class-modeling technique. The objective was to discover a set of latent variables that modeled the class of interest, here the Prestige samples, while maximizing discrimination from the other classes [46]. The class limits were defined both from the y-predictions of the PLS-DA model and from the sum of the x-residuals (Q-value or Q-residual, which describes the non-modeled part of the data). The limits for the y-predictions were set at mean + 3SD, where SD is the standard deviation of the cross-validated predictions for the samples of the class of interest in the training set. The Q-residual for each sample was calculated as the sum of the squared differences between the preprocessed X-data and the X-data predicted by the PLS-DA model for the selected number of factors. An upper limit for the Q-value was calculated from the standard deviation of the Q-residuals of the target class. The joint use of the y-cut-off value and the upper Q-limit enabled use of an ad-hoc diagram to take decisions that emulate the four conclusions that may arise from the standard procedures. A sample with unusual characteristics will produce either an extreme y-prediction, a large Q-value, or both, and so its classification as belonging to the target class Prestige might be questioned (i.e., “probable match” or “inconclusive”). Use of the diagram is discussed in the section “PLS-DA classification of oil lumps”, below.
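The two quantities that define the class limits can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names (`y_limit`, `q_residuals`) and the numbers are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption because the text does not specify it.

```python
import numpy as np

def y_limit(cv_predictions, k=3.0):
    """Mean + k*SD of the cross-validated predictions of the class of interest."""
    cv_predictions = np.asarray(cv_predictions, dtype=float)
    # ddof=1 (sample SD) is an assumption; the text does not specify it.
    return cv_predictions.mean() + k * cv_predictions.std(ddof=1)

def q_residuals(X, X_model):
    """Sum of squared differences between preprocessed X and the X predicted
    by the model, giving one Q value per sample (row)."""
    E = np.asarray(X, dtype=float) - np.asarray(X_model, dtype=float)
    return (E ** 2).sum(axis=1)

# Hypothetical numbers, for illustration only.
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])
X_model = np.array([[1.0, 2.0, 2.0],
                    [0.5, 1.0, 0.5]])
q = q_residuals(X, X_model)       # -> [1.0, 0.5]
```

A sample whose Q value is large contains variation that the selected factors do not reproduce, which is what makes its class assignment questionable.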

Variable selection with the selectivity ratio index

PLS-DA models were improved by selecting the variables with the highest discriminant power. The optimum variable subset will, hopefully, associate specific variables with specific groups of samples, improve understanding and/or interpretation of classification patterns, and reduce the risk of misclassification as a result of noise and redundant data [47]. In the work discussed in this paper the selectivity ratio index (SRI) was used [48, 49]. The SRI is defined as the ratio of the explained variance ($v_{{\rm ex},p}$) to the residual variance ($v_{{\rm res},p}$) of a variable, p:

$$ {\rm SRI}_p = {v_{{\rm{ex}},p}}/{v_{{\rm{res}},p}} $$
(1)

For a given PLS-DA model, a target projection model is calculated as:

$$ {{\bf X}} = {{{\bf t}}_{\rm{TP}}}{{\bf p}}_{\rm{TP}}^{\rm{T}} + {{{\bf E}}_{\rm{TP}}} = {{{\bf X}}_{\rm{TP}}} + {{{\bf E}}_{\rm{TP}}} $$
(2)

where ${\bf t}_{\rm TP}$ (N × 1) are the target-projected scores (N = number of samples) and ${\bf p}_{\rm TP}$ (P × 1) are the target-projected loadings (P = number of variables). These are obtained as:

$$ {{{\bf t}}_{\rm{TP}}} = {{\bf X}}{{{\bf b}}_{\rm{PLS}}}/\left\| {{{\bf b}}_{\rm{PLS}}} \right\| $$
(3)
$$ {{\bf p}}_{\rm{TP}}^{\rm{T}} = {{\bf t}}_{\rm{TP}}^{\rm{T}}{{\bf X}}/\left( {{{\bf t}}_{\rm{TP}}^{\rm{T}}{{{\bf t}}_{\rm{TP}}}} \right) $$
(4)

where ${\bf b}_{\rm PLS}$ (P × 1) are the regression coefficients of the PLS-DA model calculated for A factors. From Eq. (2), the explained variance for variable p, $v_{{\rm ex},p}$, is calculated from the pth column of ${\bf X}_{\rm TP}$ and the residual variance for variable p, $v_{{\rm res},p}$, is calculated from the pth column of ${\bf E}_{\rm TP}$.

A vector of SRI values (for each PLS model) is made so that all variables can be compared graphically. A high SRI means that the variable has a high discriminative power to separate the groups considered in the PLS-DA model.
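Equations (1)–(4) translate almost line by line into NumPy. The sketch below is not the in-house SRI subroutine: the function name `selectivity_ratio` is hypothetical, X is assumed to be already preprocessed (mean-centred), and in the toy example the coefficient vector comes from ordinary least squares purely as a stand-in for the PLS regression coefficients.

```python
import numpy as np

def selectivity_ratio(X, b_pls):
    """SRI per variable from preprocessed X (N x P) and coefficients b (P,)."""
    t_tp = X @ b_pls / np.linalg.norm(b_pls)   # Eq. (3): target-projected scores
    p_tp = t_tp @ X / (t_tp @ t_tp)            # Eq. (4): target-projected loadings
    X_tp = np.outer(t_tp, p_tp)                # explained part of X, Eq. (2)
    E_tp = X - X_tp                            # residual part of X, Eq. (2)
    v_ex = (X_tp ** 2).sum(axis=0)             # explained variance (up to a constant factor)
    v_res = (E_tp ** 2).sum(axis=0)            # residual variance (same constant factor)
    return v_ex / v_res                        # Eq. (1)

# Toy data (hypothetical): variable 0 separates the two groups strongly,
# variable 1 is pure noise, variable 2 separates them weakly.
rng = np.random.default_rng(1)
y = np.array([1.0] * 10 + [-1.0] * 10)
X = rng.normal(size=(20, 3))
X[:, 0] += 4.0 * y
X[:, 2] += 1.0 * y
X -= X.mean(axis=0)                            # mean-centring

# Stand-in for b_PLS (assumption): ordinary least-squares coefficients.
b, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
sri = selectivity_ratio(X, b)
```

In this toy case the strongly discriminating variable receives the largest ratio, which is the behaviour the index is meant to capture.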

Results and discussion

Preliminary PLS-DA with all the variables

A preliminary “Prestige vs all (Ashtart, Brent, Maya, Sahara Blend and IFO)” PLS-DA model was developed with the samples from the controlled spillages, considering the 153 chromatographic (autoscaled) variables. The model failed to discriminate Prestige samples from the rest, and most samples from the beaches could not be classified reliably because they had very large Q-residuals (Table 1). Subsequent “one class vs one class” PLS-DA models revealed that discriminating between Prestige and IFO was the most difficult challenge. The reason is that the IFO blend was very similar to the Prestige oil. Hence, a PLS-DA model was developed to discriminate between Prestige (coded as “1”) and IFO (coded as “0”) samples. The root-mean-square error (RMSE) obtained by leave-one-out cross-validation (LOO-CV) was used to decide on the optimum number of latent variables (LV). Figure 1 shows the PLS-DA sample scores for the first and second latent variables (which explained 68.7 % of the variance of X and 99.4 % of the variance of y). The IFO samples could be clearly distinguished from the Prestige samples. However, the Ashtart, Brent, Maya, and Sahara oil samples projected on to the latent variables overlapped with the IFO and Prestige samples. The clear trend of the Sahara samples is noteworthy because it is related to the weathering process: simultaneous smaller scores in LV1 and larger scores in LV2 denote the most weathered samples. This trend could be visualized for this product because it had the highest content of the most volatile compounds among all the products, thus leading to the fastest evaporation among them. This is evidence that the weathering process largely affects the composition of the original sample and, thus, its identification is difficult unless these variations are minimized by variable selection. Again, similar to the “Prestige vs all” model, a large number of validation samples and beaches had extremely large Q-residuals, mainly because of an excess of variables unrelated to the class being studied.

Fig. 1 PLS-DA scores plot considering all 153 chromatographic variables. Empty symbols denote training samples whereas filled symbols refer to validation samples

PLS-DA model with variable selection

Variable selection using SRI was performed to avoid the overlap among the samples in the IFO-Prestige model. To discover the most discriminant chromatographic variables, a PLS-DA model was calculated for each of the fifteen combinations of two products (Ashtart–Brent, Ashtart–Maya, etc.). In these models, a single latent variable captured most of the information (55–70 % of X and 94–99 % of y), so the SRI value of each chromatographic variable was calculated for each of the 15 models developed with only one latent variable.

To make the SRI values comparable among the different models, the (1 × 153) vector of SRIs from each model (see the section “Variable selection with the selectivity ratio index”) was divided by its maximum. Figure 2 shows the SRI value of each variable summed over all the models. To the best of our knowledge, there is no unique criterion for selection of the optimum number of variables to be retained and many different options exist [49–51]. In this work, sets containing different numbers of variables were used to develop PLS-DA models. Each set was associated with a percentile of the maximum SRI value; for instance, two variables corresponded to the 95th percentile of the SRIs whereas 18 variables corresponded to the 60th percentile. We selected 12 variables (ca 68th percentile, the upper third of the SRI values) because they yielded good separation between the classes and good predictions. Using fewer or more variables made the classes overlap in the training step. The selected variables were one aliphatic hydrocarbon (n-C18), one aromatic PAH (D4), six biomarkers, and four diagnostic ratios.

Fig. 2 Selectivity ratio index for all the 153 variables along all models. The 12 variables with the largest ratios were selected (they corresponded to the labeled bars)
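The scaling, summation, and ranking steps described above can be sketched as follows. The function name `rank_variables` and the small SRI table are hypothetical; the real input would be a 15 × 153 array, and keeping a fixed number of top-ranked variables is one simple way to express the percentile-based choice.

```python
from itertools import combinations
import numpy as np

oils = ["Ashtart", "Brent", "Maya", "Sahara Blend", "IFO", "Prestige"]
pairs = list(combinations(oils, 2))          # the 15 two-product models

def rank_variables(sri_by_model, n_keep):
    """Scale each model's SRI vector by its maximum, sum over models,
    and return the indices of the n_keep largest sums."""
    sri = np.asarray(sri_by_model, dtype=float)       # shape: (models, variables)
    scaled = sri / sri.max(axis=1, keepdims=True)     # each row now peaks at 1
    summed = scaled.sum(axis=0)                       # one value per variable
    keep = np.argsort(summed)[::-1][:n_keep]          # largest sums first
    return np.sort(keep), summed

# Hypothetical SRI values for 3 models and 4 variables.
demo = [[2.0, 1.0, 0.2, 4.0],
        [0.5, 0.1, 0.1, 1.0],
        [3.0, 0.3, 0.6, 1.5]]
selected, summed = rank_variables(demo, n_keep=2)     # selected -> [0, 3]
```

Dividing by the row maximum prevents a single model with unusually large ratios from dominating the summed ranking.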

The n-C18 aliphatic hydrocarbon was selected from the 32 initial ones (plus four additional ratios which are usually considered for weathering studies). This compound is an intermediate linear chain, relatively stable through weathering processes, which is very useful for differentiating between heavy products (IFO and Prestige, relative areas approximately 0.55–0.85 when normalized against n-C25) and crude oils (relative areas ranging from 1.6–2.4). This reflected the different composition of the lightest fractions amongst crude oils and refined products.

The PAH that helped most to differentiate among the classes was C4-dibenzothiophene (abbreviation “D4”), a polycyclic aromatic sulfur heterocycle with four methyl substituents, which reflects the differences in total sulfur content among the oils, for example: Prestige, 2.28 %; Ashtart, 1 %; and Sahara, 0.05 %.

Six biomarkers (out of the 39 initial ones) were included in the reduced subset, namely, 27Ts, 29Ts, 29bbS(217), 28bbR(218), 29bbR(218), and 29bbS(218). They correspond to two hopanes and four steranes and diasteranes.

Four diagnostic ratios were selected from the 28 initial ones: %27Ts, %29Ts, %D2/P2, and %D3/P3. Interestingly, the first two were calculated between the hopanes biomarkers that had been selected above whereas %D2/P2 and %D3/P3 correspond to ratios between PAHs (phenanthrenes and dibenzothiophenes). It is also worth noting that the two PAH ratios had been proposed previously as useful indicators to match spillages [5] and were used to describe oil depletion and to differentiate two oils and to correlate spilled-oil sediments with a source oil [10]. They had also been used to differentiate a high-sulfur heavy Iranian cargo crude from a low-sulfur pre-spill background [5]. Further, the suitability of %D2/P2 as source indicator was proposed in cases with only moderate degradation whereas %D3/P3 had been suggested in cases with severe weathering [5]. They, therefore, seem to be selected here to somehow differentiate between samples with moderate and severe weathering.

A PLS-DA model between Prestige and IFO was calculated with the 12 selected autoscaled variables. Figure 3 shows the scores for the first two LVs (94.6 % and 98.7 % of explained variance of X and y, respectively). The model did not overfit the data, because the RMSEC was 0.058 and the LOO-CV error was 0.069. The sensitivity (defined as the ability of the model to correctly recognize objects belonging to the gth class), specificity (defined as the capability of the gth class to reject objects of all other classes), and precision (defined as the ability of a classification model to not include objects of other classes in the considered class) were excellent (100 %) for both training and validation samples. A y cut-off value of 0.57 was calculated by using the average plus three times the standard deviation of the predictions for the Prestige class using LOO-CV. This limit was set as a separation between the Prestige samples and the samples of the other types of oil.

Fig. 3 PLS-DA scores plot considering 12 variables selected by SRI. Empty symbols denote training samples whereas filled symbols refer to validation samples
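The three figures of merit defined above for the classification model can be computed from the true and predicted class labels. The sketch below is illustrative only: the function name `class_metrics` and the label vectors are hypothetical, and it assumes that each denominator is non-zero.

```python
def class_metrics(y_true, y_pred, target=1):
    """Sensitivity, specificity and precision for the class coded as `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)   # target, accepted
    fn = sum(t == target and p != target for t, p in pairs)   # target, rejected
    fp = sum(t != target and p == target for t, p in pairs)   # other, accepted
    tn = sum(t != target and p != target for t, p in pairs)   # other, rejected
    return {
        "sensitivity": tp / (tp + fn),   # target objects correctly recognized
        "specificity": tn / (tn + fp),   # objects of other classes rejected
        "precision": tp / (tp + fp),     # accepted objects that truly belong
    }

# Hypothetical labels: 1 = Prestige, 0 = other oil.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
metrics = class_metrics(y_true, y_pred)
```

With perfect classification, as reported for the training and validation samples, all three values equal 1 (100 %).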

Note that the six types of product do not overlap (Fig. 3) and that the percentage of X-block variance explained by the model increased remarkably, by ca. 26 percentage points (from 68.7 % to 94.6 %), compared with the model calculated with the 153 variables, emphasizing the benefit of the variable selection in this classification model. After variable selection by SRI the groups could be differentiated clearly along the first LV but with fewer analytical variables. The loadings on the first LV for the reduced subset of variables revealed an opposition between all the biomarkers (plus the two diagnostic ratios calculated with them, %27Ts and %29Ts) and the other variables: aliphatic and aromatic hydrocarbons (plus the two diagnostic ratios calculated with the PAHs: %D2/P2 and %D3/P3). Thus, the highest values for n-C18, D4, %D2/P2, and %D3/P3 (0.85, 5.80, 0.34, and 0.39, respectively) became associated with Prestige whereas the highest values for the biomarkers (0.21, 0.18, 0.18, 0.30, 0.29, 0.25, 0.34, and 0.15, for 27Ts, 29Ts, 29bbS(217), 28bbR(218), 29bbR(218), 29bbS(218), %27Ts, and %29Ts, respectively) defined the IFO class (Table 2). The most important variable on LV2 was %D3/P3. This ratio is, thus, confirmed as a good marker for describing the most weathered samples of IFO and Prestige. In addition, %D2/P2 seemed more linked to intermediate weathered samples.

Table 2 Average values for the selected variables and each class of sample

The Ashtart, Brent, Maya, and Sahara classes were projected on the PLS-DA model (Fig. 3). The clear separation among the classes is evidence that the variables selected by SRI acted as good markers for these products also. Table 2 shows the average values found for the selected variables along the classes. It can be seen that, although it is possible to characterize some classes by a variable (e.g. Brent by D4, Maya by %D3/P3, Ashtart by 28bbR (218), and Sahara by %27Ts and D4), it is preferable to use the overall reduced set.

PLS-DA classification of oil lumps

The PLS-DA model was used to classify a set of samples taken on Galician beaches whose “target” assignations were obtained from chromatography-based standard procedures [12, 13, 15].

In contrast with the training and validation samples, the origin and time of weathering of which were controlled, the oil lumps from the coastline may have originated from different sources and suffered a range of weathering processes, because of their different drifting times at the sea and eventual mixture with other products. This may mean that even when a lump originated from the Prestige wreck, its chemical composition after a period of weathering might not be close to that of the original product (or to the samples studied after controlled weathering). This phenomenon would eventually be reflected by the Q-residuals and the y predictions of the PLS-DA model.

In order to adhere as much as possible to standardized procedures based on visual comparisons of the chromatograms, a “matching decision diagram” (Fig. 4) was set up to mimic the terminology of the standard approach. A “match” (i.e. a sample identified positively as Prestige) occurred when PLS-DA classified a sample in class “1” (Prestige), with a low Q-residual. A “non-match” (i.e. not from the Prestige) occurred when PLS-DA classified a sample in class “0”, with a large Q-residual. This means that the sample contained variation not previously modeled by PLS-DA. A “probable match” was assigned when PLS-DA predicted a sample as class “1” but with a high Q-residual, and an “inconclusive” result was assigned when a sample did not fulfill any of the previous criteria (i.e., a classification of “0” but with no new characteristics and, so, a low Q-residual). The main separation between the classes is given by the cut-off value derived from the PLS-DA model (vertical dotted line in the figure). The upper Q-residual limit was set as the average Q-residuals of the Prestige samples on training (LOO-CV) plus 30 times their standard deviation.

Fig. 4 Matching decision diagram. P, positive match, “Prestige”; NP, non-match, “not Prestige”; PM, probable match; NC, inconclusive. Vertical and horizontal lines indicate the class limits (see text)
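The four regions of the matching decision diagram can be written as a small decision function. This is a sketch of the rules as stated in the text: the function name `matching_decision`, the treatment of values that fall exactly on a limit, and the Q-limit used in the example are assumptions; 0.57 is the y cut-off reported for the reduced model.

```python
def matching_decision(y_pred, q_residual, y_cutoff, q_limit):
    """Map a PLS-DA prediction and its Q-residual to one of the four conclusions."""
    in_target_class = y_pred > y_cutoff     # predicted as class "1" (Prestige)
    low_q = q_residual <= q_limit           # no unmodeled variation detected
    if in_target_class and low_q:
        return "match"
    if in_target_class:
        return "probable match"             # class "1" but high Q-residual
    if low_q:
        return "inconclusive"               # class "0" but low Q-residual
    return "non-match"                      # class "0" and high Q-residual

# Example with the reported y cut-off and a hypothetical Q-limit of 10.
decisions = [
    matching_decision(0.95, 2.0, y_cutoff=0.57, q_limit=10.0),
    matching_decision(0.80, 40.0, y_cutoff=0.57, q_limit=10.0),
    matching_decision(0.20, 3.0, y_cutoff=0.57, q_limit=10.0),
    matching_decision(0.10, 55.0, y_cutoff=0.57, q_limit=10.0),
]
```

Each sample therefore needs only two numbers, its predicted y and its Q-residual, to be placed in the diagram.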

The model thus obtained from the reduced set of variables, and two LVs, was highly satisfactory because it classified the validation samples and beach samples correctly. From inspection of Tables 1 and 3 it is apparent that the predictions for the 45 beach samples were more accurate than those obtained using the overall set of variables. Note that the PLS-DA method leads to more definite assignments than the visual comparison approach, because the former assigned only eight samples as “inconclusive” or “probable match”, whereas the latter had 18 samples in these two categories. As a consequence, the number of probable matches derived from the standard method is higher (15 samples) than for the PLS-DA method (five samples). In addition, no samples considered by PLS-DA as Prestige or No Prestige were regarded as No Prestige or Prestige, respectively, by the standard approach, which is an encouraging result.

Table 3 Contingency table derived from the PLS-DA predictions compared with the assignments derived from standard procedures

The five beach samples regarded as probably originating from the Prestige wreck (probable match in the standard approach) had PLS predictions higher than the cut-off value of 0.576 in Fig. 4 but large Q-residuals. They were regarded as non-match by the standard approach. This was not surprising, because of their closeness to the “No Prestige” class in Fig. 4. On the other hand, sample B13 is, clearly, very close to the decision value and, indeed, all its experimental variables were quite similar to the average values for the Prestige samples. Three samples were regarded as “inconclusive” in Fig. 4 (Portiño4, Lira1, and Farolira3). They agreed with the standard approach, except for Farolira3, which was regarded as a probable match by the visual approach.

Close inspection of those two categories revealed that the “probable match” samples had, in general, measured values very close to, or slightly higher than, the average Prestige values. In contrast, “inconclusive” samples had clearly higher values (except for n-C18 and D4, which were clearly lower) than the Prestige values. For instance, 0.22 vs 0.17 (27Ts), 0.17 vs 0.13 (%29Ts), and 0.42 vs 0.39 (%D3/P3) for “inconclusive” and Prestige samples, respectively.

Regarding the samples assigned as “not from the Prestige” (non-match), all the PLS-DA classifications were correct except for a sample that the visual approach regarded as a probable match. Clear ordering was also observed in this class in Fig. 4, because the higher the Q-residual was, the higher the n-C18, %D2/P2, and %D3/P3 values were. For instance, values for n-C18 and %D2/P2 were 0.96 and 0.39, and 0.70 and 0.41 for Langosteira3 and Louro3 respectively.

Finally, it is worth noting that PLS-DA classified more samples as from the Prestige than the standard procedure, probably because of the large number of inconclusive and probable match assignments obtained in the standard approach (which, in turn, reflects the difficulty of decision-making solely by visual comparison). This is not really a critical drawback, because in liability studies false positives would be preferred to false negatives (because more detailed studies should clarify the false assignment).

Conclusions

In the work discussed in this paper an objective procedure was implemented to match an oil spillage with a suspected source. It is based on PLS-DA and was exemplified by a case study in which 45 oil lumps of unknown origin discovered on Galician beaches were matched to a suspected source of the oil (the Prestige wreck). A preliminary variable-selection step using SRI yielded 12 variables, from 153 initial chromatographic variables. A PLS-DA model was then developed that led to a “matching decision diagram”. Two cut-off values were established by considering the PLS-DA predictions and their Q-residuals. The diagram leads to four possible decisions: match, non-match, probable match, and inconclusive. In this particular study the PLS-DA model considered more samples as matched to the Prestige oil than the standard approach. This may be a consequence of the larger number of inconclusive statements of the latter approach. Nevertheless, this was not considered a serious drawback because in environmental liability studies “false positives” would be preferred to “false negatives”, because of their very different consequences. It is worth noting that the presented approach is not based on a one-to-one comparison of the unknown sample with every suspected reference source and every weathered reference sample, as is performed in standard procedures (and currently involves a huge workload). Instead, it is based on comparison of the sample with a whole reference set that includes original and weathered reference samples, which defines the class boundaries of the suspected source in multivariate space. Because the sample is now assigned to the class of the suspected source if it is classified between the class limits, no exact match of the sample to a single reference is required. Therefore, the PLS-DA approach presented here seems an effective screening method to reduce the laboratory workload.