Introduction

Oil fingerprinting is used to identify the original sources of oil and refinery products released into the environment (Bayona et al. 2015). Biomarkers, which are specific groups of petroleum hydrocarbons, are applied to reliably identify the sources of spilled oil and to trace its fate and behavior over time (Wang et al. 2006a). Aliphatic and aromatic biomarkers are most commonly used in offshore oil spill identification. Aliphatic biomarkers comprise specific saturated hydrocarbons, including sesquiterpanes, adamantanes, and diamantanes, and those with higher molecular weights, such as steranes and terpanes. Aromatic biomarkers normally refer to triaromatic steranes (TA-steranes) and monoaromatic steranes (MA-steranes). Spilled oils can be identified and differentiated by comparing the ratios of biomarkers in the target oils. For example, the distributions of n-alkanes, which vary from light oils to heavy oils, indicate the type of crude oil, and changes in the ratios of some alkylated aromatic biomarkers (such as 1-methylnaphthalene and 2-methylnaphthalene) imply the influence of different weathering processes (Fayad and Overton 1995). Many studies have analyzed the diagnostic ratios of biomarkers and shown them to be efficient and effective tools for oil spill fingerprinting (Song et al. 2016; Wang et al. 2013; Prince et al. 2013).

In addition to the numerical analysis of diagnostic indices of biomarkers, multivariate analysis techniques, particularly principal component analysis (PCA), have been introduced to fingerprint spilled oil (Kaufman et al. 1997). PCA is a powerful technique for differentiating oils, because oils with different distributions of detected hydrocarbons or different diagnostic indices can be separated along different components (Stout et al. 2001). PCA methods are widely used in oil fingerprinting to identify oils and their weathering status from the combined component patterns of the oil (Christensen et al. 2004; Prata et al. 2016). The diagnostic ratios of n-alkanes, terpanes, and steranes have been applied effectively as variables in PCA to differentiate oil types, such as light and heavy fuel oils, diesel, lubricants, and crude oils (Sun et al. 2018; Ismail et al. 2016; Christensen et al. 2005). The weathering degree of crude oils can be evaluated using diagnostic ratios of biomarkers such as diamondoids, sesquiterpanes, terpanes, steranes, and alkylated PAHs (Azevedo et al. 2008; Sun et al. 2015). Two to three principal components are commonly obtained. A biplot is then used to visualize these differences and to assist in the interpretation of the PCA results. Each vector represents the combined contributions of two components. PCA can be combined with other statistical techniques and chemometric data analysis tools to reduce the possibility of faulty decisions, such as discriminant analysis to maximize the distances among categories, and warping methods to minimize noise in chromatograms (Christensen et al. 2005; Sun et al. 2015; Ismail et al. 2016; Tomasi et al. 2004). These techniques reduce instrument signal noise, statistically narrow down the differentiation process, and directly increase the validation accuracy of oil fingerprinting.

With the widespread application of oil dispersants, oil is more commonly dispersed and stays in seawater longer (Lessard and DeMarco 2000; Prince 2015). The use of dispersants can dramatically change the physicochemical properties of spilled oil (Swannell and Daniel 1999; Macnaughton et al. 2003). The formation of oil droplets decreases the proportion of oil in contact with sunlight and air, which may affect evaporation and photo-oxidation (Zhuang et al. 2016). The increased contact between oil and bacteria changes the degree to which biodegradation is enhanced in the ocean (Brakstad et al. 2015). These variations make the distribution of biomarkers in dispersed oil different from that in non-dispersed oil. Characterizing dispersed oil and tracing its fate and behavior using current biomarkers thus become challenging.

Possible candidate biomarkers for fingerprinting different chemically dispersed oils (CDO) have been investigated experimentally (Song et al. 2016; Song et al. 2018; Olson et al. 2017). However, whether the application of dispersants affects the weathering of biomarkers is unknown. As such, multivariate analysis methodologies, such as PCA, will play an essential role in objectively differentiating CDO from weathered crude oil (WCO) or non-dispersed oil. To our knowledge, fingerprinting of CDO using PCA has not yet been reported. This paper mainly aims to differentiate CDO from WCO using multiple PCA algorithms based on the diagnostic ratios of 7 types of biomarkers, including adamantanes, diamantanes, sesquiterpanes, steranes, terpanes, TA-steranes, and MA-steranes.

Materials and methods

Oil-weathering experiments and data collection

Based on our previous results from long-term (1, 10, 20, 30, 40, 50, and 60 days) general weathering of dispersed oil and crude oil (Song et al. 2018), 8 types of biomarkers were selected to differentiate CDO from WCO. Briefly, the experiments can be summarized as follows. Three types of oil samples were prepared: (1) crude oil samples, prepared by dissolving crude oil in hexane; (2) CDO, for which a 100 μL aliquot of crude oil was pipetted into artificial seawater, followed by the addition of 10 μL of dispersant (Corexit 9500A); and (3) WCO, for which a 100 μL aliquot of crude oil was pipetted into artificial seawater without dispersant. CDO and WCO were shaken at 120 rpm for the specified number of days to simulate oil weathering.

CDO and WCO samples were extracted for analysis once the weathering process was complete (Song et al. 2016, 2018). Extraction into the organic phase was accomplished using dichloromethane (DCM). The extracts were cleaned and eluted using a chromatographic column filled with silica gel. The organic phase was concentrated and analyzed using GC-MS (Agilent model 6890) equipped with a DB-5 ms capillary column (30 m) (Song et al. 2016, 2018). The validity and reliability of the experiment were evaluated using QA/QC programs. All weathering simulations, sample pre-treatments, and sample analyses were conducted in duplicate. Each detectable biomarker thus has 8 peak-area values from GC-MS analysis (some biomarkers, especially low-molecular-weight ones, are not always detectable, so each of them has fewer than 8 values). Calibrated surrogates were introduced during sample preparation to ensure the validity of sample treatment. Internal standards were applied to monitor the stability of the GC-MS system.

Eight types of biomarkers, comprising adamantanes, diamantanes, sesquiterpanes, terpanes, steranes, TA-steranes, MA-steranes, and alkylated PAHs, were selected. The peak areas of identified biomarkers in each sample (crude oil, CDO, and WCO samples) were calculated. More than 100 diagnostic ratios were calculated from the peak areas shown in Table S1. The diagnostic ratios included ratios within the same type of biomarker (e.g., Ts/Tm, C29/C30: terpanes/terpanes) and ratios between two types of biomarkers (e.g., Ts/C27S, TR28a/C29αββR: terpanes/steranes). The average value of each diagnostic ratio of two individual biomarkers was obtained from the ratios of peak areas. These average values were set as variables to evaluate the effects of dispersant application and weathering duration, respectively, on the selected biomarkers. CDO samples weathered for 1–60 days are abbreviated as C1–C60, and WCO samples weathered for 1–60 days as W1–W60. The abbreviations of the diagnostic ratios are shown in Table S2.
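The ratio calculation described above can be illustrated with a short sketch. This is not the authors' workflow: the peak-area numbers are invented, the helper name `diagnostic_ratio` is hypothetical, and averaging the per-replicate ratios (rather than taking the ratio of averaged peak areas) is an assumption, since the text does not specify the order of operations.

```python
from statistics import mean

def diagnostic_ratio(areas_a, areas_b):
    """Mean ratio A/B over replicate runs, skipping runs where a peak is missing."""
    ratios = [a / b for a, b in zip(areas_a, areas_b)
              if a is not None and b is not None and b != 0]
    if not ratios:
        return None  # biomarker undetectable in every replicate
    return mean(ratios)

# Hypothetical duplicate peak areas for two terpanes (Ts and Tm) in one sample
ts = [1200.0, 1260.0]
tm = [1000.0, 900.0]
ts_tm = diagnostic_ratio(ts, tm)  # mean of 1.2 and 1.4
```

Returning `None` for a ratio with no usable replicate keeps undetectable light biomarkers from silently entering the PCA input as zeros.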

Principal component analysis

PCA is a widely recognized multivariate analysis technique that uses an orthogonal transformation to convert the variables of the original data into uncorrelated variables. PCA extracts eigenvalues and eigenvectors from the covariance of the original correlated variables to form a new, smaller set of independent uncorrelated variables (principal components) (Tipping and Bishop 1999; Wold et al. 1987; Jeffers 1967; Singh et al. 2004). The principal components zi are weighted combinations of the original variables, with weights given by the eigenvectors, as shown in Eq. (1):

$$ \left(\begin{array}{c}{\mathrm{z}}_1={\upalpha}_{11}^{\prime }{\mathrm{x}}_1+{\upalpha}_{12}^{\prime }{\mathrm{x}}_2+\dots +{\upalpha}_{1\mathrm{j}}^{\prime }{\mathrm{x}}_{\mathrm{j}}\\ {}{\mathrm{z}}_2={\upalpha}_{21}^{\prime }{\mathrm{x}}_1+{\upalpha}_{22}^{\prime }{\mathrm{x}}_2+\dots +{\upalpha}_{2\mathrm{j}}^{\prime }{\mathrm{x}}_{\mathrm{j}}\\ {}\dots \\ {}{\mathrm{z}}_{\mathrm{i}}={\upalpha}_{\mathrm{i}1}^{\prime }{\mathrm{x}}_1+{\upalpha}_{\mathrm{i}2}^{\prime }{\mathrm{x}}_2+\dots +{\upalpha}_{\mathrm{i}\mathrm{j}}^{\prime }{\mathrm{x}}_{\mathrm{j}}\end{array}\right) $$
(1)

Where αi is the i-th vector of component loadings, j denotes the number of variables, and x denotes the variables.
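As a rough illustration of Eq. (1), the sketch below computes loadings and scores by eigendecomposition of the covariance matrix with NumPy. It is a generic textbook formulation, not the Minitab or Excel-based procedure used in the study; the function name `pca_scores` and the small data matrix are hypothetical.

```python
import numpy as np

def pca_scores(X, use_correlation=False):
    """Return eigenvalues, loadings (as columns), and scores for X (samples x variables)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    if use_correlation:
        Xc = Xc / X.std(axis=0, ddof=1)      # unit variance -> Pearson-based PCA
    C = np.cov(Xc, rowvar=False)             # covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # symmetric eigendecomposition, ascending
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # z_i = alpha_i' x, as in Eq. (1)
    return eigvals, eigvecs, scores

# Hypothetical data: 5 samples x 3 diagnostic ratios
X = [[1.0, 2.0, 0.5],
     [1.2, 2.1, 0.4],
     [0.9, 1.8, 0.7],
     [1.5, 2.6, 0.2],
     [1.1, 2.2, 0.5]]
eigvals, loadings, scores = pca_scores(X)
```

The covariance matrix of the resulting scores is diagonal, with the eigenvalues on the diagonal, which is the sense in which the principal components are uncorrelated.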

Covariance was first applied to the data sets to measure the linear relationship between two variables. Pearson correlation was then applied to examine the linear correlation of scaled variables derived from the original data. Non-parametric correlation methods based on the ranks of observations can also describe non-linear correlation when obtaining eigenvalues and eigenvectors (Ma et al. 2010; Alberto et al. 2001). Two types of non-parametric correlation, Spearman ρ and Kendall τ, were thus applied to the data sets in case of non-linear association between two ordinal variables. They may be helpful for variables with different and incomparable means in the same data set, such as variables containing the diagnostic ratios of terpanes and TA-steranes.
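To make the distinction between Pearson-based and rank-based PCA concrete, the following sketch eigendecomposes the correlation matrix of raw values and of column-wise ranks. It is only an illustration under stated assumptions: the helper names are hypothetical, the data are invented, and the simple ranking used here assumes no ties within a column. A Kendall-based variant would replace the correlation matrix with a matrix of pairwise τ values.

```python
import numpy as np

def rank_columns(X):
    """Rank each column (1 = smallest). Assumes no ties within a column."""
    X = np.asarray(X, dtype=float)
    return X.argsort(axis=0).argsort(axis=0) + 1.0

def eig_of_correlation(M):
    """Eigenvalues (descending) and eigenvectors of the Pearson correlation matrix of M."""
    R = np.corrcoef(M, rowvar=False)
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Hypothetical ratios: the second column is a monotonic, non-linear function of the first
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([x, np.exp(x)])

pearson_vals, _ = eig_of_correlation(X)                 # Pearson PCA
spearman_vals, _ = eig_of_correlation(rank_columns(X))  # Spearman PCA
```

Because the two columns are perfectly monotonic, the rank-based matrix concentrates all variance in the first component, whereas the Pearson matrix leaves some variance in the second.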

The PCA results were used to assess the effects of dispersant application and weathering duration on the diagnostic ratios of biomarkers. Principal components (PCs) were retained so as to cover at least 80% of the variance using covariance, Pearson correlation, and non-parametric methods (Spearman and Kendall), respectively. The PCAs were performed using both Minitab 17 (Minitab Inc. 2017) and XLSTAT, an Excel-based software package. Both packages produced consistent results.
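The 80% retention criterion can be sketched as a cumulative sum over the eigenvalues. The function name `n_components_for` and the eigenvalues below are hypothetical; they are chosen only so that the first components explain shares of variance of a similar order to those reported later in the paper.

```python
from itertools import accumulate

def n_components_for(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalues: 58%, 20%, 8%, ... of the total variance
eigenvalues = [5.8, 2.0, 0.8, 0.7, 0.4, 0.3]
k = n_components_for(eigenvalues)  # two components reach 78%, three reach 86%
```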

Pearson’s correlation coefficient can be calculated using Eq. (2):

$$ {\uprho}_{\mathrm{X},\mathrm{Y}}=\frac{\operatorname{cov}\left(\mathrm{X},\mathrm{Y}\right)}{\upsigma_{\mathrm{X}}{\upsigma}_{\mathrm{Y}}}=\frac{\mathrm{E}\left[\left(\mathrm{X}-{\upmu}_{\mathrm{X}}\right)\left(\mathrm{Y}-{\upmu}_{\mathrm{Y}}\right)\right]}{\upsigma_{\mathrm{X}}{\upsigma}_{\mathrm{Y}}} $$
(2)

Where cov(X, Y) is the covariance of X and Y, μX and μY are the means of X and Y, σX is the standard deviation of X, and σY is the standard deviation of Y.
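A minimal sketch of Eq. (2) in plain Python follows. The data are invented, and the sample (rather than population) form is used, which gives the same coefficient because the normalizing factors cancel.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y), as in Eq. (2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/(n-1) factors cancel between numerator and denominator

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # exactly linear in x
r = pearson(x, y)
```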

Spearman correlation (rs) is the Pearson correlation coefficient between ranked variables. If Spearman correlation is used, X and Y are replaced by the rank of X and the rank of Y, as shown in Eq. (3):

$$ {\mathrm{r}}_s={\uprho}_{{\mathrm{r}}_{\mathrm{x}},{\mathrm{r}}_{\mathrm{y}}}=\frac{\operatorname{cov}\left({\mathrm{r}}_{\mathrm{x}},{\mathrm{r}}_{\mathrm{y}}\right)}{\upsigma_{{\mathrm{r}}_{\mathrm{x}}}{\upsigma}_{{\mathrm{r}}_{\mathrm{y}}}} $$
(3)

Where cov(rx, ry) is the covariance of the ranked variables x and y, and σ denotes the standard deviations of the ranked variables.
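Eq. (3) can be sketched as a rank transform followed by the Pearson formula. Assigning tied values their average rank is an assumption here, since the text does not say how ties were handled; the helper names and data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    """Eq. (3): Pearson correlation applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]  # monotonic but non-linear in x
rs = spearman(x, y)
```

For this monotonic example the rank correlation is perfect even though the relationship is not linear, which is the property that motivates the rank-based PCA variants.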

Kendall τ is a reasonable coefficient for evaluating the concordance of ranked variables (Kendall 1948). If there are two sets of ranked variables (A and B), one of the two rankings is re-ordered into its natural order. Each of the \( \left(\begin{array}{c}\mathrm{n}\\ {}2\end{array}\right) \) pairs of ranked numbers in either variable is scored as being in the right order (+ 1) or the inverse order (− 1) relative to the natural sequence. The scores from both rankings are then multiplied to give a pair score, which is counted as concordant (positive score, C) or discordant (negative score, D). Kendall τ is then calculated using Eq. (4):

$$ \uptau =\frac{\mathrm{C}-\mathrm{D}-\mathrm{Q}}{\frac{1}{2}\mathrm{n}\left(\mathrm{n}-1\right)} $$
(4)
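The pair-counting procedure can be sketched as below. Eq. (4) contains a Q term that the text does not define, so this sketch leaves it out and computes the plain (C − D) form over all n(n − 1)/2 pairs; treating tied pairs as neither concordant nor discordant is likewise an assumption.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(C - D) / (n(n-1)/2) over all pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)  # its sign equals the product of the two order scores
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

a = [1, 2, 3, 4]
b = [1, 3, 2, 4]  # one swapped pair relative to a: C = 5, D = 1
tau = kendall_tau(a, b)
```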

Results and discussion

The effects of dispersants and weathering on low-molecular biomarkers

PCA was first applied to differentiate CDO from WCO using the diagnostic ratios of both adamantanes and diamantanes. PCA was conducted using the average values of the same diagnostic ratios selected for the same samples. Table S3 shows the Pearson matrix as an example of the correlation matrix. Tables S4 and S5 show the eigenvectors and factor scores of the Pearson matrix, respectively. The scores plots from the three PCA methods are displayed in Fig. 1a–c. Raw data are listed in Table S6. Slightly weathered CDO (1–20 days) is grouped with crude oil and slightly weathered crude oil (1 day), according to the experimental conditions and the hierarchical cluster analysis (CA) shown in Fig. S1. Other WCO samples (10–20 days of weathering) are clearly differentiated from the slightly weathered CDO as well as from CDO with a relatively longer weathering duration. The first component (PC1) explained 56–59% of the total variance, the second component (PC2) 14–23%, and the third component (PC3) 5–10%. The combination of PC1 to PC3 is sufficient to interpret the influence of weathering duration and dispersant application on the variations of the diagnostic ratios. The diagnostic ratios of diamantanes and adamantanes can be applied to differentiate CDO, crude oil, and WCO, as shown in Fig. 2. For example, with the Pearson method, the diagnostic ratios Ad1, Dia1, Dia 4, and Dia 5 are weighted toward relatively heavily weathered CDO (C30). Crude oil and relatively slightly weathered CDO and WCO are related to some diagnostic ratios, such as Dia 2, Dia 3, Dia 6, and Ad 9. The diagnostic ratios located near the corresponding oil (Ad 2–6, Ad 13, and Ad 15) are probably correlated with WCO. Meanwhile, some specific diagnostic ratios are always linked with particular oil samples, reflecting the impacts of dispersant use and weathering duration.

For example, crude oil in the three PCA biplots is always correlated with Ad1, Dia 2, and Dia 3. Dia 4 and Dia 5 can trace CDO (C30 for Pearson and Spearman PCA, and C10 for Kendall PCA). WCO can always be differentiated using Ad3–6, Ad13, and Ad15. Some hydrocarbons in chemically dispersed oil differ in their resistance to weathering processes from those in non-dispersed (naturally dispersed) oil under the same weathering conditions (Bacosa et al. 2015; Prince et al. 2013). Even in dispersed oil, hydrocarbon weathering is strongly linked to the size of the oil droplets (Brakstad et al. 2015). Biomarkers in dispersed oil could likewise show variable and discordant degradation rates. The results of this study indicate that, based on statistical analysis, the weathering degrees of biomarkers, especially biomarkers of the same type, can be tracked after dispersants are applied. The differentiation of CDO from WCO implies that, besides weathering duration (C1 versus C30), the addition of dispersants may contribute to variations in the degree and fate of weathering of diamondoids (C30 versus W10 and W20). Many studies (Daling et al. 2014; Bao et al. 2014) have shown that the major weight loss of oil is caused by evaporation. The evaporation rate of dispersed oil can be slower than that of non-dispersed oil, because the water film provided by dispersants can separate the oil from the vapor phase (Aranberri et al. 2002). It is still unclear whether photo-oxidation contributed significantly to the differences between the first stage (0–10 days of weathering) and later stages of weathering (longer than 20 days). The results obtained from the PCA imply that adamantanes and diamantanes may be affected in two ways and at different rates (Figs. 1 and 2).

Fig. 1 PCA results for the diagnostic ratios of adamantanes and diamantanes using (a) Pearson, (b) Spearman, and (c) Kendall PCA

Fig. 2 PCA biplots for the diagnostic ratios of adamantanes and diamantanes using (a) Pearson, (b) Spearman, and (c) Kendall PCA

In addition, if the CDO data points are connected by a curve following the order of weathering days from 1 day to 60 days, the curve runs counterclockwise in Pearson PCA (green line in Fig. 1a). The curve for WCO, drawn in the same sequence, also runs counterclockwise (orange line in Fig. 1a). The CDO and WCO curves thus rotate in the same direction, whether counterclockwise or clockwise. This trend indicates the effect of weathering duration on the variation of biomarkers in CDO and WCO. Meanwhile, the location of the CDO and WCO curves in different areas clearly implies the impact of dispersant use on the variations of biomarkers. However, the trends were not always found when all data points were included, such as C20 in Pearson PCA (Fig. 1a). At most one data point in the CDO or WCO sequence (1–60 days of weathering) was omitted to obtain a clearer trend with weathering duration. A data point in the middle of the weathering duration was preferentially selected for omission, to clarify the effects of weathering duration. The same curve directions were found in Spearman PCA (Fig. 1b), but not in Kendall PCA (Fig. 1c). The different curve directions may result from the different PCA methods. Different ranking methods may cause different losses of information related to the effects of weathering duration on the values of the diagnostic ratios.
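The curve directions discussed above were read from the plots. One possible way to check such a direction numerically, offered here purely as an assumption and not as the procedure used in the study, is the sign of the shoelace (signed) area of the path through the scores taken in order of weathering day. The function name and the scores below are hypothetical.

```python
def rotation_direction(points):
    """Classify an ordered path in the score plane by the sign of its shoelace area."""
    area2 = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area2 += x1 * y2 - x2 * y1  # accumulates twice the signed area of the closed path
    if area2 > 0:
        return "counterclockwise"
    if area2 < 0:
        return "clockwise"
    return "undetermined"

# Hypothetical (PC1, PC2) scores ordered by weathering day
cdo_path = [(1.0, 0.0), (0.5, 0.8), (-0.6, 0.7), (-1.0, -0.2), (-0.2, -0.9)]
direction = rotation_direction(cdo_path)
```

The path is closed back to its first point before the area is taken, so a curve that crosses itself can give a small or misleading area; a visual check of the scores plot remains advisable.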

Meanwhile, PCA successfully differentiated CDO from WCO using the diagnostic ratios of adamantanes. In the loading plots, Crude, C1, and C20 are located in a similar zone. CDO with longer weathering duration (C30–C40) is clearly differentiated from WCO (W10–W20). The PCA results from adamantanes may reflect the application of dispersants as well as the effects of weathering days. Two PCs were selected, explaining 80% of the variance. The scores plots shown in Fig. 3 illustrate the isolation of W20 and C30. The data are listed in Table S7. The trend is concordant with the clusters identified using CA (Fig. S2). In the CA results, crude oil is grouped with W1, C1, and C20, and C10 is grouped with C30 and C40. W10 and W20 are differentiated by CA. Both CA and PCA could clarify the differences in diagnostic ratios between CDO and WCO as well as with weathering duration. PCA using only diamantanes gave similar results (Fig. 4), with the values of the diagnostic ratios given in Table S8. Oil samples with longer weathering days are differentiated from other samples with relatively shorter weathering duration (0–20 days). Weathered non-dispersed oil is separated from weathered dispersed oil in all PCA methodologies. In contrast to the results using adamantanes as variables, W10 is grouped with the slightly weathered oil. The higher resistance of diamantanes to evaporation may lead to a lower variation in their diagnostic ratios compared with those of adamantanes (Wang et al. 2006b). The diagnostic ratios Ad1 and Ad7 (the detailed ratios can be found in Table S2) are always correlated with CDO, while Ad3, Ad5, Ad6, and Ad15 are associated with WCO, when compared with the PCA results using both adamantanes and diamantanes. These ratios are probably key indicators for differentiating CDO from WCO using adamantanes (Figs. 3 and 4).

Fig. 3 Differentiation of CDO from WCO and crude oil using adamantanes

Fig. 4 Differentiation of CDO from WCO and crude oil using diamantanes

Two principal components should be sufficient for fingerprinting using the diagnostic ratios of sesquiterpanes as variables (Fig. 5). The diagnostic ratios are listed in Table S9. PC2 is associated with longer weathering days (C40 and W30, as well as C10). The assessment is similar to that for diamantanes. PCA clearly indicates the differences between long-term and short-term weathering. Additionally, short-term weathering (less than 10 days) is identified using the diagnostic ratios of sesquiterpanes in CDO and WCO. Changes in the values of p3/p4 and p4/p5 may indicate the degree of weathering of CDO, while the degree of weathering of WCO is related to p3/p6 and p5/p10 (Fig. 5).

Fig. 5 Differentiation of CDO from WCO and crude oil using sesquiterpanes

The curves connecting the data points of adamantanes in order of weathering duration are similar in pattern to those displayed in Fig. 1. The directions of the curves for both CDO (green line) and WCO (orange line) are counterclockwise (Fig. 4). A shared direction is also observed in Fig. 5: using sesquiterpanes, the lines for both CDO and WCO rotate clockwise.

The effects of dispersants and weathering on high-molecular biomarkers

Only one principal component was obtained when PCA was run on the diagnostic ratios of steranes, terpanes, TA-steranes, or MA-steranes alone. Eighty percent of the diagnostic ratios of these biomarkers have relatively low relative standard deviation (RSD) values (< 5%) (Song et al. 2018). The high recalcitrance of these biomarkers to weathering is probably the main reason for the low variances. The slight differences in resistance to weathering among different types of biomarkers may be important for distinguishing CDO from WCO. PCA was therefore conducted on the diagnostic ratios of steranes and terpanes using four PCA methodologies, as shown in Fig. 6a–d. The diagnostic ratios are given in Table S10. The PCA basically separated CDO (left zone) and WCO (right zone) into two zones. The weathering duration of CDO and WCO is identified (anticlockwise), with only one discordant data point, using the covariance method. The weathering of different types of biomarkers may be gradually affected by the application of dispersant, but is insignificantly influenced by weathering duration. The diagnostic ratios of steranes, terpanes, TA-steranes, and MA-steranes were then combined for PCA using the four methods (Fig. 7), with the diagnostic ratios given in Table S11. All four PCA methods differentiated CDO (left zone) from WCO (right zone). PCA of the diagnostic ratios between different types of biomarkers could also differentiate CDO from WCO, as shown in Fig. 8, with data in Table S12. The weathering duration of CDO and WCO is identified (anticlockwise) by the covariance method. Since the diagnostic ratios within the same type of biomarker were stable, the difference between the diagnostic ratios in CDO and WCO implies an influence of dispersant use on the weathering of different types of biomarkers (Figs. 6, 7, and 8).
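The RSD screening mentioned above (values below 5% indicating a stable ratio) can be sketched as follows. The ratio values are invented, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev

def rsd_percent(values):
    """Relative standard deviation: sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical values of one sterane ratio across weathering times
ratio_over_time = [0.98, 1.00, 1.02, 1.01, 0.99]
rsd = rsd_percent(ratio_over_time)  # about 1.6%
is_stable = rsd < 5.0               # threshold quoted in the text
```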

Fig. 6 Differentiation of CDO from WCO by the diagnostic ratios of the combination of steranes and terpanes using (a) covariance, (b) Pearson, (c) Spearman, and (d) Kendall PCA

Fig. 7 Differentiation of CDO from WCO by the diagnostic ratios of the combination of high-molecular aliphatic and aromatic biomarkers using (a) covariance, (b) Pearson, (c) Spearman, and (d) Kendall PCA

Fig. 8 Differentiation of CDO from WCO using diagnostic ratios of two types of biomarkers (terpanes/steranes)

In addition, when the data points are linked by a curve, a counterclockwise trend fits both CDO and WCO using terpanes and steranes (Fig. 6a). However, such a line cannot be drawn using the other PCA methods. The available trend of the plots may be narrowed down to Pearson PCA. For the combination of high-molecular aliphatic and aromatic biomarkers (Fig. 7a), the direction is clockwise when Pearson PCA is applied. The order becomes less distinct with the non-parametric methods. The omitted information may relate to the effects of weathering duration on the variations of the diagnostic ratios, since some secondary information is lost during the ranking process. The impacts of weathering duration on diagnostic ratios are thus of secondary importance compared with the effects of dispersant application.

Conclusion

CDO samples were differentiated from WCO samples using all the low-molecular biomarkers or combinations of high-molecular biomarkers by multiple PCA methods. The application of dispersants can affect the weathering fate of biomarkers, which allows the weathering of CDO to be differentiated from that of WCO. The differences between CDO and WCO samples were also induced by weathering duration. The overall trend with weathering duration can be displayed in scores plots from the PCA analyses. The biomarkers involved play a key role in CDO differentiation. The results imply diverse degrees of weathering among different types of biomarkers and reflect the importance and feasibility of applying biomarkers to trace the behavior of weathered dispersed oil. More indices, including diagnostic ratios and isotopic indices, will be used in further studies to better trace the weathering of oils and the application of oil spill countermeasures using fingerprinting.