Introduction

The use of colorants for making foods esthetically attractive has been known for centuries. Synthetic colorants are added to foods to replace natural color lost during processing, to reduce batch-to-batch variation, and to produce products with consumer appeal where no natural color exists. Therefore, food coloration is important to the food industry (Tripathi et al. 2007). The problem of food adulteration is by no means a contemporary phenomenon and is likely to be as old as the food processing and production systems themselves (Ellis et al. 2012). In the recent past, food fraud has become more sophisticated due to the use of synthetic adulterants which has resulted in growing concern about associated health risks (Krska et al. 2012).

Spices are globally traded commodities that are used in many food industries and can be found in numerous dishes or as an ingredient in other foods such as sauces. Spices have been fraudulently manipulated with the intentional addition of Sudan dyes to improve their natural appearance (ASTA 2005). Sudan dyes are a family of azo-dyes typically used in waxes, printing inks, shoes, and plastics, among other industrial uses. However, Sudan dyes are carcinogens that are metabolized to possible carcinogenic colorless amines that can form DNA adducts inducing mutations (European Food Safety Authority 2005; Fonovich 2013). In light of the dangerousness, Sudan dyes represent for public health, and their use as a food additive for human consumption is banned worldwide.

Culinary spices are ingredients of numerous foods like ketchup, sauces (tomato, barbecue, meat, sausage, Tabasco, pepper, chimichurri), among others. Most spice samples are adulterated with Sudan dyes at high levels (up to 10,000 mg kg−1) (Mishra et al. 2007) and lower levels are found in sauces (up to 250 mg kg−1) (https://webgate.ec.europa.eu/rasff-window/portal/). There is an increasing demand for the development of fast, easy, and low-cost analytical methods to test for food adulteration. In such scenario, screening methods would be useful for implementing rapid, simple, and uniform controls to detect adulterated foods with Sudan dyes according to the actual exigencies from the European Union (European Commission Decision 2009).

Several analytical methods have been developed for the determination of Sudan dyes in foods, mostly based on liquid chromatography with different detection systems (Rebane et al. 2010) and novel sample cleanup (Yan et al. 2012; Zhang et al. 2012). Also, immunoassays have been proposed (Liu et al. 2012; Oplatowska et al. 2011) and very recently, electrochemical methods (Wu et al. 2013; Yu et al. 2014; Chao and Ma 2012). Although very low detection limits can be reached, most of these methods suffer some drawbacks like they are time-consuming, expensive, require sample cleanup, and they need a strict control of experimental conditions and stability. Moreover, other analytical methodologies based on spectrometric methods and multivariate analysis have also been proposed: UV-visible (Di Anibal et al. 2009; Di Anibal et al. 2014; Yuan et al. 2008), 1H-NMR (Di Anibal et al. 2011), surface-enhanced Raman scattering (SERS) (Di Anibal et al. 2012, López et al. 2013), fluorescence (Di Anibal et al. 2015), and NIR (Haughey et al. 2015). Spectrometric methods have the advantage of providing a rapid analytical response, but they must be combined with proper multivariate analysis to obtain useful information when working with complex food matrices. In the food field, multivariate screening methodologies have increased in importance ever since it became important to detect fraudulent samples as well as ensure quality and safety (Oliveri and Downey 2012; Marini 2013). Multivariate classification techniques can be divided into two main groups: discriminant or class-modeling techniques. The first group is based on their discriminating power, and some examples are linear discriminant analysis and partial least squares-discriminant analysis (PLS-DA). They require at least two classes to be defined. On the other hand, the second group is based on the class-modeling ability, such as unequal dispersed classes (UNEQ), soft independent modeling of class analogy (SIMCA), or the recently proposed partial least squares density modeling (PLS-DM) developed by Oliveri et al. (2014). An advantage of class-modeling techniques is that they can be used in classification problems where there is only a class of interest, unlike the discriminant approach (López et al. 2014).

UV-visible spectroscopy with multivariate analysis has been successfully employed to determine the adulteration of culinary spices with Sudan dyes, as aforementioned. However, when working with more complex matrices such as sauces, other sample pretreatments are needed, for example, liquid–liquid extraction (LLE) (Long et al. 2011), solid phase extraction (SPE) (Qi et al. 2011), or molecularly imprinted polymer MIP-SPE (Hu et al. 2012). These pretreatments are laborious, time-consuming, and usually involve large amount of solvents. The aim of this work is to evaluate the potential of UV-visible spectroscopy with multivariate classification as a rapid, simple, and affordable screening tool for determining the adulteration of ketchups and barbecue sauces with a simple and quick sample extraction process with an organic solvent. Two classification strategies were evaluated: one based on the implementation of a class-modeling technique (PLS-DM) when the focus is to test whether a sample is adulterated or not by modeling a single class represented by unadulterated samples (untargeted strategy) and other strategy based on a discriminant classification technique (PLS-DA) when the focus is to determine food adulteration with Sudan I dye (targeted strategy).

Materials and Methods

Reagents and Samples

Sudan I dye was purchased from Sigma-Aldrich (Bs. As. Argentina). A total of 19 sauces distributed between ketchups and barbecue sauces from different commercial trades were purchased from markets. Acetonitrile (Aberkon Chemistry), acetone (Anedra), isopropyl alcohol (Anedra), and NN-dimetylacetamide (Merck) were evaluated for the extraction. Two adulterant concentrations were studied (250 and 100 mg kg−1). A working solution of Sudan I dye diluted in NN-dimethylacetamide was prepared to spike the commercial sauces. Samples followed a simple extraction process: 10 mL of NN-dimethylacetamide was added to 10 g of each sample and then was shaken in an automatic shaker (Shaker Pro Viking) during 15 min at 150 rpm. The resulting extracts were filtered through glass microfiber filters. In order to obtain the UV-visible spectra, a volume of this extract was spiked with an appropriate amount of Sudan I dye in such a way that the final concentration was 2.5 and 1 ppm (mg/L). A list containing the commercial original and spiked samples with the numbers assigned to all them is shown in Table 1. The entire dataset is formed by 57 samples.

Table 1 Description of the samples studied: commercial name and assigned number

UV-Visible and Software

A UV-visible spectrophotometer (Agilent 8453, USA) equipped with a diode array detector was used to acquire spectra. The spectrum scanning was carried out in the wavelength range of 250–600 nm which represents 351 variables.

Multivariate analysis was performed under Matlab software (Version 7.0, The Math Works Inc., Natick, USA), and PLS-DM routine was provided by Oliveri et al. (2014).

Data Analysis

Principal Component Analysis

Principal component analysis (PCA) is an unsupervised analysis that was applied as exploratory analysis for detecting trends and patterns in the spectral data. PCA projects high-dimensional data onto lower-dimensional space, so all redundant information is summarized which simplifies the graphical interpretation of the data.

Multivariate Classification

Partial least squares-discriminant analysis (PLS-DA) uses a linear multivariate model to relate independent variables (spectra) to depend variables that designate class of the sample by means of a binary code, in which 1 indicates that the sample belongs to the class of interest and 0 indicates that it belongs to a different class. More details of this technique are provided elsewhere (Brereton 2009). Partial least squares-density modeling (PLS-DM) is a partial least squares PLS-based class-modeling approach that combines the features of PLS and potential function methods (PFM) with Q statistics by using a distance-based sample density measurement as the response variable. Initially, a PLS model is developed using analytical data as X predictor matrix and a density vector as the y response. The response value for each sample of the training set is computed as an estimation of sample density based on inter-sample Euclidean distances in the multivariate space. These distances are ordered, and the density values obtained as the sum of the k smallest (lowest order) distances are obtained. Therefore the k parameter influences the smoothness of density function, which evolves from a sharper to a smoother shape while increasing k. Moreover, after the PLS modeling, the PLS scores of the first selected L latent variables are used as input to estimate the probability density distribution (potential function methods, PFM) of the class with different smoothing coefficients (a). This coefficient cooperates in determining the shape of the distribution, being higher the smoothness when increasing a (with usual values between 0.1 and 1.5). This algorithm is well detailed by Oliveri et al. (2014).

As only nineteen sauces from different commercial trade were found in commercial markets, it is difficult to divide the dataset into training and test set. Therefore, a cross-validation approach was used to validate all classification models (Venetian-blind, three cancellation groups), where class 1 is formed by original samples (unadulterated samples) and class 2 by samples adulterated with Sudan I dye at the two studied concentration levels (250 and 100 mg kg−1).

Results and Discussion

Ketchup and barbecue sauces are complex matrices which needed an adequate extraction process in order to break up the dressing emulsion. Four different solvents were evaluated (acetonitrile (Aberkon Chemistry), acetone (Anedra), isopropyl alcohol (Anedra), and NN-dimetylacetamide) with mechanic agitation. Ketchups and barbecue sauces contain a high percentage of water and plant material (tomato cell walls), and NN-dimethylacetamide was the best option for the extraction because its emulsion-breaking action is the most satisfactory. Taking into account the proportion of spices as ingredients in sauces, two adulterant concentrations were studied (250 and 100 mg kg−1) which are within the usual range that could be found in such food matrices (Oplatowska et al. 2011).

Spectra Characterization

The spectra of original and adulterated samples from ketchup and barbecue sauce (randomly chosen) together with pure Sudan I dye spectrum are shown in Fig. 1. It can be observed that both original samples have similar spectral shape with each other, and their adulterated spectra differ in absorbance intensity.

Fig. 1
figure 1

UV-visible spectra of random original samples: barbecue sauce (dotted line) and ketchup (dashed line) with their adulterated samples (dashed-dotted and solid lines, respectively). Pure Sudan I dye spectrum is also shown (2.5 mg L−1)

Multivariate Analysis

In order to discard a non-informative spectral zone by visual inspection, the wavelength range between 300 and 550 nm was considered before the implementation of the multivariate analysis. A mean-centering preprocessing was applied to the UV-visible spectra. The figures of merit for evaluating the classification results were sensitivity, defined as the ability of the model to recognize its own samples and specificity defined as the ability to distinguish external samples. In addition, the false negative rate (1—sensitivity) and false positive rate (1—specificity) were evaluated.

Principal Component Analysis

Principal component analysis (PCA) was applied before the classification technique in order to evaluate the distribution of the samples in the multivariate space. Figure 2 shows the PCA scores plot for the first three principal components (PCs) which represent a 98.85 % of the total information (PC1 = 76.42 %; PC2 = 19.95 %; PC3 = 2.48 %). It can be observed in Fig. 2a (PC1 vs PC2 scores plot) that although PC1 accounts for the maximum information, it does not allow a clear distinction between original and adulterated samples. Otherwise, considering PC1 vs PC3 scores plot (Fig. 2b), a better discrimination can be obtained. In this case, most original samples have PC1 negative scores values while most adulterated have PC1 positive scores values and there are few adulterated samples that overlap the original samples which belong to the lowest Sudan I concentration level.

Fig. 2
figure 2

PCA scores plot of original samples (triangles) and samples adulterated with Sudan I dye (asterisks) at the two concentration levels (100 and 250 ppm). a PC1 vs PC2 scores plot and b PC1 vs PC3 scores plot

Classification Techniques: Untargeted Modeling with PLS-DM

In PLS-DM, class 1 is modeled and adulterated samples are used to test the ability of the model to reject them. As this classification technique is focused on a single class, it is important to detect if there is any sample with extreme behavior that could worsen the classification performance (outliers). Outliers can be detected by inspection of Q residuals and Hotelling T 2 values obtained from the PCA model (Brereton 2009). Q values represent a measure of variance which is not captured by the model (fit of samples outside the model) while T 2 values reflect the variance captured by the model (fit of samples within the model). Figure 3 shows Q residuals vs Hotelling T 2 plot. Although all samples fall inside both limits defined by Q and T 2 values, just one sample out of the 19 falls out of the Q limit (has a large Q value). Therefore, this sample (no. 13) was eliminated from the dataset.

Fig. 3
figure 3

Q residuals and T2 Hotelling statistics from PCA model for original samples

In order to define the optimal PLS-DM model, two parameters involved in the PLS-DM algorithm were evaluated: the k parameter and the smoothing coefficient (a), with the dataset formed by original and adulterated samples at 250 ppm. The k parameter was varied between 1 and 7 values while the smoothing coefficient (a) was varied from 0.3 to 1.1 with 0.1 increments. Initially, the optimal number of LVs was selected for each condition (varying from 1 to 15) considering the maximum sensitivity and specificity of the resulting class model evaluated by cross-validation. From the overall results, two LVs were selected (data not shown). Then, the models at fixed LVs were evaluated taking into account all the possible combinations of the k parameter and the smoothing coefficient (a) by a cross-validation approach (Venetian-blind scheme, three deletion groups) on mean-centering data with a 95 % confidence level. Table 2 shows the results for the model optimization with different k and a values and two LVs in all cases. The quality assessment of the results was performed by evaluating sensitivity (samples of the class of interest correctly assigned to their class) and specificity (samples not belonging to the class of interest correctly not assigned to that class). It can be observed that the maximum specificity is reached for any smoothing coefficient, while the best sensitivity percentage (94.7 %) is for k = 3 and a = 0.9 (shown in bold).

Table 2 PLS-DM parameter optimization

After PLS-DM optimization, a classification model considering the 56 samples was built (cross-validation Venetian-blind, seven deletion groups, two LVs, 95 % confidence level). Table 3 shows the classification results, obtaining a sensitivity and specificity of 83.3 and 97.0 %, respectively. False negative (FN) and false positive (FP) rates are related to sensitivity and specificity, respectively. Considering the following null hypothesis H 0: “The sample does not contain any adulterant,” the implication of false positive results is so important when samples that were not modeled and could be falsified are classified within the original class. These results might involve a risk on consumers’ health depending whether harmful substances were used on the food fraud implicated. Otherwise, false negative results (original samples not assigned to their own class) might have economic implications for producers. Although satisfactory classification results were reached (see Table 3), the false negative rate is a bit high. Therefore, a simple variable selection approach based on PCA was applied in order to improve the results. The information contained in the first three principal components (PCs) was used (percentage of variance higher than 1 %) which accounts a 98.85 % of the total variance. So with the selected variables corresponding to the sample scores of the three first PCs, a new classification model was built (mean-centering preprocessing, cross-validation Venetian-blind, seven cancellation groups, two LVs, 95 % confidence level). Figure 4 shows the results of the optimal PCA-PLS-DM class model. The potential function methods probability (PFM) density, which defines the class boundary by setting a critical value of the distribution at a confidence level, is plotted against Q statistics that contains the fraction of the information not explained in the final PLS model. In this way, the compliance of each sample with the class model is achieved when both criteria are satisfied (samples falling within the limits defined by these two parameters). The obtained classification results are depicted in Table 3. The model with a PCA-reduction variable approach provides a better sensibility (94.5 %), similar specificity (97.4 %), and the false positive/negative rates decreased (below 5.5 %) for cross-validation samples. Therefore, with a simple reduction variable process, the classification errors are minimized, which is very positive considering the potential risk that could be implied when obtaining such results.

Table 3 Classification results obtained in cross-validation with both untargeted and targeted strategies
Fig. 4
figure 4

Sample compliance with the PLS-DM model according to the probability density and the residual Q statistics. Original samples (green squares) and adulterated samples (red circles)

Taking into account that class 1 is represented by all the available samples in local markets, the overall obtained results are positive and might be improved if more samples are added. Moreover, the model was able to distinguish between original and adulterated samples at a quite low concentration level (100 ppm) considering the Sudan dye levels usually found in foods. It should be pointed that the classification error belongs to the lowest concentration level in the PCA-PLS-DM model.

Classification Techniques: Targeted Classification with PLS-DA

PLS-DA builds a discriminant classification rule considering class 1 (original samples) and class 2 (adulterated samples with Sudan I dye at the two concentration levels). The same cross-validation approach was used like the former case, and the model was built with four LVs selected by the minimum root mean square error of cross-validation (RMSECV). Table 3 shows the classification results considering the discriminating classifier. Excellent results were obtained, as original samples were mainly classified in their own class (with the exception of one sample) and all the adulterated samples were correctly classified in class 2. Comparing class 1, the false negative rate is the same as PLS-DM whereas the highest specificity was achieved resulting in a 0 % false positive rate.

Conclusions

A multivariate screening methodology based on UV-visible and multivariate classification was proposed for testing adulteration in sauces. A simple, rapid, and affordable sample extraction process was used for complex matrices such as ketchups and barbecue sauces. Two different classification strategies were evaluated: untargeted modeling (class modeling) and targeted classification (discriminate analysis). Both strategies gave satisfactory results. When the intention is to find out whether a sample is adulterated or not, untargeted modeling is a good choice. The advantage of such classification model is that it can be built only with original samples (unadulterated), so any sample can be predicted even when there is no knowledge about the presence of unwanted substances. In consequence, the implementation of this strategy would save time and money. Samples that do not fit the modeled unadulterated class could be altered or falsified with dangerous or cheaper substances; therefore, a confirmatory analysis is required to determine the actual state of the samples. Otherwise, targeted classification is a suitable option when testing adulterated samples with a common and known adulterant like Sudan I dye. This strategy provides detailed information about the adulteration problem and is very useful in situations where, based on previous experience, an illegal substance is suspected to be found. However, with this strategy, it is necessary to analyze samples containing the adulterant used to make the fraud.

The methodology proposed in this work could be implemented as a simple, rapid, and economic screening tool for monitoring the quality of popular sauces used worldwide, especially in situations that require quick responses.