Introduction

Argan oil is extracted from the fruits of Argania spinosa, a tree endemic to a specific area of southwestern Morocco. During recent decades, the specific characteristics of argan oil and its beneficial effects on human health have attracted intensive scientific and marketing efforts. It is now widely recognized as one of the most expensive edible and cosmetic oils in international markets. Its composition has been extensively studied, with numerous reports that it is especially rich in long-chain unsaturated fatty acids, principally oleic (49%) and linoleic acid (36%), and in natural antioxidants such as sterols, polyphenols, and tocopherols (Madawala et al. 2012; Sour et al. 2012, 2015; Aabd et al. 2013; Lopez et al. 2013; Kharbach et al. 2018; Maleš et al. 2018; Miklavčič et al. 2020; Gharby and Charrouf 2021; Simões et al. 2021). Its high nutritional value and biochemical properties have led to the belief that it has therapeutic value for treating and preventing several diseases (Drissi et al. 2004; Cherki et al. 2006; Charrouf and Guillaume 2008; El Monfalouti et al. 2010; Guillaume and Charrouf 2011; Sour et al. 2015). While the growing demand and soaring prices for Moroccan argan oil present economic and social opportunities, they unfortunately also tempt fraudsters and dishonest traders to make quick financial gains by mislabeling and/or adulterating products.

The most common form of such fraud is combining argan oil with cheaper oils, such as soya, sunflower, or hazelnut oil. Furthermore, the introduction of strict quality standards for argan oil as a product with protected designation of origin as an additional assurance of quality and authenticity has increased consumer interest in learning about the geographical origin and/or variety of argan oil that they buy. However, authenticating the varietal or geographical origin, as well as detecting adulterants, often represents a real analytical challenge. Thus, in order to monitor and control the authenticity of argan oil, several research studies have recently tried using various physicochemical determinations associated with chemometrics data processing (Rueda et al. 2014; Kharbach et al. 2017, 2018, 2019; Farres et al. 2019; Gunning et al. 2020; Mohammed et al. 2020; Elgadi et al. 2021). Among these, Gonzálvez et al. (2010) obtained better classification rates for edible oils (i.e., argan, sunflower, olive, seed, and soya oils) from their elemental content based on the type or variety using inductively coupled plasma atomic emission spectrometry (ICP-AES) combined with different chemometric approaches. What is more, new approaches have been developed to detect the adulteration of argan oil with vegetable oils through new techniques and instruments (Hilali et al. 2007; Ourracha et al. 2012; Oussama et al. 2012; Mohammed et al. 2013; Addou et al. 2016; Stokes et al. 2018; Çelik et al. 2019; Farres et al. 2019; Joshi et al. 2019). For instance, Salghi et al. (2014) used high-performance liquid chromatography coupled with evaporative light scattering to evaluate the authenticity of argan oil—as well as detect adulteration with vegetable oils such as sunflower, soya, and olive oils up to the 5% level—by studying the triacylglycerol profiles of argan oil. 
However, despite the high resolution and reliability of these methods, some are impractical for widespread use for various reasons, such as the time and cost required, the need for skilled operators, the generation of hazardous chemical waste, and the sample preparation and preprocessing involved. The results can also be affected by various technological parameters, such as the harvest conditions and the extraction, preservation, storage, and transport of the oil.

Today, the most established technique in this area is based on infrared spectroscopy, which is now widely used in food studies. It has become a particularly powerful analytical tool for studying the authenticity of edible oils and detecting any adulteration, especially when combined with techniques from chemometrics (Basri et al. 2017; Vanstone et al. 2018; Yuan et al. 2020; Bragolusi et al. 2021; Du et al. 2021; Sota-Uba et al. 2021). Economically speaking, infrared spectroscopy technology is rapid, inexpensive, and sensitive, and it can be used for high-throughput analysis of food-based components with real-time measurement at all stages of production without requiring specially skilled operators (Prieto et al. 2017).

In this paper, the infrared spectra of pure and adulterated argan oils were recorded using Fourier transform infrared (FTIR) spectroscopy. Multivariate statistical analyses were then applied to the resulting spectral data: principal component analysis (PCA) as an exploratory method for unsupervised pattern recognition, and the supervised classification methods soft independent modeling of class analogy (SIMCA) and data-driven soft independent modelling of class analogy (DD-SIMCA). The aim was to classify argan oil samples according to their geographical origins and to detect any adulteration with soya and sunflower oils. In addition, the degree of adulteration was quantified using two commonly used regression methods, namely partial least squares (PLS) regression and principal component regression (PCR).

Materials and Methods

Sample Collection

A total of 78 argan kernel samples from five Moroccan argan forest regions (Agadir, Essaouira, Chtouka Aitbaha, Taroudant, and Tiznite) were collected by visiting producers, cooperatives, markets, and industries (Fig. 1). The provenance and quality grade of all the samples had been guaranteed by the suppliers. Argan oil was then mechanically extracted from unroasted kernels using an automatic oil press machine in the laboratory. The oils were filtered and preserved in 250-mL dark-glass bottles in a refrigerator at +4 °C. The bottles had been filled with nitrogen to avoid oxidation until the day of analysis. Prior to analysis, the oils were left to stand at room temperature for at least 24 h.

Fig. 1 Moroccan regions from which the argan kernel samples were obtained

Sample Preparation

Soya and sunflower oils were procured from the local supermarket as adulterants. Binary mixtures of argan–soya and argan–sunflower oils were prepared by combining an individual authentic argan oil with varying proportions of adulterant oil (i.e., 0%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% weight/weight). The resulting 24 mixtures were used as a calibration set to construct the regression models. Next, in order to evaluate the prediction performance of the regression models, 10 further mixtures of soya and sunflower oil with argan oil at 3%, 12%, 23%, 31%, 42%, 49%, 55%, 66%, 74%, and 83% proportions were prepared.
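The mixture design described above can be enumerated in a few lines. This is only an illustrative sketch: the adulterant levels come from the text, but the 10 g batch size is a hypothetical value, and the prediction levels are assumed to have been prepared once for each adulterant.

```python
# Adulterant levels (% w/w) as listed in the text.
CALIBRATION_LEVELS = [0, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
PREDICTION_LEVELS = [3, 12, 23, 31, 42, 49, 55, 66, 74, 83]
ADULTERANTS = ["soya", "sunflower"]


def mixture_masses(adulterant_pct, total_mass_g=10.0):
    """Return (argan_g, adulterant_g) for a weight/weight mixture.

    total_mass_g is a hypothetical batch size, not a value from the study.
    """
    if not 0 <= adulterant_pct <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    adulterant_g = total_mass_g * adulterant_pct / 100.0
    return total_mass_g - adulterant_g, adulterant_g


def design(levels):
    """List every (adulterant, level) pair for the given levels."""
    return [(oil, pct) for oil in ADULTERANTS for pct in levels]


calibration_set = design(CALIBRATION_LEVELS)
# Assumption: each prediction level was prepared for both adulterants.
prediction_set = design(PREDICTION_LEVELS)

print(len(calibration_set), "calibration mixtures")
print(len(prediction_set), "prediction mixtures")
print(mixture_masses(5))
```

Enumerating the design this way makes it easy to check that the calibration set contains 12 levels for each of the two adulterants.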

FTIR Spectra Acquisition

All the argan oil samples, both authentic and adulterated mixtures, were subjected to FTIR spectral measurements using a PerkinElmer Spectrum Two equipped with a UATR. All spectra were recorded from 4000 to 400 cm−1 at a resolution of 1 cm−1. Using a Pasteur pipette, approximately 0.25 mL of each sample was placed in direct contact with the single-bounce attenuated total reflectance (ATR) crystal using the Smart ARK accessory. The diamond ATR crystal was cleaned between samples using laboratory tissue and a small amount of hexane. A reference spectrum (air) was subtracted from each FTIR spectrum as background. Each sample was measured in triplicate, and the average spectrum was used.

Multivariate Analysis

Raw spectral files were exported from the spectrometer acquisition software to The Unscrambler (v 7.6; CAMO A/S, Oslo, Norway) software for data analysis. The original spectral data were first screened for unusual or outlying samples through principal component analysis. Next, a preliminary analysis of the data set for natural groupings was performed through PCA, while classification was done using single-category SIMCA (Wold and Sjöström 1977; De Maesschalck et al. 1999; Bevilacqua et al. 2014; Oliveri et al. 2021). The latter is a popular pattern-recognition technique based on principal component analysis and is currently considered the most commonly used class-modeling method in chemometrics. It involves first constructing an appropriate PCA model for each class in the training samples and calculating the residual variance of each model. The residual variances are then calculated for the unknown samples in the prediction set and compared to those of the developed PCA class models to determine the class, if any, to which each unknown sample should be assigned. Full cross-validation and the nonlinear iterative partial least squares (NIPALS) algorithm were used to develop the PCA models. Samples were assigned to either a calibration or prediction set based on their positions in the main spectral file. More specifically, even-numbered samples were used for model development, while odd-numbered samples were used for model evaluation. In order to estimate similarity, samples were accepted or rejected based on their reduced distance (d) from the class space in the class model, which was calculated by taking into account the Mahalanobis distance of samples from the center of the score space and their orthogonal distance from their bilinear projection. Classification model quality was evaluated by calculating the models' sensitivity and specificity based on true and false class assignments.
Sensitivity is the proportion of target (positive) samples that are correctly accepted, whereas specificity is the proportion of non-target (negative) samples that are correctly rejected. These statistical parameters were calculated using the equations below:

$$\mathrm{sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$

where TP is the number of true positives (target samples correctly attributed as positive) and FN is the number of false negatives (target samples incorrectly attributed as negative).

$$\mathrm{specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$

where TN is the number of true negatives (non-target samples correctly attributed as negative) and FP is the number of false positives (non-target samples incorrectly attributed as positive).
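The two equations translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the example labels are made up rather than taken from the study.

```python
def confusion_counts(is_target, accepted):
    """Count TP, FN, TN, FP from paired boolean labels.

    is_target[i] is True when sample i truly belongs to the modeled class;
    accepted[i] is True when the class model accepted sample i.
    """
    tp = sum(t and a for t, a in zip(is_target, accepted))
    fn = sum(t and not a for t, a in zip(is_target, accepted))
    tn = sum(not t and not a for t, a in zip(is_target, accepted))
    fp = sum(not t and a for t, a in zip(is_target, accepted))
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Proportion of target samples that were accepted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-target samples that were rejected."""
    return tn / (tn + fp)


# Made-up example: four target samples (one rejected) and
# three non-target samples (all rejected).
truth = [True, True, True, True, False, False, False]
model = [True, True, True, False, False, False, False]
tp, fn, tn, fp = confusion_counts(truth, model)
print(sensitivity(tp, fn), specificity(tn, fp))
```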

Unless otherwise stated, all classification results are given at the 95% confidence level.
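The SIMCA procedure outlined in this section (one NIPALS-based PCA model per class, followed by a comparison of residual variances) can be sketched as follows. This is a simplified illustration, not the algorithm as implemented in The Unscrambler: the spectra are synthetic, the class names and function names are hypothetical, and the F-test against a critical value at the 95% level is replaced by assignment to the class with the smallest residual variance ratio.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """PCA by NIPALS; returns (mean, loadings, pooled residual variance)."""
    mean = X.mean(axis=0)
    E = X - mean
    loadings = []
    for _ in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy()  # start from the most variable column
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)
            p /= np.linalg.norm(p)
            t_new = E @ p
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        E = E - np.outer(t, p)  # deflate before extracting the next component
        loadings.append(p)
    P = np.array(loadings).T
    n, m = X.shape
    s0_sq = (E ** 2).sum() / ((n - n_components - 1) * (m - n_components))
    return mean, P, s0_sq


def residual_ratio(x, model):
    """Residual variance of one sample relative to the class residual variance."""
    mean, P, s0_sq = model
    centred = x - mean
    e = centred - P @ (P.T @ centred)
    return (e @ e) / (x.size - P.shape[1]) / s0_sq


def classify(x, models):
    """Simplification: pick the class whose model leaves the smallest ratio."""
    ratios = {name: residual_ratio(x, model) for name, model in models.items()}
    return min(ratios, key=ratios.get)


# Synthetic two-class data standing in for spectra (not real measurements).
rng = np.random.default_rng(0)
axis = np.linspace(0.0, np.pi, 30)


def fake_class(centre, direction, n=12):
    scores = rng.normal(size=(n, 1))
    noise = 0.01 * rng.normal(size=(n, axis.size))
    return centre + 0.5 * scores * direction + noise


spectra = {
    "A": fake_class(np.sin(axis), np.cos(3 * axis)),
    "B": fake_class(np.cos(axis), np.sin(2 * axis)),
}
# With 1-based numbering, even-numbered samples are indices 1, 3, 5, ...
models = {name: nipals_pca(X[1::2], 1) for name, X in spectra.items()}
predicted = {name: [classify(x, models) for x in X[0::2]] for name, X in spectra.items()}
print(predicted)
```

In a full SIMCA implementation a sample may be accepted by several class models or by none, which is why sensitivity and specificity are reported per class rather than as a single accuracy.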

In order to extend the application range of the chemometric tools, the data were exported to a modified version of the SIMCA method, namely DD-SIMCA (Pomerantsev and Rodionova 2014), to establish a rapid identification model. In the present study, calculations were performed using standard Excel functions and a chemometric add-in for Microsoft Excel (SIMCA template.xlsb) (Pomerantsev 2014). The parameter α (0 < α < 1) was used to manage the size of the acceptance area, which corresponds to the green boundary line. A lower α means a wider acceptance area, so samples are more likely to be accepted. Meanwhile, the parameter γ (0 < γ < 1) was used to manage the size of the outlier area, which corresponds to the red boundary line. A lower γ means a wider area, so fewer outliers will be discovered. The samples are colored according to their status, such that regular objects (green dots) lie inside the acceptance area, outliers (red squares) lie outside the outlier boundary, and extreme samples (yellow diamonds) are located in between. In DD-SIMCA classification, sensitivity is calculated for both the calibration and validation sets, whereas specificity is only determined from the prediction set.
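The roles of α and γ can be illustrated with a short sketch of the DD-SIMCA decision rule as we understand it from Pomerantsev and Rodionova (2014): a full distance built from the scaled score and orthogonal distances is compared against two chi-squared cut-offs. The function names are hypothetical, the degrees of freedom in the example are made up, and the chi-squared quantile uses the Wilson-Hilferty approximation (an assumption made here to avoid external libraries), so the values are approximate.

```python
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Approximate chi-squared quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def full_distance(h, v, h0, v0, n_h, n_v):
    """Combine score distance h and orthogonal distance v.

    h0 and v0 are scaling factors and n_h and n_v are degrees of freedom,
    all estimated from the training set.
    """
    return n_h * h / h0 + n_v * v / v0


def thresholds(alpha, gamma, n_samples, n_h, n_v):
    """Return the (acceptance, outlier) cut-offs for the full distance."""
    dof = n_h + n_v
    acceptance = chi2_quantile(1.0 - alpha, dof)
    outlier = chi2_quantile((1.0 - gamma) ** (1.0 / n_samples), dof)
    return acceptance, outlier


def status(distance, acceptance, outlier):
    """Label a sample as regular, extreme, or outlier."""
    if distance <= acceptance:
        return "regular"
    if distance <= outlier:
        return "extreme"
    return "outlier"


# Hypothetical degrees of freedom (n_h = 2, n_v = 3) and 47 training samples.
strict = thresholds(alpha=0.05, gamma=0.05, n_samples=47, n_h=2, n_v=3)
loose = thresholds(alpha=0.01, gamma=0.05, n_samples=47, n_h=2, n_v=3)
print(strict, loose)
```

Lowering α from 0.05 to 0.01 raises the acceptance cut-off, which is the sense in which a lower α gives a wider acceptance area.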

Quantitative spectral decomposition techniques such as PLS and PCR, which are typically used for predictive linear modeling, were used to build models to determine the concentration of soya or sunflower oil in adulterated argan oil. No outliers were removed during calibration. The root mean square errors of calibration (RMSEC), leave-one-out cross-validation (RMSECV), and prediction (RMSEP) were used to assess the predictive ability of the models.
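For reference, the three error measures share a common form. The expression below is a sketch that assumes each error is normalized by the number of samples n in the relevant set; some software instead subtracts degrees of freedom when computing RMSEC.

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({\widehat{y}}_{i}-{y}_{i}\right)}^{2}}$$

Here y_i is the actual adulterant content of sample i and ŷ_i is the value predicted from its spectrum. For RMSEC, ŷ_i comes from the model fitted to all calibration samples; for RMSECV, from a model fitted with sample i left out; and for RMSEP, from the calibration model applied to the independent prediction set.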

Results and Discussion

FTIR Spectral Analysis

The typical spectra for pure argan, soya, and sunflower oils are shown in Fig. 2, and it is readily apparent that no significant visible differences exist among these spectra. The frequency of a given vibration mode depends on the arrangement of the carbonyl groups and the lipid hydrocarbon backbone, while its intensity depends on the relative concentration of the related functional group. Argan oil and the soya and sunflower adulterant oils differ in composition principally in their linoleic and oleic acid contents. Transmittance minima are clearly evident at 2922, 2853, 1744, and 1161 cm−1, together with smaller bands at 1464, 1377, 1109, 1097, and 723 cm−1. From the literature (Guillen and Cabo 1997), it can be surmised that the bands occurring in the region between 1744 and 2922 cm−1 are due to stretching vibrations of the =C-H group, while the bands at 1300 and 1465 cm−1 are due to the bending vibration of the C-H bonds that are present in triglyceride molecules, and the band at 1097 cm−1 relates to the stretching vibrations of -C-O groups. The band at 723 cm−1 is connected with the bending vibration of -HC=CH- groups. Table 1 summarizes the infrared band assignments that were reported for the considered edible oils. The prominent features in the spectrum include the bands in the region of 2922 cm−1 due to the -CH- stretch of the hydrocarbon chains and those in the region of 1744 cm−1, which are due to the carbonyl groups in the triglyceride.

Fig. 2 Plot of spectra for different pure argan, soya, and sunflower oil samples

Table 1 Infrared bands in the oil spectra with the assigned functional group, mode of vibration, and intensity (Guillen and Cabo 1997)

Discriminant Analysis

The similarity among the spectra reflects the similarities in the chemical compositions of the oils. Although the FTIR spectra of all the argan oil samples are similar in appearance, a more detailed analysis reveals minor differences in the form of small band shifts and small corresponding changes in relative intensity (Fig. 2). The authentic argan oil samples were split into a calibration set (60%) and a validation set (40%), with these being randomly selected from each region (Table 2). In order to achieve a good classification for authentic argan oil, different spectral pre-processing methods were tested in different spectral regions in combination with the PCA technique, with the best results being obtained using mean centering of the combined raw spectral data from 3044–2943, 2904–2870, 1423–1212, 1130–1001, and 913–853 cm−1. The PCA results for the calibration and validation datasets are illustrated in Fig. 3. The two-dimensional score plot of the first two PCs (PC1 and PC2), which describe 100% of the total variance, reveals that most of the authentic argan oils are clearly separated in the space, with samples generally being grouped according to their geographical origins for the five classes in both the calibration and the validation set. All validation samples are plotted close to their proper classes. However, the closeness that can be observed between the oils from Agadir and Essaouira, as well as among the oils from Chtouka, Taroudant, and Tiznite, can be explained by the fact that these regions neighbor each other and therefore have similar climates, as well as by the fact that only a limited number of representative samples were selected in certain regions. More specifically, Agadir and Essaouira are oceanic coastal regions with a hot, humid, and arid bioclimate, while Chtouka and Tiznite are sub-Saharan regions with climatic conditions close to arid Mediterranean.
The Taroudant region, meanwhile, is surrounded by two mountain ranges (High Atlas and Anti-Atlas) and has a hot, semi-arid bioclimate.
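The pre-processing and PCA step described in this section (selecting the listed wavenumber windows, mean centering, and projecting onto the first two PCs) can be sketched as follows. The window limits come from the text, but everything else is an assumption for illustration: the spectra are random placeholders, the function names are hypothetical, and the PCA is computed by singular value decomposition rather than by the NIPALS routine of the software used in the study.

```python
import numpy as np

# Wavenumber windows (cm-1) listed in the text, as (upper, lower) limits.
REGIONS_CM1 = [(3044, 2943), (2904, 2870), (1423, 1212), (1130, 1001), (913, 853)]


def region_mask(wavenumbers, regions=REGIONS_CM1):
    """Boolean mask selecting the points that fall inside any window."""
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for upper, lower in regions:
        mask |= (wavenumbers <= upper) & (wavenumbers >= lower)
    return mask


def pca_scores(X, n_components=2):
    """Mean-center X and return (scores, explained variance fractions)."""
    centred = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Placeholder data: 10 random "spectra" on an assumed 1 cm-1 grid
# from 4000 to 400 cm-1 (not real measurements).
wavenumbers = np.arange(4000, 399, -1)
rng = np.random.default_rng(1)
spectra = rng.normal(size=(10, wavenumbers.size))

mask = region_mask(wavenumbers)
scores, explained = pca_scores(spectra[:, mask])
print(mask.sum(), scores.shape, explained)
```

Plotting the first column of `scores` against the second, colored by region, would give a score plot of the kind shown in Fig. 3.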

Table 2 Geographical origins of the samples and their occurrence in the different data sets
Fig. 3 PCA score plot of the mid-infrared spectra for authentic argan oils (PC1 vs. PC2) for calibration and validation sets

In order to obtain a classification rule for the known groups of argan oil, such that new samples can be assigned, the soft independent modeling of class analogy (SIMCA) technique was applied. When constructing the model with SIMCA, the similarities between samples with defined categories were addressed, and a PCA was performed individually for each class, resulting in its own class model. As SIMCA is based on PCA, the individual PCA model for each group was created using data from the training set (Table 2). In the SIMCA model, the optimal number of PCs was retained for each class based on the explained variance and the predicted residual error sum of squares (PRESS). The classification results are summarized in Table 3 in terms of the sensitivity and specificity rates for the calibration and validation sets from the five regions. For the SIMCA models constructed from the Agadir and Essaouira calibration and validation sets, no target samples were assigned wrongly, and both sensitivity and specificity were 100%, meaning that the samples were correctly distributed in their respective predefined groups. For the Chtouka, Taroudant, and Tiznite calibration and validation sets, the constructed SIMCA models gave satisfactory results, with the sensitivity varying between 73 and 92% because some target samples were assigned to more than one group or classified as non-target samples. In addition, the classification results for these groups show that some non-target samples were not correctly rejected by the model and were instead wrongly assigned as target samples, resulting in specificity between 90 and 96%.

Table 3 Results of the SIMCA classification models when using the mid-infrared (FTIR) data for the argan oil samples

Another experiment for geographic discrimination was carried out using DD-SIMCA chemometric analysis of the authentic samples from the different regions in order to build models capable of determining whether the samples from other regions can be excluded as aliens. Taking into account the optimal number of PCs used in SIMCA for each region (Table 3), a one-class target classification model was developed using the authentic samples from a particular region and applying mean-centering preprocessing to the entire set of available spectra. The sizes of the acceptance and outlier areas were set at α = 0.01 and γ = 0.05 for all cases. The results of the DD-SIMCA classification models are illustrated in Fig. 4a–e, while their results in terms of sensitivity and specificity are summarized in Table 4. Since only target samples were used in the calibration and validation steps (i.e., objects from other classes were not used for building the model), the sensitivity rate was the only parameter that could be calculated for these sets. For the calibration sets, all target samples were correctly accepted by all models, resulting in sensitivity rates of 100%. For the validation sets, the Agadir and Essaouira models demonstrated 100% sensitivity, indicating that all target samples were correctly identified. However, the Chtouka, Taroudant, and Tiznite models sometimes classified a target sample as non-target, resulting in 83%, 80%, and 91% sensitivity, respectively. For the prediction sets, a 100% specificity rate was evident for all classes, indicating that the samples that did not belong to the relevant region were correctly rejected by the models, as evidenced by how they are plotted outside the threshold line.
Despite the overall sample set being limited in size, the relatively high sensitivity/specificity rates (90–100%) obtained for certain regions demonstrate the potential for using FTIR spectroscopy and DD-SIMCA to classify argan oils according to their geographic origin.

Fig. 4 Results of DD-SIMCA for authentic argan oils from a targeted geographical region vs. other regions: (a) Agadir, (b) Chtouka, (c) Essaouira, (d) Taroudant, (e) Tiznite

Table 4 Sensitivity/specificity achieved by the DD-SIMCA models when applied to pure argan oils from different geographical regions

Detecting and Quantifying Adulteration by Soya and Sunflower Oils in Argan Oil

Detecting Adulteration

In this section, the IR spectra of the binary mixtures were investigated as a means of detecting the adulteration of argan oil, with multivariate methods being used to make low levels of adulteration easier to detect. To classify the IR spectra and assess the differences between pure and adulterated oil, PCA was performed after applying Savitzky–Golay smoothing, a second-order derivative, and mean centering at the combined frequency regions of 3044–2943, 2904–2870, 1423–1212, 1130–1001, and 913–853 cm−1. As shown in Fig. 5, 92% of the variability in the data was explained by PC1 and 6% was explained by PC2. The pure argan, soya, and sunflower oils were located in well-separated regions in the PCA distribution, with the binary mixtures occupying intermediate positions depending on the particular adulterant and the percentage adulterant content in the argan oil. Indeed, the distance from the pure argan oil increases with increasing adulterant content from 10 to 100%. From these data, it is evident that the adopted spectral parameters depend on the amount of soya or sunflower oil present in the mixture, with a linear correlation between the proportion of adulterant oil and the PCA score. This indicates that FTIR spectroscopy can be successfully used to detect the presence of soya and sunflower oils in argan oil samples.

Fig. 5 Plot of principal component analysis (PCA) scores for a single sample of argan oil together with adulterated mixtures with 10–100% soya or sunflower oil

Adulterant detection was studied further to demonstrate the usefulness of FTIR spectral data for discriminating argan oils of different origins from mixtures containing soya and sunflower oils by using the open-source DD-SIMCA chemometrics tool, with the data from the 78 authentic argan oil samples listed in Table 2 being used to define the target class. The 47 samples randomly selected for the calibration set were used to establish the model, while the remaining 31 samples formed the validation set used to evaluate the robustness of the model. Finally, the prediction set comprised the 42 adulterated samples. The sizes of the acceptance and outlier areas were given by α = 0.05 and γ = 0.05. Optimum classification with DD-SIMCA was obtained by considering four principal components, with good sensitivity rates being obtained for both calibration (94%) and validation (97%). As illustrated in Fig. 6, only three samples from the calibration set were located outside the acceptance area, but they were considered as extremes (orange dots) rather than outliers, meaning that all training objects were members of the target class as expected. Applying the classification model to the validation set resulted in all samples (blue dots) being within the acceptance area (i.e., inside the green line); thus, they were correctly classified as pure argan oil. The classification model was then applied to the entire set of adulterated samples (i.e., non-target samples), ranging from 3 to 100% w/w adulteration with soya or sunflower oil. Figure 6 reveals that these samples (the red dots) are plotted outside the acceptance area, so they were identified as impure argan samples, implying that the model has 100% specificity and could potentially be used for verifying argan oil authenticity.

Fig. 6 The DD-SIMCA acceptance plot for the calibration, validation, and prediction samples used for the adulteration model

Quantification

The amount of soya or sunflower oil mixed into argan oil was quantified by applying multivariate calibrations, namely partial least squares (PLS) regression and principal component regression (PCR), to the spectral data for 11 binary mixtures of each adulterant.

In the PLS and PCR calibration models, linearity was evaluated in order to show a proportional relationship between responses (transmittances) and analyte concentrations. The PLS and PCR models were developed using the FTIR mean-centered spectra in the frequency region of 3030–2860 cm−1, where the variations were observed and which gave the highest values of R2 and the lowest values of RMSEC relative to other frequency regions. In order to validate the developed model, cross-validation using the “leave one out” technique was performed by excluding one of the calibration samples and then constructing a model from the remaining samples. Subsequently, the model was evaluated using the previously excluded sample, and the error of the predicted value was computed. A different sample was then excluded from the model set and a new model was constructed. This process was repeated until all samples had been excluded once in the PLS and PCR models (Rohman and Che Man 2011). The results obtained from the PLS and PCR models in terms of the R2, RMSEC, RMSEV, and RMSEP are presented in Table 5.
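The leave-one-out procedure can be sketched with a minimal single-response PLS (PLS1) routine. This is an illustration under stated assumptions rather than the software implementation used in the study: the spectra are synthetic two-band mixtures with added noise, the calibration levels are taken from the Sample Preparation section, the function names are hypothetical, and a single PLS factor is used because the synthetic data vary along one direction only.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit PLS1 by the NIPALS-style deflation scheme; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        q_a = (f @ t) / tt
        E = E - np.outer(t, p)  # deflate the spectra
        f = f - q_a * t         # deflate the response
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))


def loo_rmsecv(X, y, n_factors):
    """Leave each sample out once, refit, and predict the left-out sample."""
    preds = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_factors)
        preds[i] = pls1_predict(model, X[i:i + 1])[0]
    return rmse(y, preds)


# Synthetic mixtures (not real spectra): two Gaussian bands blended by level.
levels = np.array([0, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100], dtype=float)
axis = np.linspace(0.0, 1.0, 50)
argan_band = np.exp(-((axis - 0.4) / 0.1) ** 2)
adulterant_band = np.exp(-((axis - 0.6) / 0.1) ** 2)
rng = np.random.default_rng(0)
fraction = levels[:, None] / 100.0
spectra = (1.0 - fraction) * argan_band + fraction * adulterant_band
spectra = spectra + 0.001 * rng.normal(size=spectra.shape)

rmsec = rmse(levels, pls1_predict(pls1_fit(spectra, levels, 1), spectra))
rmsecv = loo_rmsecv(spectra, levels, 1)
print(rmsec, rmsecv)
```

Because every left-out sample is predicted by a model that never saw it, the cross-validation error is usually a more conservative estimate than the calibration error.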

Table 5 Performance of the multivariate calibration, validation, and prediction for analyzing soya and sunflower oil content in argan oil

Based on Table 5, the soya oil (SO) and sunflower oil (SF) adulterants in argan oil (AO) were quantified well, as shown by the high coefficient of determination (R2 = 0.999) for the calibration and validation models with both PLS and PCR. However, when compared to PCR, PLS gives a better fit and offers the best calibration and validation models. For the soya PLS model, a low root mean square error of calibration (RMSEC = 0.389% w/w) and a low root mean square error of validation (RMSEV = 0.639% w/w) were obtained when using four factors. Similarly, the presence of SF in AO was quantified using the PLS calibration model, which gave high R2 values for the calibration (0.9994) and validation (0.9993) data and the lowest errors for both RMSEC (0.685% w/w) and RMSEV (0.863% w/w) when using only two factors. Graphical representations of the regressions produced using these models are shown in Fig. 7a and b. The PLS calibration model was also used to calculate the adulterant content of samples in the prediction set. Plots of the relationship between the actual and FTIR-predicted values of adulterant content in the prediction samples are shown in Fig. 7c and d, with the R2 and root mean square error of prediction being 0.998 and 1.067% (w/w), respectively, for soya and 0.997 and 1.199% (w/w) for sunflower.

Fig. 7 The PLS calibration and prediction models for the relationship between the actual and FTIR-predicted values for AO adulteration with SO (a and c) and SF (b and d)

In the PLS calibration model, the optimum number of principal components and the spectral regions used for constructing the models were verified by calculating the difference between the real and predicted values. The sum of the squares of these differences is called the predicted residual error sum of squares (PRESS), and this was calculated for different numbers of principal components (PCs) (Rohman and Che Man 2011). Figure 8 shows that the RMSECV reached a stable minimum after four factors for adulteration by soya oil and after two factors for sunflower oil. Based on these results, we suggest that the proposed models might be of practical use for detecting and quantifying soya and sunflower oil adulteration in argan oil at levels as low as 3%.
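One common way to read a PRESS curve is to take the smallest number of factors whose PRESS lies within a small tolerance of the minimum. The sketch below illustrates that heuristic only: it is not necessarily the rule applied in the study, the function names and the 5% tolerance are hypothetical, and the PRESS values are made up for illustration.

```python
import math


def rmsecv_from_press(press, n_samples):
    """Convert a PRESS value into a root mean square error of cross-validation."""
    return math.sqrt(press / n_samples)


def pick_factors(press_by_factor, tolerance=0.05):
    """Return the smallest factor count whose PRESS is within tolerance of the minimum.

    press_by_factor[k - 1] is the PRESS obtained with k factors.
    """
    best = min(press_by_factor)
    for k, press in enumerate(press_by_factor, start=1):
        if press <= best * (1.0 + tolerance):
            return k
    raise ValueError("press_by_factor must not be empty")


# Made-up PRESS values for models with 1 to 6 factors (not from Table 5 or Fig. 8).
illustrative_press = [50.0, 20.0, 9.0, 6.1, 6.0, 6.2]
chosen = pick_factors(illustrative_press)
print(chosen, rmsecv_from_press(illustrative_press[chosen - 1], 24))
```

Preferring the smallest adequate number of factors guards against overfitting, since additional factors beyond the plateau mostly model noise.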

Fig. 8 Predicted residual error sum of squares and explained variance versus number of PCs for the SO (a and b) and SF (c and d) models

Conclusions

Using a database of 78 authentic argan oils, this study has demonstrated that FTIR spectroscopy combined with multivariate analysis, including principal component analysis (PCA), soft independent modeling of class analogy (SIMCA), and data-driven soft independent modelling of class analogy (DD-SIMCA), represents a potentially reliable tool for classifying argan oil samples according to their geographical origin. In addition, it appears to be a good candidate for rapidly detecting and identifying soya and sunflower oil adulterants in argan oil products for quality control purposes. Moreover, once adulteration has been detected, the degree of adulteration with soya and sunflower oil can be quantified through partial least squares regression, which gives a good model for calibration, validation, and prediction within acceptable errors. The developed PLS and PCR calibration models could be used to determine low concentrations of soya or sunflower oil in argan oil. Being able to distinguish between argan oils and other cheaper oils will help reinforce the quality and commercial value of this plant product.