1 Introduction

Edible oils are fats extracted from plant seeds. They are important dietary components with high nutritional value, so their quality is a major concern. Sunflower oil is rich in unsaturated fats, in the form of polyunsaturated and monounsaturated fatty acids, which makes it a healthy choice. In India, sunflower has become the fastest-growing intercrop. During festive seasons, heavy demand for this oil tempts fraudsters to adulterate expensive sunflower oil with cheaper oils such as palm oil to increase profit. Palm oil contains more saturated fatty acids, which are not recommended for cooking. Several researchers have reported adverse health effects of oxidized palm oil, including diseases of the kidney, lung, liver, and heart [1, 2]. Hence, detecting and quantifying adulteration is very important for preventing adverse health effects.

The standard methods for detecting adulterants in edible oils and fats are covered under IS:548 (Part-II)-1976. These are chemical methods in which detection is based on the development of a characteristic color, turbidity, or precipitate. Instrumental methods such as gas chromatography and high-performance liquid chromatography have also been used for detecting and quantifying adulteration in edible oils. These methods determine the fatty acid composition and thereby detect adulteration [3,4,5].

Fast gas chromatography combined with mass spectrometry (GC/MS) followed by chemometric tools has been used for detecting adulterations in edible oils [6, 7]. Biomimetic-based multi-sensory methods such as electronic tongue and electronic nose combined with chemometrics were also reported in the literature. The electronic tongue [8], which mimics the human perception of taste artificially, was successful in classifying and detecting the adulteration in edible oils [9,10,11]. The metal oxide sensor-based electronic nose was an alternative method for fast detection of adulteration in sesame and virgin olive oils [12,13,14].

Near-infrared spectroscopy coupled with pattern recognition algorithms [15,16,17,18], mid-infrared spectroscopy [19,20,21,22], magnetic resonance spectroscopy [23], synchronous front-face fluorescence spectroscopy [24], Raman spectroscopy, and FTIR spectroscopy [25,26,27], combined with chemometric methods, have been successful in detecting adulteration of various edible oils. Although they can detect adulteration in various oils, these methods are time-consuming and tedious, and they require high-end instrumentation and expert supervision. Hence, a portable, non-destructive, robust, rapid, and inexpensive analytical method that does not require any sample preparation is essential for detecting adulteration in edible oils. Mid-infrared spectroscopy based on attenuated total reflection (ATR) sampling provides a viable solution for this purpose.

This paper proposes a method for fast and accurate detection of palm oil adulteration in sunflower oil using a handheld attenuated total reflection (ATR) spectrometer operating in the mid-infrared region. Mid-infrared spectroscopy based on ATR sampling is a non-destructive method that requires no sample preparation. It is becoming a robust alternative to near-infrared spectroscopy for the analysis of edible oils and fats. The method uses the internal reflection principle, implemented with an ATR crystal and a variable filter array. The ATR method has not previously been used for detecting palm oil adulteration in sunflower oil.

The remainder of the paper is organized in three sections. Section 2 describes the materials and methods: sample collection, the methodology for verifying the authenticity of the collected samples, preparation of lab-made adulterated samples, ATR spectra acquisition, and the chemometric methods used to detect adulteration in edible oils. Section 3 presents the results and discussion of the qualitative and quantitative analysis of edible oil adulteration, followed by the conclusion in Sect. 4.

2 Materials and Methods

2.1 Collection of Edible Oil Samples

Sunflower and palm oil samples were collected from authorized sellers; details of these samples are given in Table 1. To avoid oxidation and rancidity, the collected samples were stored in airtight containers in a dark place.

Table 1 Edible oil samples and their brands that are used for experimentation

2.2 Methodology for Checking Authenticity of Samples

The authenticity of the collected edible oil samples was tested in the laboratory using IS-548 Part I standards and American Oil Chemists' Society (AOCS) standard methods [28]. The physicochemical parameters of the edible oils were measured using standard AOCS methods and validated against the specifications (allowed ranges for authentic oils) provided by the Food Safety and Standards Authority of India (FSSAI). These parameters include fatty acid composition (AOCS-CD-14C-94;2011), peroxide value (PV) (AOCS-CD-8B-90;2011), acid value (AV) (AOCS-CD-3D-63;2011), unsaponifiable matter (USM) (AOCS-CA-6A-40;2011), saponification value (SV) (AOCS-CD-3-25;2011), refractive index (RI), and iodine value (IV) (AOCS-CD-1-25;2011). The physicochemical characterization was carried out with the assistance of an FSSAI-certified laboratory, the Council of Scientific and Industrial Research-Indian Institute of Chemical Technology (CSIR-IICT), Hyderabad. Table 2 shows the measured parameters, Table 3 shows the FSSAI specifications for these oils, and Table 4 shows the fatty acid composition. Figure 1 gives a graphical representation of the fatty acid profiles. The measured properties were within the FSSAI-specified ranges, confirming the quality of the edible oils and the absence of adulteration. These oils were then used to prepare adulterated samples in the lab.

Table 2 Physicochemical properties of edible oils
Table 3 FSSAI specifications of edible oils taken for experimentation
Table 4 Fatty acid composition of Sunflower and Palm oils
Fig. 1 Visualization of fatty acid profiles of palm and sunflower oil

2.3 Lab-Made Adulterated Sample Preparation

Adulterated samples were prepared in the lab by mixing accurately measured volumes of palm oil and sunflower oil in proportions of 5%, 10%, 25%, 50%, and 75% (v/v). The 5% sample contains 5 ml of palm oil and 95 ml of sunflower oil. Similarly, the 10% sample contains 10 ml of palm oil and 90 ml of sunflower oil, and so on. The adulterated oil samples were thoroughly mixed with a magnetic stirrer to ensure uniformity and stored in airtight containers until experimentation.

2.4 Spectra Acquisition

The IRSphinx ATR portable spectrometer from Aspectus Technologies, Germany (formerly the Pyreos-750 from PYREOS, Edinburgh) was used in this research. This spectrometer works on the principle of total internal reflection. It is equipped with a zinc selenide (ZnSe) ATR crystal, an electrically modulated micro-electro-mechanical system (MEMS) emitter as the illuminating source with a spectral range of 5.5–11.0 µm (1800–900 cm−1), a linear variable filter (LVF) as the dispersing element, and a 128-pixel uncooled pyroelectric array as the detector. The spectrometer was interfaced with a computer via a USB-to-Ethernet converter. The ATR crystal is arranged horizontally so that the source beam undergoes nine reflections within the crystal before reaching the detector. The spectrometer can analyze both liquid and solid samples. Triglycerides are the primary constituents of edible oils, and oil samples can be placed directly on the ATR crystal for analysis. An autopipette with a capacity of 2 ml was used to place the edible oil samples on the ATR crystal.
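The wavelength and wavenumber limits quoted for the source are related by \(\tilde{\nu}\,(\text{cm}^{-1}) = 10^{4} / \lambda\,(\mu\text{m})\). The following minimal Python sketch illustrates the conversion; the function name is illustrative and is not part of the instrument software.

```python
def wavelength_um_to_wavenumber_cm1(wavelength_um: float) -> float:
    """Convert a wavelength in micrometres to a wavenumber in cm^-1."""
    if wavelength_um <= 0:
        raise ValueError("wavelength must be positive")
    return 1.0e4 / wavelength_um


if __name__ == "__main__":
    # Limits of the spectral range quoted in the text
    for lam in (5.5, 11.0):
        nu = wavelength_um_to_wavenumber_cm1(lam)
        print(f"{lam:4.1f} um -> {nu:7.1f} cm^-1")
```

The exact values are about 1818 cm−1 and 909 cm−1, which the text quotes in rounded form as 1800–900 cm−1.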

Five sample readings were taken consecutively, and their average was taken as a single sample spectrum. Fifteen measurements were taken for each sample in the wavelength range 5.5–11 µm (1800–900 cm−1) using the data acquisition software Sphinx Suite 1.5.0.1. Prior to the measurements with edible oil samples, a background spectrum was recorded with the blank ATR crystal and used to correct the absorbance spectra. After each acquisition, the ATR crystal was carefully cleaned with acetone using a lint-free cloth and thoroughly dried before the next sample spectrum was taken. A total of 20 measurements were collected for each sample. The complete acquired data set, including the lab-made adulterated samples, had a size of 140 × 128. The block diagram of the experimental procedure for mid-infrared spectroscopy with ATR sampling is shown in Fig. 2.
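The averaging and background-correction steps can be sketched as follows. This is a hedged illustration, not the Sphinx Suite procedure: it assumes that the raw readings are single-beam intensities and that absorbance is computed as \(A = -\log_{10}(I/I_{0})\) against the blank-crystal background; the function name and the synthetic numbers are hypothetical.

```python
import numpy as np


def absorbance_spectrum(sample_readings, background):
    """Average repeated readings and convert them to absorbance.

    sample_readings: shape (n_readings, n_pixels), raw intensities (assumed)
    background:      shape (n_pixels,), blank-crystal intensities (assumed)
    """
    readings = np.asarray(sample_readings, dtype=float)
    background = np.asarray(background, dtype=float)
    mean_intensity = readings.mean(axis=0)         # one spectrum per sample
    return -np.log10(mean_intensity / background)  # A = -log10(I / I0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = np.full(128, 1000.0)              # synthetic blank reading
    readings = 100.0 + rng.normal(0.0, 1.0, size=(5, 128))  # five noisy readings
    spectrum = absorbance_spectrum(readings, background)
    print(spectrum.shape, round(float(spectrum.mean()), 3))
```

Averaging before taking the logarithm reduces random detector noise in the single spectrum that represents each sample.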

Fig. 2 Block diagram of mid-infrared spectroscopy with ATR sampling

2.5 Chemometric Analysis Methods

The chemometric analysis was carried out using The Unscrambler (version 10.1, CAMO AS, Trondheim, Norway) and MATLAB (version 7.1, MathWorks Inc.). The acquired ATR spectral data were divided into two groups, training and validation (testing). The leave-one-out cross-validation (LOOCV) method was chosen to create the testing set. LOOCV is n-fold cross-validation in which the number of folds, n, equals the number of instances in the data set. The training algorithm is run once for each instance, with the selected instance as the testing set and all other instances as the training set. LOOCV thus estimates how a machine learning algorithm performs when making predictions on data that were not used to train the model. A training set of 105 samples and a testing set of 35 samples were used in this work. All samples in the training and testing sets were labeled clearly so that calibration samples could not be confused with validation samples. Because not all spectral variables contribute substantially to the variance relevant for classification, it was necessary to exclude spectral ranges dominated by artifacts or noise. As a basic rule, the raw spectra were plotted, and the peaks were examined to identify and select suitable regions. Figures 3 and 4 show the ATR spectra of palm oil and sunflower oil. Savitzky-Golay smoothing with a 7-point window and a first-degree polynomial was applied to improve the signal-to-noise ratio without distorting the signal.
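The smoothing step can be illustrated with a small NumPy sketch of a Savitzky-Golay filter with a 7-point window and a first-degree polynomial. This is not the Unscrambler implementation; the function names are illustrative, and the edge handling (repeating the end values) is an assumption made for simplicity.

```python
import numpy as np


def savgol_coefficients(window=7, polyorder=1):
    """Weights that give the least-squares polynomial fit at the window centre."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window to the fitted value at offset 0
    return np.linalg.pinv(design)[0]


def savgol_smooth(y, window=7, polyorder=1):
    """Smooth a 1-D spectrum; ends are padded with the edge values (assumption)."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    weights = savgol_coefficients(window, polyorder)
    padded = np.pad(y, half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 128)
    noisy = np.exp(-((x - 0.5) ** 2) / 0.01) + 0.05 * rng.normal(size=x.size)
    print(savgol_coefficients())          # seven equal weights
    print(savgol_smooth(noisy).shape)     # same length as the input
```

With a first-degree polynomial and a symmetric window, all seven weights equal 1/7, so in the interior of the spectrum this filter is equivalent to a 7-point moving average.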

Fig. 3 ATR sampling spectra acquired for palm oil

Fig. 4 ATR sampling spectra acquired for sunflower oil

2.5.1 Principal Component Analysis

Principal component analysis (PCA) is the most widely used unsupervised method for reducing the dimensionality of a dataset while retaining as much statistical information (variability) as possible [29]. PCA is an orthogonal transformation in which the original data are transformed into another feature space such that the maximum variance in the data lies along the first coordinate, referred to as the first principal component, the second greatest variance lies along the second coordinate, and so on. The coordinates in the transformed space are linear combinations of the original features. The input data can be expressed as a matrix \(X_{n \times p}\), with n rows representing the samples and p columns representing the variables (absorbance). Mathematically, PCA decomposes X of rank r into a sum of r matrices of rank 1; retaining only the leading components gives

$$X = TP^{T} + E$$
(1)

where T is the score matrix, P is the loading matrix, and E is the residual matrix.

The score matrix contains the latent variables (LVs), which are used to visualize similarities in the data in a lower-dimensional space using a score plot of two or three LVs. The loading matrix P describes how each LV is related to the original variables (p) and can be used to determine which original variables contribute most to each LV. In the present work, PCA based on the singular value decomposition (SVD) algorithm was applied to the mean-centered spectra to analyze the sample clustering and identify outliers. The spectral ranges that contribute 90% or more of the variance were chosen from the PCA correlation loadings plot (shown in Fig. 6). The absorbance values corresponding to these chosen spectral regions were collected into a new data set, referred to as Region 3 (R3).
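The decomposition in Eq. (1) can be sketched with NumPy's SVD as follows. This is a minimal illustration under stated assumptions, not the Unscrambler implementation: the function name is hypothetical, and the demonstration uses random synthetic data of the same size as the acquired data set (140 × 128).

```python
import numpy as np


def pca_svd(X, n_components):
    """PCA of mean-centred data via SVD: Xc = T P^T + E."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]    # scores, shape (n, k)
    P = Vt[:n_components].T                       # loadings, shape (p, k)
    E = Xc - T @ P.T                              # residual matrix
    explained = (s ** 2) / np.sum(s ** 2)         # variance fraction per component
    return T, P, E, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for spectra: two hidden factors plus small noise
    factors = rng.normal(size=(140, 2))
    profiles = rng.normal(size=(2, 128))
    X = factors @ profiles + 0.01 * rng.normal(size=(140, 128))
    T, P, E, explained = pca_svd(X, n_components=3)
    print(T.shape, P.shape, np.round(explained, 3))
```

The explained-variance fractions returned here correspond to the percentages reported for PC1, PC2, and PC3 in score plots.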

2.5.2 Partial Least Square Discriminant Analysis (PLS-DA)

Partial least squares discriminant analysis (PLS-DA) is a supervised discriminant classification algorithm. It is a variant of partial least squares regression (PLSR) in which the output variable is categorical. PLS-DA decomposes the input data matrix \(X_{n \times p}\) into an orthogonal score matrix \(T_{n \times k}\) and a loading matrix \(P_{p \times k}\), where k is the number of latent variables. The response vector Y is likewise expressed in terms of the score matrix \(T_{n \times k}\) and a loading matrix \(Q_{1 \times k}\). There are two fundamental equations in the PLS-DA model:

$$X = TP^{T} + E$$
(2)
$$Y = TQ^{T} + F$$
(3)

where E and F are the residuals. Matrix T can be derived from X using a weight matrix \(W_{p \times k}\):

$$T = XW\left( {P^{T} W} \right)^{ - 1}$$
(4)

Substituting Eq. (4) into Eq. (3), the response vector Y becomes

$$Y = XW\left( {P^{T} W} \right)^{ - 1} Q^{T} + F = XB + F$$
(5)

so that the regression coefficient vector B is given by

$$B = W\left( {P^{T} W} \right)^{ - 1} Q^{T}$$
(6)

In the present work, PLS-DA was performed on mean-centered data using the Classification Toolbox ver. 3.1 with Venetian blinds cross-validation with 10 groups [30].
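To make the roles of W, P, and Q concrete, the sketch below fits a single-response PLS model with a NIPALS-style loop and forms the regression vector as \(B = W(P^{T}W)^{-1}Q^{T}\). It is an illustration only and not the Classification Toolbox code: it assumes a two-class problem coded as 0/1 with a 0.5 decision threshold, whereas multi-class PLS-DA uses a dummy-coded response matrix. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model; returns (B, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean               # mean-centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                    # weight vector
        t = Xk @ w                                # score vector
        tt = t @ t
        p = Xk.T @ t / tt                         # X loading
        c = yk @ t / tt                           # y loading
        Xk = Xk - np.outer(t, p)                  # deflate X
        yk = yk - c * t                           # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)           # B = W (P^T W)^-1 Q^T
    return B, x_mean, y_mean


def pls1_predict(model, X):
    B, x_mean, y_mean = model
    return y_mean + (np.asarray(X, dtype=float) - x_mean) @ B


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.array([0.0] * 10 + [1.0] * 10)    # 0 = pure, 1 = adulterated (toy)
    shift = np.zeros(8); shift[2], shift[5] = 1.0, -1.0
    X = 0.5 + np.outer(labels, shift) + 0.01 * rng.normal(size=(20, 8))
    model = pls1_fit(X, labels, n_components=2)
    print((pls1_predict(model, X) > 0.5).astype(int))
```

Thresholding the continuous prediction at 0.5 turns the regression output into a class label, which is the step that distinguishes PLS-DA from PLSR.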

2.5.3 Soft Independent Modeling of Class Analogy (SIMCA)

SIMCA is a supervised class-modeling algorithm based on PCA. A separate PCA is performed for each known sample class in the data. From each PCA sub-model, a specific number of components (LVs) is selected based on cross-validation, and these selected LVs are used as input for the SIMCA model. The number of selected components need not be the same for all PCA sub-models. The Euclidean distances between each sample and each class model are calculated, and the boundary of each class is determined at a confidence level of 95%. The class boundaries obtained from calibration and cross-validation are then used to predict the class of new samples. In the present work, SIMCA models for the classification of adulterated and pure edible oils were developed in class-modeling mode with Venetian blinds cross-validation with 10 groups, using the Classification Toolbox ver. 3.1 [30].
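The class-modeling idea can be sketched as one PCA sub-model per class plus a distance limit for each class. The sketch below is a simplification and not the Classification Toolbox algorithm: it uses only the residual (orthogonal) Euclidean distance to each class subspace, and it sets the class boundary at the empirical 95th percentile of the training distances rather than at a statistically derived 95% limit. Function names and data are hypothetical.

```python
import numpy as np


def residual_distance(X, mean, P):
    """Euclidean distance from each sample to a class PCA subspace."""
    Xc = np.atleast_2d(np.asarray(X, dtype=float)) - mean
    residual = Xc - (Xc @ P) @ P.T
    return np.linalg.norm(residual, axis=1)


def fit_class_model(X, n_components, quantile=0.95):
    """PCA sub-model for one class with an empirical distance limit (assumption)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_components].T
    limit = float(np.quantile(residual_distance(X, mean, P), quantile))
    return mean, P, limit


def simca_distances(models, x):
    """Return {class_name: (distance, within_limit)} for one new sample."""
    result = {}
    for name, (mean, P, limit) in models.items():
        d = float(residual_distance(x, mean, P)[0])
        result[name] = (d, bool(d <= limit))
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    dir_a = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
    dir_b = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
    class_a = 1.0 + np.outer(rng.normal(size=20), dir_a) + 0.01 * rng.normal(size=(20, 5))
    class_b = 5.0 + np.outer(rng.normal(size=20), dir_b) + 0.01 * rng.normal(size=(20, 5))
    models = {"A": fit_class_model(class_a, 1), "B": fit_class_model(class_b, 1)}
    print(simca_distances(models, class_a[0]))
```

Because each class has its own boundary, a sample can be accepted by one class, by several, or by none, which is what distinguishes class modeling from discriminant classification.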

2.5.4 Partial Least Square Regression

PLS regression generalizes and combines features of PCA and multiple linear regression (MLR). The idea was first introduced in the social sciences by Herman Wold et al. in 1966 [31]. In PLSR, the response matrix Y is numerical. Like PLS-DA, PLSR decomposes both the input X and the response Y into score and loading matrices. The Unscrambler software was used to quantify the adulteration of sunflower oil with palm oil. The model was evaluated using the root mean square error of calibration (RMSEC), the root mean square error of validation (RMSEV), and the coefficient of determination (R2).
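The evaluation metrics can be written out explicitly. The sketch below uses one common set of definitions, \(\text{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i - \hat{y}_i)^2}\) and \(R^{2} = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2\); the exact formulas used by The Unscrambler may differ in detail. RMSEC and RMSEV are the same RMSE computed on the calibration and validation samples, respectively. The predicted values in the demonstration are made-up numbers, not results from this study.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def r_squared(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Adulteration levels in percent; predictions are illustrative only
    reference = [5.0, 10.0, 25.0, 50.0, 75.0]
    predicted = [6.0, 9.0, 26.0, 49.0, 76.0]
    print(rmse(reference, predicted), r_squared(reference, predicted))
```

A small gap between the RMSE on calibration samples and the RMSE on validation samples suggests that the model is not overfitted to the calibration data.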

3 Results and Discussion

3.1 Spectral Analysis

The ATR spectra of pure and adulterated samples in the region of 1781 to 915 cm−1 are shown in Fig. 5. The spectral peaks of pure palm oil and pure sunflower oil do not overlap, while the spectra of the adulterated samples fall between the two across the entire spectral range. The spectra of highly adulterated samples lie close to the palm oil spectrum, while those of samples with low adulteration lie close to the sunflower oil spectrum. The difference can be seen more distinctly in the spectral region of 1781 to 1635 cm−1 (Region 1).

Fig. 5 Mid-infrared spectroscopy with ATR sampling: edible oil spectra

The absorption observed in each spectral region, Region 1 and Region 2, is associated with specific functional groups. According to the literature, the –C=O carbonyl group of the ester linkage of triacylglycerols absorbs at 1745 cm−1. The other functional groups/bonds in fatty acids, for example –C–H–(CH2, CH3), =C–H–(cis), –C–O–CH2–, –C–H–, and –C–H-bending have been reported to be attributed to wavenumbers 1460, 1377, 1161, 1061, 1117 and 1097 cm−1, respectively [32]. In the present work, the absorption observed in the region 1786–1680 cm−1 could be due to the ester group of the triglycerides present in the edible oils, while that in the region 1490–915 cm−1 could be due to fatty acid functional groups. The wavenumber range 1400 to 1097 cm−1 is regarded as the fingerprint region. Figure 6 shows the correlation loading plot of the PCA applied to the entire spectra. This plot was used to select the variables contributing the maximum variance. The spectral variables at 1717–1581, 1501–1447, 1372–1334, and 1263–937 cm−1 were selected (Region 3) for analysis.
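Extracting the Region 3 variables from the full spectra amounts to applying a boolean mask over the wavenumber axis. The sketch below is illustrative: the function name is hypothetical, and the evenly spaced 128-point wavenumber axis is an assumption, since the true pixel-to-wavenumber calibration comes from the instrument.

```python
import numpy as np

# Region 3 limits in cm^-1, as listed in the text
REGION_3 = [(1717, 1581), (1501, 1447), (1372, 1334), (1263, 937)]


def region_mask(wavenumbers, regions):
    """Boolean mask that is True where a wavenumber lies in any listed region."""
    wn = np.asarray(wavenumbers, dtype=float)
    mask = np.zeros(wn.shape, dtype=bool)
    for high, low in regions:
        mask |= (wn >= low) & (wn <= high)
    return mask


if __name__ == "__main__":
    # Assumed axis: 128 evenly spaced points between 1800 and 900 cm^-1
    wavenumbers = np.linspace(1800.0, 900.0, 128)
    spectra = np.zeros((140, 128))                 # placeholder for the data matrix
    mask = region_mask(wavenumbers, REGION_3)
    region3_data = spectra[:, mask]                # keep only the selected variables
    print(int(mask.sum()), region3_data.shape)
```

The same function can be reused for Region 1 and Region 2 by passing a single (high, low) pair.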

Fig. 6 Principal component analysis correlation loading plot for selection of variables

3.2 Principal Component Analysis Results

Principal component analysis (PCA) was used to explain the variance in the acquired data and to examine the clustering of the samples. PCA of the spectral data was carried out in the selected spectral range of 1781 to 1635 cm−1, and the clustering of adulterated and pure samples is shown in Fig. 7. The score plot shows that palm and sunflower oils form distinct clusters. The variances explained by the first three principal components, PC1, PC2, and PC3, were 93%, 3%, and 0.6%, respectively, indicating the feasibility of detecting palm oil in sunflower oil at a significance level of 5%. Similar results were obtained when PCA was applied to the spectral region of 1492–937 cm−1, where PC1, PC2, and PC3 explained 81%, 16%, and 1% of the variance, respectively, as shown in Fig. 8.

Fig. 7 PCA score plot (1781–1635 cm−1)

Fig. 8 PCA score plot (1492–937 cm−1)

Figure 9 shows the PCA score plot for the selected spectral Region 3. The variances explained by PC1, PC2, and PC3 were 75.1%, 17.72%, and 2.3%, respectively, indicating that ATR spectroscopy combined with chemometrics can be used to detect adulteration in sunflower oil. As the percentage of adulteration increases, the cluster moves away from the sunflower oil samples and approaches the palm oil samples. Even samples with low adulteration levels formed clusters distinct from the pure sunflower oil cluster. This can be explained by the difference in fatty acid composition between sunflower and palm oil.

Fig. 9 PCA score plot for selected variables from PCA loading plot

3.3 Classification of Adulterated Edible Oil Samples

SIMCA and PLS-DA were used to build classifiers that assign class membership among sunflower oil, adulterated samples, and palm oil. PLS-DA was performed using 4 components (LVs), accounting for 98% of the total data variance. With PLS-DA, adulterated samples were always classified as adulterated and never assigned to a pure class. Among the adulterated samples, 25% samples were confused with 50% samples in R1. The wavenumber region selected from the PCA loading plot (R3) gave a higher classification accuracy (90%) than the other two selected ranges (R1: 84.3%; R2: 87%). Table 5 shows the classification results as confusion matrices for Region 1, Region 2, and Region 3.

Table 5 PLS-DA model classification results for sunflower oil adulterated with palm oil

With the SIMCA classifier, using 4 components from each class PCA model, sunflower oil and palm oil were classified into their own classes with 100% accuracy in all selected spectral ranges. None of the adulterated samples was misclassified, and no adulterated sample was assigned to the pure sunflower class, as shown in Table 6, indicating the success of the present method in detecting palm oil adulteration in sunflower oil. For all selected spectral regions, the classification accuracy was 100%. It can be concluded that the SIMCA approach detects adulteration even at the lowest level tested, 5%.

Table 6 SIMCA model classification results for sunflower oil adulterated with palm oil

3.4 Quantitative Analysis and Validation

Partial least squares regression (PLSR) was used to develop a quantitative model for predicting the percentage of adulteration. The model outcomes in the selected spectral regions are shown in Figs. 10, 11, and 12. The results showed an R2 of 0.94 in Region 1, 0.98 in Region 2, and 0.99 in the regions selected from the correlation loadings, indicating close agreement between the actual and ATR-predicted adulteration values. The difference between RMSEC and RMSEV was small, indicating that the calibration model was developed correctly. The intercept was also small, indicating good predictability of palm oil in sunflower oil. Table 7, which lists the calibration and validation results for the selected wavenumber ranges, shows that all selected ranges are helpful in detecting the presence of palm oil in sunflower oil, as these regions are associated with specific functional groups, as mentioned previously.

Fig. 10 Reference against predicted adulteration (1781–1635 cm−1)

Fig. 11 Reference against predicted adulteration (1492–937 cm−1)

Fig. 12 Reference against predicted adulteration (1717–1581 cm−1, 1501–1447 cm−1, 1372–1334 cm−1 and 1263–937 cm−1)

Table 7 Results of PLS regression for estimating the percentage of adulteration in sunflower oil with palm oil

4 Conclusion

The ATR-based absorption spectra in the spectral regions of 1781–1635 cm−1 and 1492–937 cm−1, and in the regions selected from the PCA correlation loading plot, clearly reveal palm oil adulteration in sunflower oil. SIMCA and PLS-DA achieved classification accuracies of 100% and 90%, respectively. Data from the selected wavenumber regions were sufficient to develop classification and regression models, without the need to consider the whole spectrum. The coefficient of determination of the PLS regression for prediction and validation was above 0.97 in all selected wavenumber regions. The difference between RMSEC and RMSEV was small, indicating a good prediction model. This work provides a simple, non-destructive, and accurate method for detecting and quantifying adulteration, with the potential to detect as little as 5% palm oil in sunflower oil.