Introduction

Mustard (Brassica juncea) is a versatile crop known for its richness in minerals and high-quality oil. On average, mustard seeds contain about 40% oil and are cultivated worldwide to produce edible vegetable oil, making the crop an integral part of various global cuisines [1]. Rich in fatty acids such as erucic acid (C22:1), alpha-linolenic acid (C18:3), linoleic acid (C18:2) and oleic acid (C18:1), mustard oil holds a significant position owing to its positive impacts on heart and skin health, alongside its anti-inflammatory properties [2]. These nutritional attributes contribute to the escalating demand for mustard oil and also make it a source of enhanced economic returns for vendors.

Inadequate production of mustard seeds, combined with a modest oil recovery rate (26–27%) during mechanical milling, contributes to a perennial, hard-to-bridge gap between production and demand [3]. Enhancing oil recovery rates is imperative to bridge this gap and ensure a steady supply of mustard oil to meet escalating consumer demands [4].

While mechanical milling remains a common technique for oil extraction, its inherent limitation is the considerable amount of residual oil trapped within the milled cake, which leads to incomplete extraction. A range of pre-treatment approaches has been investigated to enhance the recovery of mustard oil during mechanical extraction. These pre-treatments include size reduction, crushing, hulling, cracking, enzymatic hydrolysis and thermal treatments, each aiming to disrupt the cell matrix of the raw material and enhance oil yield [5]. Pre-treatment improves oil yield compared to mechanical pressing alone, maximizing the amount of oil extracted from mustard seeds [6]. In pursuit of further yield improvement, researchers have shifted their focus towards novel pre-treatment technologies. Unlike conventional treatments, ohmic heating (OH) has emerged as a promising, environmentally friendly option [7] that passes an electrical current through the oilseed matrix to generate heat within it, facilitating oil extraction [8]. The electrical resistance of the material to the applied current induces Joule heating, which raises seed temperature and reduces oil viscosity, thereby enhancing oil flow and extraction efficiency [9]. Furthermore, OH can induce electro-permeabilization, which increases cell membrane permeability and speeds up oil release, consequently improving oil recovery rates [10]. After OH treatment, oil can be extracted through mechanical milling or pressing. Integrating OH as a treatment step before the regular mechanical expression of mustard oil therefore has the potential to substantially improve oil extraction processes, particularly in the mustard oil industry. This aligns with the growing demand for safe, sustainable and efficient food processing methods.

Despite the promise of OH as a treatment method, evaluating its efficacy and impact on oil recovery and quality necessitates robust analytical techniques. Traditional methods, while reliable, often entail laborious procedures, consume significant resources and pose safety risks in laboratory settings [11]. Hyperspectral imaging (HSI) integrates digital imaging with spectroscopic analysis, enabling the capture of spectral and spatial data simultaneously, thereby providing important information about the quality and chemical composition of food products [12]. Combined with multivariate analysis techniques such as principal component analysis (PCA), partial least squares regression (PLSR) and partial least squares discriminant analysis (PLS-DA), HSI helps identify the relevant information in complex spectral data, aiding in the classification, prediction and quantification of specific parameters [13, 14]. In this context, HSI emerges as a rapid, non-invasive and robust approach for assessing the oil content and quality of mustard seeds after OH treatment.

In light of these considerations, the present study aimed to elucidate the effect of OH treatment on mechanically expressible oil in mustard seeds, focusing on changes in oil content and fatty acid composition, and used visible-near infrared (Vis–NIR) and short-wave infrared (SWIR) HSI to map the spatial distribution of oil within mustard seeds after OH treatment.

Material and methods

Experimental materials

The experiment used mustard (Brassica juncea) seeds (var. RH-0749) purchased from the National Seed Corporation Limited, Bhopal, India. The seeds had an average moisture content of 6.9 (± 0.03) % (wet basis, w.b.) and a diameter of 1.7 (± 0.3) mm. All foreign material was removed from the seeds before they were stored in airtight bags at 4 °C [15]. Seeds were withdrawn as required for each experiment and allowed to equilibrate under laboratory conditions before further use. All experiments were conducted in the Machine Vision laboratory, Agro produce and processing division, ICAR-Central Institute of Agricultural Engineering, Bhopal.

Ohmic heating (OH) treatment

The experiments were conducted using an in-house ohmic heating setup (Fig. 1) equipped with a temperature controller for setting and maintaining desired temperatures (0–300 °C), a contactor to handle high currents for the heating element, a Variac for adjusting the applied voltage (0–300 V) during heating, a circuit breaker for overload protection, and a metal box housing the voltmeter, ammeter, indicator lights and an energy meter. The mustard seeds had an initial moisture content of 6.9% (w.b.), but preliminary trials revealed that adding water to the seeds up to a moisture level of 35% facilitated a consistent flow of electric current. This ensured efficient heating and allowed a desired endpoint temperature (EPT) of 90 °C to be reached. Conversely, lower moisture levels (20–30%) resulted in inadequate current flow and hindered attaining the target temperature [16]. The moistened mustard seeds were then subjected to ohmic heating at a specified electric field strength (EFS, V/cm) until an EPT of 90 °C was achieved at the geometric centre. This temperature was maintained throughout the holding time (HT, s). The temperature of the mixture was measured by a thermocouple (Vigital India Pvt. Ltd., New Delhi, India) probe inserted into the mixture in the ohmic heating tray. After treatment, the mixture was dried in the shade until it reached a moisture level of 6–7% (w.b.). The dried seeds were subsequently used for further analysis and oil extraction. The EFS (22, 24.5 and 27 V/cm) and HT (300, 600 and 900 s) were chosen as the independent variables in the experimental design, which resulted in nine treatments with three replications each. The control samples also underwent water addition (up to 35%) followed by drying to 6–7% moisture content. This was done to isolate the effects of water addition and drying from the OH treatment, so that observed spectral changes could be more accurately attributed to the OH process itself.
Classification and quantification of oil and fatty acid content were carried out with HSI for two OH treatment conditions: the first (OH1) based on maximum mechanically expressible oil recovery and the second (OH2) based on maximum ohmic heating treatment intensity. The conditions for OH1 and OH2 with respect to EFS and HT were 24.5 V/cm, 900 s and 27 V/cm, 900 s, respectively.
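The full factorial design described above can be enumerated programmatically. The following is a minimal sketch, not the authors' procedure: the factor levels and replicate count come from the text, while the function names and the 10 cm electrode gap used in the voltage example are hypothetical (the electrode spacing is not reported here).

```python
from itertools import product

# Factor levels as stated in the text
EFS_LEVELS = (22.0, 24.5, 27.0)   # electric field strength, V/cm
HT_LEVELS = (300, 600, 900)       # holding time, s
REPLICATES = 3

def build_design(efs_levels=EFS_LEVELS, ht_levels=HT_LEVELS, replicates=REPLICATES):
    """Return one dict per experimental run of the full factorial design."""
    runs = []
    for treatment_id, (efs, ht) in enumerate(product(efs_levels, ht_levels), start=1):
        for rep in range(1, replicates + 1):
            runs.append({
                "treatment": treatment_id,
                "efs_v_per_cm": efs,
                "ht_s": ht,
                "replicate": rep,
            })
    return runs

def applied_voltage(efs_v_per_cm, electrode_gap_cm):
    """Voltage needed for a given field strength across a gap: V = E * d."""
    return efs_v_per_cm * electrode_gap_cm

design = build_design()
print(len(design), "runs in total")  # 9 treatments x 3 replicates

# Hypothetical electrode gap of 10 cm (an assumption, not a reported value)
print(applied_voltage(24.5, 10.0), "V for the OH1 field strength")
```

With the assumed gap, all three field strengths would fall inside the 0–300 V range of the Variac; a different gap would change the required voltages proportionally.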

Fig. 1 Schematic diagram of in-house developed ohmic heating set-up

Oil extraction using laboratory scale oil expeller

Both treated and control samples of mustard seeds were milled using a laboratory oil expeller (SH-800, Shreeja Pvt. Ltd., Rajkot, India). The milled crude oil underwent sequential clarification through a sieve and a muslin cloth to remove suspended impurities. After extraction, the oil was stored in polyethylene terephthalate (PET) containers and maintained at 4 °C in a refrigerator [15] for analysis.

Fatty acid composition analysis

A standard method [17] was adopted to identify fatty acids in the oil samples from the OH treated mustard seeds. Triglycerides were transesterified to fatty acid methyl esters (FAME), which were subsequently analyzed using a GC–MS/MS (QP-2010 Ultra mass spectrometer and GC 2010 with AOC 20i/s chromatograph) equipped with an ELITE-2560 column, with helium as the carrier gas at a specific flow rate. The temperature profile used an initial temperature of 100 °C and an injection temperature of 230 °C. FAME mass spectra were compared with the NIST library; compounds with over 90% similarity were accepted as identified and were then quantified as relative percentages of the total peak area.
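The quantification step (library match above 90% similarity, then relative percentage of total peak area) amounts to a simple normalization. The sketch below illustrates it under stated assumptions: the peak areas and similarity scores are hypothetical placeholders, not measured values, and the function names are illustrative only.

```python
def identified_compounds(hits, min_similarity=90.0):
    """Keep library hits whose similarity exceeds the threshold (over 90% in the text)."""
    return {name: area for name, (area, similarity) in hits.items()
            if similarity > min_similarity}

def relative_peak_area(peak_areas):
    """Convert raw peak areas into percentages of the total peak area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}

# Hypothetical (peak area, library similarity %) pairs, NOT measured data
example_hits = {
    "C18:1": (120.0, 96.0),
    "C18:2": (180.0, 95.0),
    "C18:3": (100.0, 93.0),
    "C20:1": (80.0, 91.0),
    "C22:1": (520.0, 97.0),
    "unknown": (15.0, 72.0),   # rejected: similarity below threshold
}

percentages = relative_peak_area(identified_compounds(example_hits))
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f} %")
```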

Hyperspectral imaging

Hyperspectral imaging systems

Two HSI systems were used to obtain hyperspectral images of mustard seeds in reflectance mode. The first was a Vis–NIR HSI system (Specim Imaging, Oulu, Finland) with a wavelength range of 399–1003 nm and a spectral precision of 2.8 nm for 97 spectral bands. Specim-DAQ software ver. 3.61 (Specim Imaging, Oulu, Finland) was used for camera parameter adjustments, real-time data collection and visualization of images. The second HSI system covered the SWIR range (Pika NIR-320, Resonon Inc., USA), with a wavelength range of 895–1712 nm, a spectral resolution of 4.9 nm and 168 spectral bands. Dedicated software, Spectronon Pro version 3.4.0 (Resonon Inc., USA), was used for capturing SWIR HSI images. In both HSI systems, a 3-D hyperspectral data cube was obtained by mechanically moving the object on a movable platform. In the SWIR system, each line within the data cube contained 320 individual pixels.

Hyperspectral image analysis

Accurate analysis of hyperspectral images requires pre-processing of the raw image to correct noise and the uneven distribution of reflectance across the wavebands. Dark current from the camera sensor was removed by acquiring a dark image (Dref) with the light switched off and an opaque lid covering the camera lens to eliminate any reflection. Under the same lighting conditions as the raw image (Rimage), a white reference image (Wref) was acquired using a custom Teflon plate with a uniform and high reflectance of 99.9%. The following equation was used to compute the relative reflectance (R, %) of the hypercube [18]:

$$R = \frac{R_{\text{image}} - D_{\text{ref}}}{W_{\text{ref}} - D_{\text{ref}}} \times 100$$
(1)
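Equation (1) is applied element-wise across the hypercube. The following is a minimal NumPy sketch under stated assumptions: the array shapes, the synthetic intensity values and the function name are illustrative and do not come from the acquisition software used in the study.

```python
import numpy as np

def relative_reflectance(raw, dark, white, eps=1e-12):
    """Apply Eq. (1): R = (R_image - D_ref) / (W_ref - D_ref) * 100."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = white - dark
    # Guard against sensor elements where white and dark references coincide
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom * 100.0

# Synthetic cube: 2 scan lines x 3 pixels x 4 bands (values are made up)
rng = np.random.default_rng(0)
dark = np.full((1, 3, 4), 100.0)      # one dark reference line
white = np.full((1, 3, 4), 4100.0)    # one white reference line
raw = rng.uniform(100.0, 4100.0, size=(2, 3, 4))

R = relative_reflectance(raw, dark, white)
print(R.shape, float(R.min()), float(R.max()))
```

The reference lines broadcast along the scan direction, which reflects a line-scan setup where one reference line is reused for every scanned line; that choice is an assumption of this sketch.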

Raw hyperspectral images can contain erroneous data such as dead pixels, spikes and outliers, which distort the spectral signature of the object being imaged and lead to inaccurate results. Thus, it was essential to identify and remove such data points during pre-processing to ensure the accuracy and reliability of subsequent analysis [19]. In this research, a spatial binning approach was employed to decrease the spatial dimensions of the image, followed by removal of dead pixels. The image background was removed using a mask created by the k-means clustering algorithm.

In spectral pre-processing, a combination of methods was employed, including light scattering correction, noise reduction and background normalization, to ensure accurate analysis of the morphology of mustard seeds. Distinct spectral pre-processing methods, including Standard Normal Variate (SNV), Savitzky-Golay smoothing (SGS), SG First Derivative (SGD-F) and SG Second Derivative (SGD-S), were applied to the data independently and in combination. Both the smoothing and the derivative filters used a 7-point window and a second-order polynomial. Hyperspectral image pre-processing, encompassing both spatial and spectral domains, was conducted using HYPER-Tools v3.0 within MATLAB R2019 (MathWorks, USA).
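To make the spectral pre-processing concrete, the sketch below implements SNV and Savitzky-Golay filtering with a 7-point window and second-order polynomial in NumPy. It is an illustration, not the HYPER-Tools implementation: the sample standard deviation in SNV, the edge padding at the spectrum ends, the synthetic spectra and all function names are assumptions of this sketch, and derivatives are expressed per band index rather than per nanometre.

```python
from math import factorial
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def savgol_weights(window=7, polyorder=2, deriv=0):
    """Savitzky-Golay weights from a least-squares polynomial fit over the window."""
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    A = np.vander(x, polyorder + 1, increasing=True)   # columns: 1, x, x^2, ...
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient a_deriv;
    # the derivative at the window centre is deriv! * a_deriv.
    return np.linalg.pinv(A)[deriv] * factorial(deriv)

def savgol_rows(spectra, window=7, polyorder=2, deriv=0):
    """Filter each spectrum (row); ends are handled by edge padding (a choice made here)."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    w = savgol_weights(window, polyorder, deriv)
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.stack([np.correlate(row, w, mode="valid") for row in padded])

# Synthetic spectra: 4 pixels x 97 bands (made-up values)
rng = np.random.default_rng(0)
raw = 40.0 + 5.0 * np.sin(np.linspace(0, 3, 97)) + rng.normal(0, 0.3, size=(4, 97))

snv_sgs = savgol_rows(snv(raw))      # SNV followed by SG smoothing
sgd_s = savgol_rows(raw, deriv=2)    # SG second derivative
print(snv_sgs.shape, sgd_s.shape)
```

For a 7-point quadratic fit the smoothing weights reduce to the familiar (-2, 3, 6, 7, 6, 3, -2)/21, which is a quick check that the filter is set up as described.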

HSI data from control and OH-treated samples of mustard seed were analyzed. Each HSI sample comprised an average of 76 (± 4) seeds arranged as a single layer on a petri dish, and 700 spectral reflectance values were acquired from each sample. This resulted in 2100 pixels for the Vis–NIR region (97 wavebands) and 2210 pixels for the SWIR region (137 wavebands). These datasets were used in subsequent multivariate analyses for model development. The datasets were partitioned into calibration and cross-validation sets using a 70:30 split to support accurate prediction and minimize potential errors. An additional test set with 300 pixels for both the Vis–NIR and SWIR regions was used.

Principal component analysis (PCA)

Principal component analysis was used for understanding the impact of OH treatments in terms of the spectral characteristics of mustard seeds [20]. Analyses were performed on both raw and pre-processed hyperspectral data using the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm with mean centering and three principal components. Additionally, hyperspectral images were represented as false colour images by projecting the PCA model to represent pixel colours corresponding to chemical variations [21] which might have been induced by the OH treatment.
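The NIPALS procedure with mean centering and three components can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the toolbox code used in the study: the starting vector, convergence tolerance, iteration cap and synthetic data are choices made here.

```python
import numpy as np

def nipals_pca(X, n_components=3, tol=1e-10, max_iter=500):
    """PCA by NIPALS: returns scores T (n x A) and unit-length loadings P (p x A)."""
    X = np.asarray(X, dtype=float)
    E = X - X.mean(axis=0)                      # mean centering
    n, p = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    for a in range(n_components):
        # Start from the column with the highest remaining variance
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p_vec = E.T @ t / (t @ t)           # project data onto current scores
            p_vec /= np.linalg.norm(p_vec)      # normalize the loading vector
            t_new = E @ p_vec                   # update scores
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p_vec
        E = E - np.outer(t, p_vec)              # deflate before the next component
    return T, P

# Synthetic data: 200 pixels x 6 bands with unequal variances (made-up values)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.2])

T, P = nipals_pca(X, n_components=3)
total_ss = ((X - X.mean(axis=0)) ** 2).sum()
print("explained variance fractions:", (T ** 2).sum(axis=0) / total_ss)
```

Projecting every pixel of an image onto the loadings (`pixels @ P` after centering) gives the per-pixel scores from which false colour score images of the kind described above can be assembled.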

Classification model analysis

Partial least squares discriminant analysis (PLS-DA), a well-established statistical method for handling high-dimensional data, was used to develop classification models for differentiating between control and OH-treated mustard seeds. The optimal number of latent variables (LVs) was selected as the one giving the lowest root mean square error of cross-validation (RMSECV) under Venetian blinds cross-validation, so that the models captured the most relevant information without overfitting the specific dataset. This calibration and cross-validation approach was intended to ensure that the models remained accurate and reliable on unseen samples. The statistical robustness of the models was judged by metrics such as specificity, sensitivity, precision, accuracy and error [14].
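Venetian blinds cross-validation assigns every s-th sample to the same held-out split, and the LV count is then chosen at the minimum RMSECV. The sketch below illustrates both steps; the number of splits, a blind thickness of one sample and the RMSECV values are assumptions for illustration, since the text does not report them.

```python
def venetian_blinds_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx); sample i is held out in split i % n_splits."""
    if not 1 < n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    for k in range(n_splits):
        test_idx = list(range(k, n_samples, n_splits))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx

def select_n_lvs(rmsecv_by_lv):
    """Return the number of latent variables with the lowest RMSECV."""
    return min(rmsecv_by_lv, key=rmsecv_by_lv.get)

# Ten samples, three splits: the first split holds out samples 0, 3, 6 and 9
for train_idx, test_idx in venetian_blinds_splits(10, 3):
    print("held out:", test_idx)

# Hypothetical RMSECV values per LV count (NOT results from the study)
example_rmsecv = {1: 0.50, 2: 0.30, 3: 0.35}
print("selected LVs:", select_n_lvs(example_rmsecv))
```

Because neighbouring pixels tend to be correlated, the ordering of samples matters with this scheme; interleaved splits give optimistic errors if adjacent rows are near-duplicates.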

$$\text{Specificity} = \frac{\text{TN}}{\text{FP} + \text{TN}}$$
(2)
$$\text{Sensitivity} = \frac{\text{TP}}{\text{FN} + \text{TP}}$$
(3)
$$\text{Precision} = \frac{\text{TP}}{\text{FP} + \text{TP}}$$
(4)
$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TN} + \text{FP} + \text{TP} + \text{FN}}$$
(5)
$$\text{Error} = \frac{\text{FP} + \text{FN}}{\text{TN} + \text{FP} + \text{TP} + \text{FN}}$$
(6)

where TP is the number of correctly identified positives and TN is the number of correctly classified negatives; FP is the number of negatives incorrectly identified as positive and FN is the number of positives incorrectly classified as negative.
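Equations (2) to (6) translate directly into code. The sketch below computes them from confusion counts for one class treated as positive; the counts in the example are hypothetical and the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (2)-(6) from confusion counts for one class versus the rest."""
    total = tp + tn + fp + fn
    return {
        "specificity": tn / (fp + tn),
        "sensitivity": tp / (fn + tp),
        "precision": tp / (fp + tp),
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
    }

# Hypothetical counts for one class (NOT values from Table 2)
metrics = classification_metrics(tp=90, tn=180, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and error always sum to one, so reporting both is a consistency check rather than independent information.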

Prediction model analysis

The effectiveness of hyperspectral data in predicting mechanically expressible oil and fatty acid composition in OH treated mustard seeds was evaluated through Partial Least Squares Regression (PLSR) analysis. This method aimed to establish a robust relationship between the predictor variables (X), represented by the spectral data, and the response variables (Y), comprising the experimentally measured oil and fatty acid values. This relationship was constructed using LVs, orthogonal linear combinations of the original spectral features designed to maximize the covariance between X and Y [22]. The ideal number of LVs was chosen from the root mean square error (RMSE) and the observed gain in model performance with each additional LV. Two sets of models were developed. The first set comprised full-spectrum models, which used the complete spectral range for LV generation. The second set consisted of Interval PLS (iPLS) models, in which the spectral data were segmented into smaller, specific intervals using the iPLS variable selection method. Individual LVs were then generated for each interval, enabling the identification of the most informative wavelength range for prediction [23]. The developed PLSR models were evaluated using several metrics. The average prediction error for the calibration set was given by the root mean square error of calibration (RMSEC), and R2 measured the explained variance for the calibration set, while the cross-validation statistics (R2CV, RMSECV) assessed the ability of the model to predict data left out of calibration. The models were then independently tested using the testing set (R2T, RMSET). This multi-pronged approach was intended to yield reliable PLSR models for predicting mechanically expressible oil and the fatty acid composition in mustard seeds treated with OH.
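For a single response variable, the PLSR workflow described here can be sketched with the NIPALS PLS1 algorithm together with the RMSE and R2 statistics. This is a minimal illustration under stated assumptions, not the software used in the study: the synthetic data, the 70:30 split by row order, the number of LVs and all function names are choices made for the example.

```python
import numpy as np

def pls1_fit(X, y, n_lvs):
    """Fit PLS1 by NIPALS; returns regression coefficients and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                # centred predictors and response
    p = X.shape[1]
    W = np.zeros((p, n_lvs))
    P = np.zeros((p, n_lvs))
    q = np.zeros(n_lvs)
    for a in range(n_lvs):
        w = E.T @ f                              # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                                # scores of this latent variable
        tt = t @ t
        P[:, a] = E.T @ t / tt                   # X loadings
        q[a] = f @ t / tt                        # y loading
        W[:, a] = w
        E = E - np.outer(t, P[:, a])             # deflate X
        f = f - q[a] * t                         # deflate y
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def pls1_predict(X, coef, intercept):
    return np.asarray(X, dtype=float) @ coef + intercept

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic "spectra" (100 samples x 5 bands) and response; values are made up
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = 30.0 + X @ beta + 0.05 * rng.normal(size=100)

X_cal, y_cal, X_test, y_test = X[:70], y[:70], X[70:], y[70:]
coef, intercept = pls1_fit(X_cal, y_cal, n_lvs=5)
rmsec, r2_cal = rmse(y_cal, pls1_predict(X_cal, coef, intercept)), r2(y_cal, pls1_predict(X_cal, coef, intercept))
rmset, r2_test = rmse(y_test, pls1_predict(X_test, coef, intercept)), r2(y_test, pls1_predict(X_test, coef, intercept))
print(f"RMSEC={rmsec:.3f} R2={r2_cal:.3f} RMSET={rmset:.3f} R2T={r2_test:.3f}")
```

An iPLS-style search could reuse `pls1_fit` on column subsets of `X` (one per wavelength interval) and keep the interval with the lowest cross-validated RMSE; that extension is omitted here for brevity.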

Fourier transform infrared spectroscopy (FTIR)

The functional groups of the control and OH treated mustard oil samples were examined using an FTIR spectrophotometer (ALPHA Bruker, Ettlingen, Germany) at room temperature. Samples were placed on a diamond crystal ATR plate for analysis. Before each analysis, the ATR plate was thoroughly cleaned with analytical-grade isopropyl alcohol. The spectral data were obtained across the 4000–600 cm−1 wavenumber range with a resolution of 2 cm−1, a scanner frequency of 7.5 kHz and 24 scans each for sample and background. Opus (v. 6.0, Bruker, Ettlingen, Germany) software was used for instrument process control, spectral sampling and preliminary file modification. Compensation was applied to mitigate the interference of H2O and/or CO2 bands in the ratio spectra.

Results and discussion

Effect of OH treatment on oil and fatty acids

The mechanically expressible oil obtained from the OH1 and OH2 treatments was 33.18 (± 0.05) and 32.94 (± 0.02) %, respectively, compared with 26.41 (± 0.18) % for the control samples. The increase in expressed oil can be attributed to the ohmic heat treatment, which might have enhanced oil extraction efficiency from the seeds [24]. Regarding the fatty acid composition, the OH treatment had minor effects on the individual fatty acid contents. While there were slight decreases in C18:1, C20:1 and C18:3 contents, there were no significant changes in C18:2 content. However, the OH treatment resulted in an increase in C22:1 content, indicating a potential influence of thermal treatment on the synthesis or retention of erucic acid. The variations in saturated, unsaturated, MUFA and PUFA content suggest that OH treatment might have induced changes in the overall fatty acid profile of mustard seeds, albeit to a limited extent. The observed increase in saturated fatty acids and the slight decrease in unsaturated fatty acids indicate a potential impact on the overall saturation level of the fatty acid composition. However, the differences in MUFA and PUFA content among the treatments were relatively small (Table 1). A similar observation was reported for Gac aril oil [25]. These findings suggest that OH treatment affects the oil recovery and the proportions of certain fatty acids in the samples.

Table 1 Mechanically expressible oil (%) and fatty acid composition (%) of OH treated mustard seeds

Spectral characteristic

Reflectance differences caused by OH treatment were substantially more prominent in the SWIR region of the spectrum than in the Vis–NIR region. This is likely because the SWIR region is more sensitive to the changes in chemical composition and structure induced by the treatments. The Vis–NIR region, on the other hand, is more sensitive to changes in the physical properties of the samples, such as their texture and colour [26]. The Vis–NIR–SWIR spectra (Fig. 2a and b) of OH treated mustard seeds show that the control sample has a higher reflectance than the treated samples. OH treatment may have created micro cracks on the seed surface and exposed the brighter endosperm; this might have made the surface texture rough, resulting in reduced reflectance. Further analysis showed that the spectra of the OH1 and OH2 samples were quite similar, likely because the electric field strength and holding time did not vary sufficiently to produce distinguishable changes in the spectral properties.

Fig. 2 Average spectra of OH treated mustard seeds following spectral processing (a and b) SGS, (c and d) SNV + SGS and (e and f) SGD-S for Vis–NIR and SWIR wavelengths, respectively

The Vis–NIR spectra of samples OH1 and OH2 pre-processed with SGS (Fig. 2a) showed a dip between 399 and 422 nm, after which the profile was flat until 563 nm. Beyond 563 nm, the spectra exhibited a steady increase up to 1002 nm. A similar pattern was observed for the control sample, albeit with slightly higher reflectance values. With SNV + SGS pre-processing (Fig. 2c), the Vis–NIR spectra of the control samples showed a sharp decline up to 441 nm and a dip at 770 nm, followed by a gradual increase, whereas both treated samples (OH1 and OH2) showed a rapid increase up to 441 nm and a peak at 770 nm, followed by a gradual decrease in reflectance. Applying SGD-S spectral pre-processing (Fig. 2e) to the Vis–NIR spectra revealed additional peaks and valleys corresponding primarily to the colour variations of the samples. These features were prominent in the visible range at 422, 447, 490 and 789 nm and in the NIR range at 834, 866 and 905 nm [27]. As the seeds are dark brownish outside and yellowish red inside, the reflectance in all cases increased from 560 nm to the end of the spectrum at 1000 nm. In the SWIR hyperspectral data pre-processed with SGS (Fig. 2b), distinctive reflectance bands were identified for control samples at 1087, 1208 and 1463 nm, while ohmic treated samples showed peaks at 1087, 1355 and 1409 nm and dips at 1189, 1379 and 1563 nm. Furthermore, SNV + SGS (Fig. 2d) showed an additional dip and peak at 1419 nm for control and treated samples. SGD-S pre-processing (Fig. 2f) revealed additional SWIR peaks and valleys at 986, 1063, 1208, 1330 and 1617 nm. The SWIR spectra show that the treated samples have distinctive characteristic bands at 1087, 1355 and 1409 nm. These bands are characteristic of organic compounds, such as carbohydrates and proteins [28].

Principal component analysis (PCA)

PCA of the pre-processed hyperspectral data distinguished OH treated from control mustard seed samples. PCA score plots (Fig. 3) constructed for PC1 and PC3 using SGS for Vis–NIR spectral data showed a distinct separation between OH treated and control samples. The control samples exhibited a clustered distribution in the PC3 positive section, suggesting a spectral characteristic specific to this group. Moreover, the concentration of control seed samples in the negative section of PC1 contributed to their differentiation from the OH treated samples along the PC3 axis, indicating substantial differences in their spectral profiles. Treated samples OH1 and OH2 showed similar dispersed distributions along PC1 and PC3, making it difficult to differentiate between the treatments.

Fig. 3 PCA comparison for mustard seeds, control vs. OH treated: (a) PCA score plot with SGS, (b) PC1 to PC3 scores from Vis–NIR data

The loading plot (Fig. 5a) of the Vis–NIR data provided further details on the spectral characteristics associated with each principal component. PC1 showed a modest peak at 441 nm, then a plateau extending to 575 nm, followed by a steadily increasing profile. PC2 displayed a dip at 447 nm, then an increasing trend up to 569 nm, after which the profile reached a plateau and remained relatively unchanged until 642 nm, where another notable increase began. The PC3 loading profile exhibited a steep downward slope up to 435 nm, followed by a more gradual slope and a dip in the loading at 750 nm before concluding with a sharp rise.

Analysis of the principal component score plot (PC1 vs. PC2) derived from SWIR spectral data with SNV spectral pre-processing of OH pre-treated samples (Fig. 4) demonstrated clear differentiation between treated and control groups. Distinguishing between the treated samples remains challenging due to similar dispersion patterns in the score plot. PC1 provided the most distinct separation between these sample groups on the score plot. Control samples mainly clustered in the negative PC1 region, while treated samples formed a distinct cluster in the positive PC1 region.

Fig. 4 PCA comparison for mustard seeds, control vs. OH treated: (a) PCA score plot with SNV pre-processing, (b) PC1 to PC2 scores from SWIR data

The SWIR data loading plot (Fig. 5b) revealed distinct spectral characteristics for the first three PCs. PC1 exhibited peaks at 1082, 1395 and 1413 nm, alongside dips at 1374 and 1468 nm. Similarly, PC2 displayed peaks at 1136, 1188 and 1364 nm, contrasted by dips at 1164, 1217, and 1532 nm. Finally, PC3 loadings showed peaks at 1140, 1189, 1384 and 1463 nm while exhibiting dips at 1082 and 1408 nm.

Fig. 5 PCA loadings for spectral data of OH treated mustard seeds: (a) Vis–NIR and (b) SWIR HSI

The Vis–NIR and SWIR analyses provided a better understanding of the spectral characteristics associated with each sample group, and PC1 proved particularly effective in discriminating between the control and treated samples. However, differentiating between the treatments (OH1 and OH2) was challenging due to their similar dispersion patterns in the score plot. This indicates that the spectral profiles of the OH1 and OH2 treatments may have overlapping characteristics, making it difficult to distinguish them based solely on the PCA score plot. PCA has similarly been used to classify brassica species [23] and infected maize seeds [21].

Classification model development

The PLS-DA model performed well in classifying control samples, as indicated by high sensitivity, specificity, precision and accuracy values using both Vis–NIR and SWIR spectral data (Table 2). In the case of Vis–NIR, the highest classification scores were obtained without pre-processing of raw spectral data. The calibration model presented high sensitivity for the control sample (0.997), while for the OH1 and OH2 samples it decreased to 0.557 and 0.675, respectively. The cross-validation and testing models presented a similar pattern, with high sensitivity for the control sample (0.997) and decreased sensitivity for OH1 (0.535 cross-validation, 0.518 testing) and OH2 (0.711 cross-validation, 0.736 testing). Specificity was generally high in calibration, cross-validation and testing sets, with values ranging from 0.906 to 0.977 in all categories. Precision varied more prominently, with control samples demonstrating high precision (0.984 calibration, 0.981 cross-validation and 0.979 testing) but treated samples showing reduced precision. The accuracy of the Vis–NIR model did not exceed 0.933, with a non-assigned rate of 0.610. The highest classification accuracy for the SWIR spectral data was achieved using SNV with SGS pre-processing. Calibration, cross-validation and testing models showed perfect sensitivity (1.000) for the control sample and moderately high sensitivity for OH treated samples. The specificity was also perfect (1.000) for the control samples and remained high for the treated samples. Precision was perfect for control samples but reduced for treated samples. The accuracy of the SWIR model was higher than that of the Vis–NIR model, with values above 0.958 at a non-assigned rate of 0.627.

Table 2 PLS-DA classification model performance for OH pre-treated mustard using raw spectral data of Vis–NIR HSI and SNV + SGS spectral data of SWIR HSI, with 10 and 8 LVs, respectively

The results demonstrate a high level of accuracy suggesting that the model was proficient in distinguishing control samples from treated ones. This successful classification highlights the effectiveness of the model in differentiating between the two sample types based on their spectral characteristics. However, the performance of the model is reduced when classifying OH treated mustard samples using the Vis–NIR spectral data. The non-assigned rates were higher than desired in all cases. This suggests that the model may be misclassifying a proportion of samples [29]. Similar findings were observed for PLS-DA models using HSI spectra for barley [30], rice grains [31] and corn kernels [32].

Prediction model development

Distinct PLSR models were built to analyse the Vis–NIR and SWIR spectra of OH treated mustard seeds (Table 3). The Vis–NIR model achieved the best prediction accuracy for oil content using SNV spectral pre-processing, with an R2 value of 0.767 and RMSE of 1.509. The R2CV and R2T were 0.755 and 0.761, respectively, and the corresponding RMSECV and RMSET were 1.550 and 1.529, respectively. These values indicate a moderate correlation (R2 < 0.80) between the predicted and actual oil content. For fatty acid prediction, Vis–NIR spectra combined with SGD-S pre-processing achieved better results than SNV pre-processing, with LVs ranging from 7 to 10. For these fatty acids, R2 values varied between 0.736 and 0.794 during calibration, and R2CV and R2T values varied between 0.703 and 0.784 during cross-validation and testing. The best prediction performance was observed for C20:1 with SGD-S pre-processing, with R2, R2CV and R2T values of 0.794, 0.768 and 0.784, respectively, and RMSE, RMSECV and RMSET values of 0.087, 0.093 and 0.095, respectively.

Table 3 Prediction of fatty acid composition and oil content in OH treated mustard seeds from PLSR model using Vis–NIR–SWIR HSI spectra

In the case of SWIR spectra, the prediction of oil content using SNV combined with SGD-S pre-processing achieved excellent results. An R2 value of 0.988 was obtained during calibration, with an RMSE of 0.338. The R2CV and R2T were 0.987 and 0.991, respectively, and the corresponding RMSECV and RMSET were 0.355 and 0.292, respectively. For fatty acid prediction using SWIR spectra, C18:1, C20:1, C18:3, C22:1, saturated and unsaturated fatty acids all achieved high R2 values of 0.992 to 0.996 during calibration, and the R2CV and R2T values for cross-validation and testing ranged from 0.935 to 0.996, with 2 LVs. MUFA and PUFA predictions achieved R2 values of 0.922 and 0.984 during calibration, respectively; the R2CV values were 0.917 and 0.984 and the R2T values were 0.921 and 0.984, with LVs ranging from 2 to 8.

For OH treated seeds, Vis–NIR spectra gave moderately accurate predictions of oil and fatty acid content, but SWIR spectra provided significantly better results. The use of SNV pre-processing alone showed excellent results. This can be attributed to the SWIR spectral range capturing more detailed information regarding the chemical composition and molecular structures of the mustard seeds. The SWIR spectra captured more discriminative features relevant to the oil content and fatty acid compositions, resulting in more accurate predictions [33]. These findings align with observations reported in prior studies on various crops, including brassica seeds to predict oil content [23] and chia seeds to predict fatty acid concentration [34].

Specific wavelengths were identified by the iPLS model, which proved effective in predicting the mechanically expressible oil content of mustard seeds (Table 4). For the Vis–NIR spectra, the model selected key wavelengths in the ranges 399–465, 520–539, 557–563, 582–588 and 632–638 nm, using 2 LVs. For the SWIR spectra, the prominent wavelengths for oil content prediction ranged from 967 to 991 nm, using 1 LV. These findings suggest that these wavelength regions contained valuable information for accurately predicting oil content in the mustard seed samples. While the iPLS model applied to Vis–NIR spectra demonstrated moderate success in predicting overall fatty acid composition, its most notable strength lay in quantifying C20:1. The selected wavelengths encompassed 399–613, 632–638, 656–675, 757–776, 796–815 and 925–944 nm, using 4 LVs. For the SWIR spectra, the iPLS model showed exceptional performance. Its best performance was observed for predicting C18:1, with selected wavelengths of 967–991 nm and 4 LVs. Similar results were reported for choy sum [35] and brassica seeds [23, 36], where the iPLS variable selection method was used to find specific wavelengths.

Table 4 Prediction of fatty acid composition and oil content in OH treated mustard seeds from iPLS model using Vis–NIR–SWIR HSI spectra

Prediction map

The spatial distribution of oil and fatty acid components across the experimental conditions of OH-treated mustard seeds was analyzed using prediction maps generated for each component (Figs. 6 and 7). This approach enabled a seed-by-seed understanding of how the OH pre-treatment conditions affected the localization and concentration of these constituents within individual seeds.

Fig. 6

Spatial distribution of predicted fatty acid/oil content in OH-treated mustard seeds (PLSR-based Vis–NIR HSI map)

Fig. 7

Spatial distribution of predicted fatty acid/oil content in OH-treated mustard seeds (PLSR-based SWIR HSI map)

Component intensity was visualized using the MATLAB jet colormap, in which blue hues denote lower values and red hues denote higher values. For ease of interpretation, the mean reference measurement value of each sample is presented above its respective colourmap. The prediction maps display spatial distribution patterns at the individual pixel level, with the colour of each pixel representing its predicted component concentration. Notably, the SWIR HSI-based map outperforms the Vis–NIR HSI map in the clarity and distinctiveness of the component distribution. This can be attributed to the longer wavelengths of SWIR compared with Vis–NIR, which are associated with higher imaging resolution, reduced scattering, lower light absorption and better penetration through biological materials, all of which yield information that can be more easily resolved [37, 38]. Prediction maps have similarly been used to observe the distribution of TA, TSS and firmness in mango samples [39] and to show the spatial distribution of ground papaya seeds as an adulterant within black pepper samples [40].
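Pixel-level prediction maps of this kind are typically produced by applying the fitted regression model to the spectrum of every pixel. The sketch below assumes the model has been reduced to a linear form (an intercept plus one coefficient per band, as a PLSR model can be); the function name, the tiny hypercube and the coefficients are hypothetical.

```python
def prediction_map(cube, coefficients, intercept, mask=None):
    """Apply a linear calibration model to every pixel spectrum of a hypercube.

    cube: nested list indexed as [row][col][band]; coefficients: one weight per band.
    mask: optional [row][col] booleans; background pixels are returned as None.
    """
    out = []
    for r, row in enumerate(cube):
        out_row = []
        for c, spectrum in enumerate(row):
            if mask is not None and not mask[r][c]:
                out_row.append(None)  # background pixel, no prediction
                continue
            out_row.append(intercept + sum(w * x for w, x in zip(coefficients, spectrum)))
        out.append(out_row)
    return out

# Hypothetical 2 x 2 image with 3 bands and made-up regression coefficients.
cube = [
    [[0.1, 0.2, 0.3], [0.2, 0.2, 0.2]],
    [[0.3, 0.1, 0.0], [0.0, 0.0, 0.0]],
]
coefficients = [10.0, 20.0, 30.0]
intercept = 25.0
mask = [[True, True], [True, False]]  # last pixel is background
oil_map = prediction_map(cube, coefficients, intercept, mask)
```

The resulting two-dimensional array of predicted values is what a colormap such as jet then renders, with low predictions in blue and high predictions in red.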

Fourier transform infrared spectroscopy (FTIR)

FTIR spectra (Fig. 8) of oil from control and OH-treated mustard seed samples (OH1 and OH2) were examined to explore changes in peak positions and absorbance values, with the aim of understanding the molecular alterations that might have been induced by ohmic heating treatment. The spectra of control and treated mustard oil samples were collected in the mid-infrared region within the 4000–600 cm−1 range. The control samples exhibited peaks at wavenumbers of 2923, 2856, 2352, 1744, 1456, 1365, 1159 and 716 cm−1. The treated samples exhibited similar peaks, except at 2362 cm−1, where no peak was observed in the treated samples. The remaining peaks showed decreased absorbance values compared to the control samples, depending on the specific treatment conditions. This decrease in absorbance indicates structural changes and potential modifications in the molecular composition associated with the formation of FFA and diglycerides as a result of ohmic heating treatment [41]. The prominent peak at 2923 cm−1 corresponds to the asymmetric stretching vibrations of the CH2 groups present in aliphatic hydrocarbon chains. Similarly, the peak observed at 2856 cm−1 is associated with the symmetric stretching vibrations of the CH2 groups in aliphatic hydrocarbon chains. The decrease in absorbance at this wavenumber indicates a potential reduction in unsaturated or polyunsaturated fatty acids in the treated mustard oil. It can be inferred that the treatment may have influenced the degree of saturation or altered the fatty acid composition of the oil [42, 43]. The peak observed at 1744 cm−1 corresponds to the stretching vibrations of the carbonyl group (C=O) in esters, aldehydes and carboxylic acids. The decrease in absorbance at this wavenumber suggests a potential reduction in the concentration of these functional groups in the treated mustard oil. Moreover, several other peaks were observed at 1456, 1365, 1159 and 716 cm−1, associated with CH2 bending vibrations, CH3 bending vibrations, C–O stretching vibrations and aromatic C–H bending vibrations, respectively [41]. The decrease in absorbance at these wavenumbers suggests possible alterations in the composition of these functional groups induced by the treatment. Taken together, these findings suggest that the OH treatment may have modified the molecular composition and structure of the oil.

Fig. 8

FTIR spectrum of untreated and OH treated mustard seed oil samples

Conclusion

The key motivation for this research was to explore the possibility of extracting more edible oil from mustard seeds. Ohmic heating (OH) is a treatment that can cause subtle rupture of the seed structure and break the oil-bearing cellular matrix, thereby increasing the amount of oil available from mustard seeds during mechanical oil expression. Designed experiments were conducted with electric field strength and holding time as variables for treating mustard seeds in an ohmic heating setup. Further, this research investigated the potential of Vis–NIR–SWIR hyperspectral imaging (HSI) combined with chemometrics for non-destructive assessment of oil content and fatty acid composition in OH-treated mustard seeds. PLS-DA models achieved high accuracy in differentiating OH-treated seeds from control samples, with 93.0 and 95.8% accuracy for Vis–NIR and SWIR, respectively. Additionally, PLSR models built using SWIR data exhibited excellent predictive capabilities for oil content and most of the fatty acids, with R2 values exceeding 0.92. The predicted values aligned well with the results obtained from the reference analytical methods. The results revealed superior performance of SWIR–HSI compared to Vis–NIR data in both classification and prediction models. The ability of the variable selection method (iPLS) to identify crucial wavelengths for accurate prediction with minimal performance loss further highlights the potential of Vis–NIR–SWIR HSI as a rapid, non-destructive tool for assessing oil quality, in terms of fatty acid composition and oil content, in mustard seeds.