1 Introduction

Calcium carbide (CaC2) is a common ripening agent used primarily on bananas to accelerate ripening and achieve uniform post-harvest ripeness [1]. On hydrolysis, CaC2 liberates acetylene gas, which accelerates ripening in a manner similar to natural plant-generated ethylene [2]. Acetylene hastens the respiration process that softens bananas and degrades chlorophyll, producing the yellow color of ripe fruit [3]. The application of this and other harmful artificial ripening techniques remains banned in several jurisdictions [4, 5]. However, the continued illegal use of industrial-grade CaC2, especially in developing countries, poses a significant risk to human health [6]. This is because of the hazardous impurities liberated when CaC2 reacts with water, such as phosphine, arsine, hydrogen sulfide, and sulfur [7]. The resultant fat-soluble hydrides from these compounds can diffuse into the peel and flesh of carbide-ripened fruits [4, 8]. A study by Nowshad [9] indicated that the amounts of sulfur, phosphorus, arsenic, and other heavy metals in CaC2 exceed the permissible levels set by various food agencies globally. Recent studies have shown that these chemicals are harmful to human health, causing memory loss, cerebral edema, colonic and lung cancers, and other health complications [4, 6, 7, 9]. To ensure consumer safety, it is critical to screen for and determine the composition of harmful chemical additives in bananas and other fruits.

The methods currently used to detect these additives include high-performance liquid chromatography (HPLC), gas chromatography-mass spectrometry (GC–MS), and enzyme-linked immunosorbent assay (ELISA) [10,11,12]. Although these methods are highly accurate, they have significant shortcomings. Each type of additive in the artificial ripener usually requires its own method, which extends the turnaround time for results [9]. Furthermore, some organic solvents used in these processes are toxic, raising concerns about environmental safety. Sample preparation is also labor-intensive because the samples must be digested and chemically treated before analysis. Therefore, alternative rapid and cost-effective approaches for on-site screening of harmful chemical additives in fruits are urgently required to support intervention measures.

Compared to wet chemistry methods, Raman spectroscopy is relatively fast and offers several other advantages, including high sensitivity, minimal sample preparation, and non-destructive analysis of the samples [13]. More importantly, Raman measurements can be performed on the spot, allowing multiple analytes to be detected simultaneously. However, the analysis of Raman spectra can be complicated by the strong intrinsic fluorescence background (particularly for biological samples) as well as spectral interference caused by the matrix or target molecules [14,15,16]. Several approaches have been proposed to overcome these challenges, including fluorescence background suppression [14, 17, 18], the use of lasers with longer excitation wavelengths [13, 19], baseline correction using mathematical and computational approaches [20,21,22], and the use of chemometrics [23, 24].

Because of the complex and non-homogeneous composition (elemental and molecular) of bananas [25], typical Raman spectra are bound to have an intense fluorescence background as well as multiple overlapping bands or no peaks in a very noisy background. This implies that the accurate characterization and determination of artificial ripeners in bananas using the classical Raman spectroscopy approach is not straightforward. In this study, we report, for the first time, a direct, rapid, and relatively low-cost screening method for calcium carbide in bananas based on laser Raman spectroscopy. We demonstrate how to perform sample preparation and spectrum acquisition in a timely and cost-effective manner, as well as how to use spectral data preprocessing techniques to enhance the desired spectral features. Finally, we show how the principal component analysis (PCA) technique can be used to accurately identify the molecular fingerprints of sulfur (S), acetylene (C2H2), calcium hydroxide (Ca(OH)2), and phosphine (PH3) molecules to infer the presence of industrial-grade calcium carbide.

2 Materials and methods

2.1 Sample preparation

Fifty freshly harvested bananas were washed under running water to remove contaminants that could have affected the experimental data and then air-dried for less than 1 h. Twenty samples were placed in a sealed carton box and allowed to ripen naturally for 48 h. This group of samples formed the control group and their measurements served as standards in the study. To simulate the field use of industrial-grade CaC2 (sourced from local vendors) for artificial ripening of bananas, the remaining 30 banana fingers were dipped in carbide solutions and placed in sealed carton boxes for 48 h to continue ripening, as previously reported [4]. After 48 h, the samples (both treated and untreated) were sliced to a profile depth of approximately 5 mm, spread through the peel and flesh, and placed on glass slides for Raman measurements.

2.2 Laser Raman spectroscopy measurements

A confocal laser Raman spectrometer (Seki Technotron Corp, Japan) coupled with a 785 nm excitation laser was used. The instrument was equipped with a 300 mm imaging triple-grating monochromator spectrograph and a back-illuminated CCD camera to acquire spectra within the optimized time frames. Before the measurements, the instrument was calibrated with a standard silicon wafer, using its peak at 520.5 cm−1, and all measurements were taken in a dark room to eliminate background contributions from fluorescent lamps. The laser power delivered to the samples through the 50× objective lens was set to its maximum (≈ 6 mW), and no sample burning was observed. In addition, at this power and a depth of 5 mm, the laser was largely absorbed and scattered within the samples, with little chance of reaching the glass slide [26]. Finally, each sample spectrum was taken as the average of 10 accumulations with a 10 s exposure time.

2.3 Spectral data preprocessing and exploratory analysis

The obtained laser Raman spectra (LRS) were pre-processed using Spectragryph software [27]. Baseline offsetting was performed on the raw LRS data by applying a modified polynomial fit with a coarseness value of 3 points. The baseline was subtracted from the spectra to eliminate the broad fluorescence background. Smoothing was then performed using the Savitzky-Golay algorithm with an interval of nine points and a polynomial of order two to remove noisy artifacts in the spectra. To correct for intensity variations in the spectra caused by chemical factors, such as intermolecular reactions, normalization was performed using the standard normal variate method. Finally, based on the vibration modes of the target molecules (see Table 1), the wave-shift region of interest was determined to be 200–1200 cm−1.
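The preprocessing chain described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Spectragryph implementation: the iterative polynomial baseline stands in for the software's modified polynomial fit (its coarseness parameter is not reproduced), the polynomial order of 3 and the iteration count are illustrative choices, and all function names are hypothetical.

```python
import numpy as np

def polynomial_baseline(x, y, order=3, n_iter=50):
    """Estimate a smooth baseline by iterative polynomial fitting.

    After each fit, points lying above the fitted curve are clipped to it,
    so peaks are progressively excluded from the next fit. The order and
    iteration count are assumptions, not values taken from the study.
    """
    xs = (x - x.mean()) / x.std()            # rescale x for a well-conditioned fit
    work = np.asarray(y, dtype=float).copy()
    fit = work
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(xs, work, order), xs)
        work = np.minimum(work, fit)
    return fit

def savgol_smooth(y, window=9, order=2):
    """Savitzky-Golay smoothing: local least-squares polynomial fit.

    The fit in a sliding window reduces to a fixed convolution kernel.
    Window of nine points and order two follow the text.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, order + 1, increasing=True)
    kernel = np.linalg.pinv(design)[0]       # weights giving the fitted value at the window centre
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.convolve(padded, kernel[::-1], mode="valid")

def snv(y):
    """Standard normal variate: centre each spectrum and scale to unit standard deviation."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std(ddof=1)

def preprocess(x, y, roi=(200.0, 1200.0)):
    """Baseline subtraction, smoothing, SNV, then cropping to the region of interest."""
    corrected = y - polynomial_baseline(x, y)
    smoothed = savgol_smooth(corrected)
    normalised = snv(smoothed)
    keep = (x >= roi[0]) & (x <= roi[1])
    return x[keep], normalised[keep]
```

Calling preprocess(x, y) on one spectrum, with x the wave-shift axis in cm−1 and y the raw intensities, returns the cropped axis and the normalized intensities. Normalization is applied before cropping here, following the order in which the steps are listed in the text.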

Table 1 Common impurities found in CaC2 and their Raman peaks

After pre-processing, the LRS data were imported into the open-source R software (version 4.0.5) [32] for PCA with the ChemoSpec package [33]. PCA, a chemometric technique [34], was used to perform multivariate exploratory analysis of the high-dimensional LRS data, transforming it into a smaller set of latent variables (i.e., principal components (PCs)) that retain most of the variance in the spectra. The graphical outputs of PCA also help in visualizing and understanding the data. Here, the scores plot was used to show how the samples relate to one another, while the loadings plot was used to show how each wave-shift in the LRS contributed to each PC. Together, the scores and loadings plots aided the identification of molecular bands that distinguish the naturally ripened samples from the carbide-ripened samples. Finally, a confusion matrix was used to evaluate the performance of the grouped observations in the PCA score plots based on different wave-shift regions of interest.
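The PCA step can be illustrated with a short NumPy sketch based on the singular value decomposition. This is not the ChemoSpec code used in the study: it assumes mean-centering only (no scaling), and the function name pca and its arguments are hypothetical.

```python
import numpy as np

def pca(spectra, n_components=2):
    """PCA of a (samples x wave-shifts) matrix via SVD.

    Assumes mean-centering only; whether any scaling was applied in the
    study is not stated, so none is applied here.
    Returns scores, loadings, and the explained-variance ratio of every PC.
    """
    spectra = np.asarray(spectra, dtype=float)
    centred = spectra - spectra.mean(axis=0)      # remove the mean spectrum
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates on the PCs
    loadings = vt[:n_components].T                    # weight of each wave-shift on the PCs
    explained = s**2 / np.sum(s**2)                   # fraction of variance per PC
    return scores, loadings, explained
```

Plotting the first column of scores against the second gives a scores plot, and plotting a column of loadings against the wave-shift axis gives the corresponding loadings plot. The sign of each PC is arbitrary, so a cluster may appear on the negative or positive side depending on the implementation.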

3 Results and discussions

3.1 LRS features of naturally and carbide-ripened bananas

Typical raw LRS measured for the naturally ripened and carbide-ripened banana samples are shown in Fig. 1a. As expected, both spectra show a strong fluorescence background and no obvious characteristic peaks. This is mainly because the incident excitation energy at 785 nm was consumed, to a great extent, by resonance absorption and fluorescence; hence, the interaction with the components of interest was weak [14]. Nonetheless, the two spectra differ in intensity: the carbide-ripened sample has a higher intensity than the naturally ripened sample. This observation is consistent with Nakamoto [35], who reported that the intensity of Raman scattered light varies directly with analyte concentration. Figure 1b illustrates the broad fluorescence background in the region < 400 cm−1 that obscures the Raman signal of the target molecules, necessitating spectral preprocessing, as well as the resulting preprocessed spectra.

Fig. 1 (a) Raw LRS for naturally and carbide-ripened banana samples and (b) combined view of raw, baseline, and baseline-corrected LRS

As shown in Fig. 2a and b, the Raman signals for naturally and carbide-ripened bananas become more pronounced after pre-processing, revealing some characteristics of the various molecules present in the samples. However, it is still difficult to visually distinguish between the two because of spectral overlaps, improperly defined peaks, and spectral noise. The spectral profile of a homogeneous industrial-grade CaC2 pellet sample containing several molecules resulting from various impurities is shown in Fig. 2c with highlighted fingerprint regions (peaks). Notably, the intensity values around the wave shift regions of 225 cm−1 and 480 cm−1 show that sulfur molecules dominate the other analytes of interest. The peaks of the other molecules of interest are obscured because of their low concentrations and weak Raman modes.

Fig. 2 Pre-processed LRS of (a) naturally ripened banana, (b) carbide-ripened banana and (c) reference spectrum of a CaC2 pellet for the wave-shift range 200–1200 cm−1. The highlighted regions (i.e., S, C2H2, Ca(OH)2 and PH3) represent the Raman bands associated with calcium carbide impurities

3.2 PCA for the LRS wave-shift range 200–1200 cm−1

PCA results for the LRS in the wave-shift range of 200–1200 cm−1 show that PC1 can distinguish between naturally ripened and carbide-ripened samples (see Fig. 3a). The loadings plot (see Fig. 3b) shows that the Raman bands across the entire wave-shift region of the ROI influence the clustering of carbide-ripened samples in the negative PC1 axis, with the exception of the wave-shift region of 230–250 cm−1. This confirms that the intensity of the target molecules is higher in the carbide-ripened samples than in the naturally ripened samples, as shown in Fig. 1a. Furthermore, the variability (≈ 46%) explained by the two PCs (PC1 and PC2) reflects the rich contribution of the molecules present, as well as the spectral noise within the ROI. In this case, more than ten PCs are required to explain 100% of the variability.

Fig. 3 PCA results of naturally and carbide-ripened samples for the wave-shift region of 200–1200 cm−1: (a) scores plot and (b) loadings plot. Ellipses in the scores plot represent the 95% confidence limits for the two clusters indicated. Raman bands of the targeted molecules for calcium carbide impurities are indicated on the reference spectrum in the loadings plot

3.3 PCA for the Raman bands associated with calcium carbide impurities

The PCA scores plots for the LRS in the Raman bands associated with the calcium carbide impurities sulfur, acetylene, calcium hydroxide, and phosphine are shown in Fig. 4. Although the cluster separation for the two groups of samples is smaller than when the wave-shift region of 200–1200 cm−1 is used (see Fig. 3), more information concerning the explained variability (i.e., > 70% for all molecules) of the data is now evident. It was also observed that PC1 could be used to differentiate between naturally ripened and carbide-ripened samples. Samples not included in the cluster ellipses accounted for approximately 5–15% of the total samples.
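Whether a sample falls inside a 95% confidence ellipse on the PC1 and PC2 axes can be checked with a squared Mahalanobis distance. The sketch below is one common construction, assuming a classical (non-robust) covariance estimate and approximately bivariate-normal scores within each group; how the ellipses in the study were actually computed is not stated, and the function names are hypothetical.

```python
import numpy as np

def chi2_quantile_2dof(level):
    """Quantile of the chi-square distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * np.log(1.0 - level)

def inside_ellipse(scores, level=0.95):
    """Flag which rows of an (n x 2) score matrix lie inside the confidence ellipse.

    The ellipse is built from the group's own mean and covariance, so the
    function should be called once per group (e.g. once for the naturally
    ripened samples and once for the carbide-ripened samples).
    """
    scores = np.asarray(scores, dtype=float)
    diff = scores - scores.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis distance
    return d2 <= chi2_quantile_2dof(level)
```

The share of samples outside the ellipse is then 1 - inside_ellipse(group_scores).mean(), which is the kind of figure quoted above for samples not included in the cluster ellipses.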

Fig. 4 PCA scores plot for (a) sulfur wave-shift range (200–250 cm−1), (b) sulfur wave-shift range (450–500 cm−1), (c) acetylene wave-shift range (600–650 cm−1), (d) calcium hydroxide wave-shift region (750–800 cm−1), (e) phosphine wave-shift region (950–1000 cm−1) and (f) phosphine wave-shift region (1100–1150 cm−1). Ellipses around each cluster represent the 95% confidence limit for that group on PC1 and PC2 axes

Table 2 shows the loadings that have the most influence on the grouping of the carbide-ripened and naturally ripened samples based on the Raman bands for S, C2H2, Ca(OH)2, and PH3. For sulfur, the Raman-active peaks at approximately 225 cm−1 and 478 cm−1 were strongly associated with the carbide-ripened samples in the negative PC1 and positive PC1, respectively (see Fig. 4a and b). The peak at 225 cm−1 can be attributed to symmetric S–S bending, whereas the one at 478 cm−1 represents the ν(S–S) cross-ring stretching vibration mode of sulfur molecules that exist as puckered rings [29]. For acetylene, the peak centered at approximately 626 cm−1 is responsible for the clustering of the carbide-ripened samples in negative PC1 (see Fig. 4c). This differs from the previously reported peak at 612 cm−1, which is assigned to the C–H out-of-plane bending mode of acetylene molecules and is very weak [31]. For calcium hydroxide, the peak responsible for the clustering of carbide-ripened samples in the negative PC1 (see Fig. 4d) is centered around 780 cm−1. This peak can be attributed to the asymmetric bending vibration of the hydroxyl group (O–H) [30]. Finally, for phosphine, the peaks at approximately 979 cm−1 and 1115 cm−1 may be responsible for the clustering of carbide-ripened samples in the positive PC1 and negative PC1, respectively (see Fig. 4e and f). The peak at 1115 cm−1 represents the asymmetric bending mode, whereas the peak at 979 cm−1 can be attributed to the out-of-plane bending mode of the non-polar covalent bonds of phosphine molecules [28].

Table 2 Ten most influential wave-shifts in the fingerprint regions of sulfur, acetylene, calcium hydroxide, and phosphine molecules on PC1 loadings
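A ranking of this kind can be obtained by sorting the PC1 loadings of a fingerprint region by absolute value. The following is a minimal sketch, assuming "most influential" means largest absolute PC1 loading; the function name top_loadings is hypothetical.

```python
import numpy as np

def top_loadings(wave_shifts, pc1_loadings, k=10):
    """Return the k wave-shifts with the largest absolute PC1 loading.

    Each entry is (wave-shift in cm^-1, signed loading); the sign indicates
    whether the band pulls samples toward positive or negative PC1.
    """
    wave_shifts = np.asarray(wave_shifts, dtype=float)
    pc1_loadings = np.asarray(pc1_loadings, dtype=float)
    order = np.argsort(np.abs(pc1_loadings))[::-1][:k]   # descending by magnitude
    return [(float(wave_shifts[i]), float(pc1_loadings[i])) for i in order]
```

Applied to the wave-shift axis and PC1 loadings of one fingerprint region, the function returns the ten entries that a table of this form would list for that region.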

As shown in Table 3, the sulfur fingerprint region (450–500 cm−1), followed by the calcium hydroxide and phosphine (950–1000 cm−1) regions, provided the best discrimination of the LRS in the clusters indicated on the score plots (see Fig. 4). The good results for sulfur (accuracy = 94%, kappa = 0.87) can be attributed to its relatively high abundance in the calcium carbide production process when compared to other impurities [9]. The poor results for acetylene (accuracy = 59%, kappa = 0.23) can be attributed to its weak Raman-active mode in the 600–650 cm−1 wave-shift region [31].

Table 3 Confusion matrix for evaluating PCA score plot groupings data (excluding the outliers)
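Accuracy and Cohen's kappa follow directly from the counts in a confusion matrix. The sketch below uses the standard definitions; the function name is hypothetical, and the counts in the usage note are made up for illustration and are not the values of Table 3.

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Accuracy and Cohen's kappa from a square confusion matrix.

    Rows are taken as the true groups and columns as the assigned groups.
    Kappa compares the observed agreement with the agreement expected by
    chance from the row and column totals.
    """
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    observed = np.trace(cm) / n                                   # accuracy
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2      # chance agreement
    return observed, (observed - expected) / (1.0 - expected)
```

For a hypothetical matrix [[40, 10], [5, 45]], the observed agreement is 85/100 and the chance agreement is (50 × 45 + 50 × 55)/100² = 0.5, giving an accuracy of 0.85 and a kappa of 0.70.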

4 Conclusions

In this study, the use of a novel chemometric-assisted laser Raman spectroscopy method for the direct, rapid, and simultaneous screening of sulfur, acetylene, calcium hydroxide, and phosphine molecules in carbide-ripened bananas was evaluated. It has been shown that the method can be used to detect the presence of multiple chemical additives on fruits simultaneously without special preparation. Furthermore, this method has the potential to overcome the limitations of traditional Raman spectroscopy in the analysis of complex biological samples whose spectra are noisy and lack well-defined peaks. Compared with the acetylene, calcium hydroxide, and phosphine regions, the sulfur fingerprint region (450–500 cm−1) was more accurate (accuracy = 94%, kappa = 0.87) in discriminating between naturally and carbide-ripened banana samples. In addition to the information deduced from the peaks responsible for the various clusters in the PCA scores plot, a new set of fewer variables (PCs) was obtained, which could be used as inputs in machine learning algorithms for future quantitative studies. Currently, portable handheld Raman devices are capable of recording reliable and repeatable spectra in this application. It is envisioned that the method presented in this work can be extended for use not only in the laboratory but also for on-site measurements.