Introduction

Nowadays, consumers demand high-quality, additive-free, minimally-processed, nutritious, and fresh-like products. Freshly-squeezed fruit juice labeled as 100 % fruit is typically one of these products. This kind of juice is wholly made of fresh fruits either unprocessed or processed by means of novel techniques such as high pressure pasteurization (HPP) which preserve the overall freshness of the product and its organoleptic and nutritional characteristics (Faria et al. 2013).

During post-harvest ripening and the senescence process, respiration and metabolic activities of fruits continue, resulting in changes in fruit quality such as firmness, soluble solids content (SSC, °Brix), pH, and Vitamin C (VC) (Raffo et al. 2002). Fruits are relatively easy to authenticate by their morphological characteristics (such as color and firmness) and flavor (odor and taste). However, the act of processing fruits into juices makes it difficult to identify their freshness, e.g., it is hard to tell whether the juice is squeezed from fresh fruits or not. Therefore, it is important to develop a method that can authenticate juices and track the freshness of fruits through the squeezed fruit juices.

Traditionally, sensory and instrumental techniques are used to describe the profile of horticultural products. However, sensory analysis requires panels of trained technicians, and low-cost experimental methods such as destructive mechanical tests and other conventional physicochemical measurements are specific for a particular quality index, instead of providing comprehensive quality information (Beullens et al. 2006). On the other hand, high-end instrumental techniques such as gas chromatography (GC), gas chromatography with mass spectrometry (GC-MS), sniffing GC-MS, and/or high-performance liquid chromatography (HPLC), combined with appropriate sample preparation techniques, could provide information on the chemical compositions of the sample (Berna et al. 2004). Nevertheless, these analytical techniques often require laborious and time-consuming sample preparation, as well as skilled personnel to operate the equipment and to interpret the analytical results (Baldwin et al. 1998; Fallik et al. 2001).

Electronic nose (e-nose) and tongue (e-tongue) techniques, which have been inspired by the way mammals recognize samples via olfaction and taste, respectively, offer a fast, comprehensive, and easy-to-handle alternative to assess food quality (Escuder-Gilabert and Peris 2010; Schaller et al. 1998). An e-nose contains a non-selective chemical sensor array based on conducting polymers, metal oxides, surface acoustic wave devices, quartz crystal microbalances, or a combination of these devices, a signal processing subsystem, and a pattern recognition subsystem (Gardner and Bartlett 1994; Ping et al. 1997). Similarly, an e-tongue is a sensor array combined with a pattern recognition system for liquid analysis using a combination of several non-specific, low-selective, chemical sensors with high-stability, cross-sensitivity, and ion-selective sensors (Winquist et al. 1997). For fruit juice, the e-nose and e-tongue have been reported to successfully characterize odor and taste, respectively. The e-nose has been applied for early detection of Alicyclobacillus spp. in peach, orange, and apple juices (Gobbi et al. 2010); classification of citrus juices according to fruit type (Reinhard et al. 2008); detection of orange juice treatment (Shaw et al. 2000); and classification of white grape musts (grape juices before fermentation) by variety (Roussel et al. 2003). The e-tongue has also been applied to the classification of apple-based juices (Bleibaum et al. 2002) and apple varieties (Rudnitskaya et al. 2006), the determination of orange juice percentage in juice beverages (Gallardo et al. 2005), and simulation of the juice aging process (Legin et al. 1997).

Nevertheless, it should be noted that the two sensor systems do not examine the same features when applied to the same liquid sample. The e-nose sensors are in contact with the headspace while the e-tongue electrodes are immersed in the sample. Using the e-nose or e-tongue alone may not be sufficient (Di Natale et al. 2001; Gomez et al. 2008; Kantor et al. 2008; Torri et al. 2010; Zhang et al. 2008), while on the other hand, simultaneous application of e-noses and e-tongues may increase the amount of information extracted from a sample when compared to the information from a single sensory organ (Di Natale et al. 2000).

In this research, a combination of the e-nose and e-tongue was used to track the freshness of youbei cherry tomatoes by detecting the squeezed juices. The physicochemical quality indices of firmness, SSC, pH, and VC were also measured. Different feature extraction and sensor fusion approaches were discussed. The main objectives of this research were (1) to explore the potential for tracking fruit freshness by detecting the squeezed fruit juice with e-nose and e-tongue, (2) to compare freshness discrimination and prediction performances based on different data fusion approaches, and (3) to explore if simultaneous use of data from both instruments would increase the extent of information regarding the sample or lead to data redundancy.

Materials and Methods

Sample Preparation

The research was conducted twice: the first time aimed to explore and build the models, while the second time aimed to verify their robustness and generalization. A Chinese variety, the youbei cherry tomato, was selected for the experiments. For the first time of research, the samples were hand-harvested on July 3, 2012 from the experimental orchard located at the Department of Horticulture, Zhejiang University, Hangzhou, China. All tomatoes were picked at the light red stage (approximately 70 % of the surface, in the aggregate, shows pinkish-red or red) (USDA 1997). Upon arrival at the laboratory, cherry tomatoes of approximately uniform size and weight that were undamaged and free of worm attack were selected. The selected samples were then rinsed with clean water and wiped dry with a clean cloth before being stored in a ventilated container at 25 ± 1 °C and 80 ± 5 % relative humidity for 8 days (under these conditions, freshly picked youbei cherry tomatoes usually become overripe and begin to decay by day 8).

Measurements were taken every 2 days, i.e., on day 0 (the harvest day), 2, 4, 6, and 8. On each measuring day, an appropriate amount of cherry tomatoes was placed in a fruit squeezer and juiced for 30 s to obtain 100 % fresh juice. The juice sample was then divided into two parts: one for e-nose detection and the other for e-tongue detection. The juicing process was repeated 25 times to provide 25 samples each for the e-nose and the e-tongue. For e-nose detection, the juice samples were measured directly, while for e-tongue detection, the juice samples were first filtered through medical gauze folded into eight layers, and the filtered liquids were collected for detection.

For the second time of research, a small amount of light-red youbei cherry tomatoes were picked again on July 14, 2012. Twenty-five juice samples were prepared each for e-nose and e-tongue detections. The juicing and measuring operations were the same as in the first time of research.

e-Nose and e-Tongue Instruments

Headspace analysis was performed with a commercial PEN 2 e-nose (Airsense Analytics GmbH, Schwerin, Germany). The sensor array of this analytical instrument is composed of ten different metal oxide semiconductors (MOS) positioned in a small chamber. A description of these sensors has been given in our previous work (Hong et al. 2012). Two kinds of data are obtained from the e-nose: one is R (the resistance value of the sensors when the sample gas flows through them) and the other is G/G0, where G and G0 are the conductivities of the sensor when exposed to the sample gas and the zero gas, respectively. The G/G0 value is more reliable because it can compensate for sensor drift to some degree, so in this study, the G/G0 value was chosen as the initial signal.

Taste analysis was performed with α-Astree e-tongue (Alpha MOS Company, Toulouse, France). This taste sensor consists of an array of seven liquid cross-sensitive electrodes or sensors named ZZ, BA, BB, CA, GA, HA, and JB, respectively (a description of these electrodes has also been given previously (Wei et al. 2009)), a 16-position autosampler, and associated interface electronic module. The sensors are made from silicon transistors with an organic coating that governs the sensitivity and selectivity of each individual sensor. The potentiometric difference between each individually coated sensor and the Ag/AgCl reference electrode in the equilibrium state was measured and recorded at room temperature.

Experimental Procedures

e-Nose Sampling Procedure

One hundred twenty-five juice samples (25 replications × 5 storage times (ST)) were prepared for e-nose detection. Each sample (10 mL of cherry tomato juice) was placed in a 500 mL airtight glass vial that was sealed with plastic wrap. The glass vial was kept closed for 10 min (headspace-generation time) while the headspace collected the volatiles from the sample. During the measurement process, the headspace gaseous compounds were pumped into the sensor array (400 mL/min) through Teflon tubing connected to a needle in the plastic wrap, causing the ratio of conductance of each sensor to change. The measurement phase lasted for 70 s, which was long enough for the sensors to reach stable signal values. The signal data from the sensors were collected by the computer once per second during the measurements. When the measurement process was complete, the acquired data were stored for later use. After each experiment, a calibration procedure using zero gas (air filtered by active carbon) was carried out to reduce the influence of external parameters such as variation in the relative humidity of the air, changes in the temperature, and the drift of the sensors over time.

e-Tongue Sampling Procedure

One hundred twenty-five juice samples (25 replications × 5 ST) were prepared for e-tongue detection. During the experiment, 80 mL of each sample was injected into a 120 mL beaker for detection. The measuring time was set to 120 s for each sample, and the sensors were rinsed for 10 s with ultra-pure water to reach stable potential readings before the next sample was measured. Four replicate measurements were run on each sample. The first three measurement cycles were discarded because of instability, and only the fourth, stable set of sensor responses was taken as the original data for the sample.

Measurements of Soluble Solids Content, pH, Firmness, and Vitamin C

On each measuring day, SSC, pH, firmness, and VC of cherry tomatoes were measured. For each quality index, 25 replicates were prepared (to correlate with the numbers of e-nose/e-tongue measurements).

The SSC of juice was measured in °Brix by a temperature-compensating refractometer (Digital refractometer 2WA-J 0–32 %, Shanghai, China), and pH was measured by a titrimeter (Ti-Touch-916, Metrohm, Herisau, Switzerland).

The VC concentration was measured using laboratory methods according to National Standard of the People’s Republic of China (GB/T 6195–1986, 1986), and its value was expressed as milligram ascorbic acid per 100 g of tomato (mg/100 g).

The cherry tomato firmness was measured through a puncture test using a Universal Testing Machine (Model 5543 Single Column, Instron Corp., Canton MA, USA). The penetrating force on an individual fruit was measured at three positions along the equator approximately 120° apart, perpendicular to the stem-bottom axis. A 6-mm-diameter stainless steel cylindrical probe with a flat end was used. The puncture process was recorded by computer, and the final puncture force was defined as the average of the three maximum forces required to push the probe to a depth of 3 mm at a speed of 5 mm s−1.

All the experiments and measurements were carried out at a room temperature of 25 ± 1 °C.

Statistical Analysis and Pattern Recognition Methods

In the present study, different statistical analysis and pattern recognition methods were applied for feature selection and quantitative and qualitative recognition of tomato juice qualities.

Methods used for Feature Selection and Construction of Fusion Datasets

Feature selection and construction of fusion datasets were performed by principal component analysis (PCA), factor F, and stepwise selection. Factor F is the ratio of the variance between classes to the sum of the internal variances of all classes. A detailed description can be found in Ciosek et al. (2004). Stepwise selection is a variable selection approach that tests at each step for variables to be included or excluded. In traditional implementations, variables are chosen to enter or leave the model according to the significance level of an F test. At each step, the model is examined. If the variable in the model with the least significant F statistic fails to meet the significance level required to stay, that variable is removed. Otherwise, the variable not in the model that yields the most significant F statistic is added. Stepwise selection thus involves a sequence of F tests, while the factor F approach requires only one. In this paper, for both the factor F and stepwise selection approaches, ST was considered as the response.
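
As a rough illustration of the factor F criterion described above, the following Python sketch scores one sensor by dividing the variance of the class means by the sum of the within-class variances. The exact normalisation used by Ciosek et al. (2004) is not reproduced in this paper, so the formula, the function name `factor_f`, and the toy readings are assumptions made for illustration only.

```python
from statistics import mean, pvariance

def factor_f(groups):
    """Separability score for a single sensor.

    groups: one list of readings per class (here, per storage time).
    Returns (variance of the class means) / (sum of within-class variances).
    This normalisation is an assumption, not taken from the paper.
    """
    class_means = [mean(g) for g in groups]
    between = pvariance(class_means)
    within = sum(pvariance(g) for g in groups)
    return between / within

# Made-up readings: one sensor separates the classes well, the other does not.
good_sensor = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 3.1, 2.9]]
poor_sensor = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [0.9, 1.9, 2.9]]
f_good = factor_f(good_sensor)
f_poor = factor_f(poor_sensor)
```

A sensor whose classes are tight and far apart receives a large score, which is why sensors with the highest F values are natural candidates for a reduced array.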

Methods used for Discrimination of Juices Squeezed from Fruits with Different ST

Visual exploration and classification of cherry tomato storage time were performed by PCA and canonical discriminant analysis (CDA). Given a classification variable and several quantitative variables, CDA derives canonical variables (linear combinations of the quantitative variables) that summarize between-class variation in much the same way that principal components (PCs) summarize total variation.

Pattern recognition of storage time was performed by a learning vector quantization (LVQ) network and a support vector machine (SVM). LVQ is a supervised two-layer classifier which has its classes generated by the self-organizing feature map (SOFM) algorithm (Unay and Gosselin 2006). SVM is a linear machine working in the high-dimensional feature space formed by the nonlinear mapping of the n-dimensional input vector into a K-dimensional feature space (K > n) through the use of a kernel function (Gómez-Sanchis et al. 2013; Wei and Wang 2011). Compared with other feasible kernel functions, the radial basis function (RBF) can reduce the computational complexity of the training procedure and gives good performance under general smoothness assumptions. To obtain good performance, the penalty parameter C and kernel parameter gamma (γ) in the SVM model should be optimized (Brudzewski et al. 2004). The best combination of C and γ is often selected by a grid search with exponentially growing sequences of C and γ, for example, log2 C and log2 γ ranging from −10 to 10 at an interval of 1 (Szöllősi et al. 2012). Typically, each combination of parameter choices is checked using cross-validation, and the parameters with the best cross-validation accuracy are picked. In the present study, RBF was chosen for the SVM model.
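
The grid search over C and γ described above can be sketched in a few lines of Python. This is only an outline under stated assumptions: `cv_accuracy` is a hypothetical callable that would, in practice, train an RBF SVM and return its cross-validation accuracy, and `toy_accuracy` is a stand-in scorer used here so that the example runs without an SVM library.

```python
from itertools import product
from math import log2

def grid_search(cv_accuracy, exponents=range(-10, 11)):
    """Return (best_score, C, gamma) over the grid C = 2**i, gamma = 2**j.

    cv_accuracy: hypothetical callable (C, gamma) -> cross-validation accuracy.
    """
    best = None
    for i, j in product(exponents, repeat=2):
        C, gamma = 2.0 ** i, 2.0 ** j
        score = cv_accuracy(C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)
    return best

def toy_accuracy(C, gamma):
    """Stand-in scorer (not a real SVM): peaks at C = 2**3, gamma = 2**-2."""
    return 1.0 - 0.01 * ((log2(C) - 3) ** 2 + (log2(gamma) + 2) ** 2)

best_score, best_C, best_gamma = grid_search(toy_accuracy)
n_combinations = len(range(-10, 11)) ** 2  # 21 exponents per parameter
```

With exponents from −10 to 10 at an interval of 1, the grid contains 21 × 21 = 441 parameter combinations, each of which would require one cross-validation run.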

Methods used for Quantitative Tracking and Prediction of Quality Indices

Quantitative analysis with respect to the four quality indices was performed using principal components regression (PCR) with forward selection. PCR applies PCA to the raw independent variables prior to multiple linear regression (MLR). In this way, PCR avoids the collinearity problem that often appears when MLR is applied directly to correlated variables.

Partition of Training, Cross-Validation, and Testing Sets

For the first time of research, 25 samples were prepared for each measurement per measuring day. Thus, there were in total 125 samples each for the e-nose, the e-tongue, and each quality index measurement. It is well known that the real classification/predictive ability of any calibration model cannot be judged solely by using internal validation; instead, it has to be validated on the basis of predictions for samples not included in the calibration set (Zhang et al. 2012; Beghi et al. 2013). In the present work, the data used for testing were independent of those used for training. Taking the e-nose data as an example, for LVQ- and SVM-based classification as well as PCR-based regression analysis, the 125 e-nose sample data were divided into two subsets: 60 % of the samples were randomly selected as the training set and the remaining 40 % were used for testing. An inner leave-one-out (LOO) cross-validation (CV) was employed to calibrate the model on the training set; the calibrated model was then applied to the testing set and to a verification dataset obtained from the second time of research. This random-split process was repeated 50 times, and the average classification accuracy as well as the prediction error was recorded.
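
The partition scheme above can be outlined as follows. This is a minimal sketch, assuming an unstratified random split (the paper does not state whether the split was stratified by ST); the function names and the fixed seed are illustrative choices, not part of the original procedure.

```python
import random

def split_indices(n_samples, train_fraction, rng):
    """Shuffle sample indices and cut them into training and testing parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

def repeated_splits(n_samples=125, train_fraction=0.6, n_repeats=50, seed=0):
    """Generate the repeated random 60/40 splits (seed chosen arbitrarily)."""
    rng = random.Random(seed)
    return [split_indices(n_samples, train_fraction, rng) for _ in range(n_repeats)]

def leave_one_out(train_indices):
    """Inner LOO folds: (indices used for fitting, single held-out index)."""
    return [(train_indices[:k] + train_indices[k + 1:], train_indices[k])
            for k in range(len(train_indices))]

splits = repeated_splits()
train_idx, test_idx = splits[0]
folds = leave_one_out(train_idx)
```

With 125 samples, each split yields 75 training and 50 testing samples, and the inner LOO loop produces 75 folds per split.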

Data obtained from the second time of research were used as an independent verification dataset to verify those classification and prediction models.

Classification performances of LVQ and SVM were evaluated by the percentage of correct classifications (%CC), calculated by the equation:

$$ \%\mathrm{CC}=\frac{CC}{AC}\times 100\% $$
(1)

where CC denotes the number of correctly identified samples in the CV sets or testing sets and AC denotes the number of all samples in the CV sets or testing sets. The prediction performance of PCR was estimated using the parameters obtained from the fitted equation: the correlation coefficient (R²) and the root mean square error (RMSE) between predicted and experimental values. Generally, the larger the R² and the lower the RMSE, the better the prediction model.
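
For concreteness, the evaluation measures above can be written out as below. This is a sketch only: the paper does not give formulas for R² and RMSE, so `r_squared` is assumed here to be the squared Pearson correlation between measured and predicted values, and all input values are made up.

```python
from math import sqrt

def percent_cc(true_labels, predicted_labels):
    """Percentage of correct classifications, as in Eq. (1)."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Squared Pearson correlation (assumed meaning of R2 in the text)."""
    n = len(measured)
    mean_m, mean_p = sum(measured) / n, sum(predicted) / n
    sxy = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    sxx = sum((m - mean_m) ** 2 for m in measured)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)

# Made-up examples: four of five storage times classified correctly.
cc = percent_cc([0, 2, 4, 6, 8], [0, 2, 4, 8, 8])
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
r2 = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```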

The PCA, CDA, and PCR were performed in SAS 8.2 (SAS Institute, Cary, NC, USA), LVQ was performed using the network toolbox in MATLAB R2008a, and SVM was performed using Lib-SVM (Chang and Lin 2011).

Results and Discussion

Changes in youbei cherry tomato quality indices (pH, SSC, VC, and firmness) during storage

Average values (in the form of means ± standard deviation (SD)) of youbei cherry tomato quality indices at different storage times are presented in Table 1. The storage time has a significant effect on the value of each quality index (P < 0.0001 for all indices). As the cherry tomatoes ripened, the pH value (mean value) increased slightly from 4.32 to 4.37 during the first 2 days, after which it decreased to 4.21 at day 8; the SSC increased from 5.3 to 5.7 °Brix, a total increase of 7.5 % during the first 2 days, after which it decreased to 4.8 °Brix at day 8; the VC increased from 27.3 to 31.07 mg/100 g, a total increase of 13.81 % during the first 2 days, after which it decreased to 24.49 mg/100 g at day 8; and the firmness decreased from 10.2 to 5.34 N, a total decrease of 47.65 % after 8 days. It is noticeable that the SSC and VC increased relatively quickly during the first 2 days, followed by an almost stable state from day 2 to day 4. The short-term increase in SSC and VC may have been due to active metabolic processes, during which cherry tomatoes change from light red to totally red and mature. It is also interesting to note that all four quality indices decreased relatively sharply from day 4 to day 6. This is likely due to the respiratory climacteric, during which the respiration intensity rises sharply and then falls to a low level along with fruit senescence.

Table 1 Changes with storage time in the average values (in the form of means ± standard deviation) of pH, soluble solids content (SSC), Vitamin C (VC), and firmness of youbei cherry tomatoes

For the second time of research, the average pH, SSC, VC, and firmness values were similar to those of the ST0 group in the first time of research.

Response Curves of e-Nose and e-Tongue

Typical e-nose responses of juices squeezed from tomatoes stored for 0 and 8 days are presented in Fig. 1a and b, respectively. The x-axis represents time and the y-axis represents the sensors' ratio of conductance, G/G0. Each curve represents the change in a sensor's ratio of conductance during measurement. As shown in Fig. 1a and b, the conductance ratios of the ten sensors gradually changed (increased or decreased) and finally reached a stable equilibrium. It is noticeable that the trend in sensor outputs in both figures is the same: the G/G0 values from sensors S9, S6, S8, S7, S4, and S10 reached a maximum at 70 s, while those from sensors S1, S3, S5, and S2 reached a minimum. However, compared to Fig. 1a, the G/G0 values from S9, S6, and S8 in Fig. 1b are relatively higher, indicating that it is possible to distinguish cherry tomato juices squeezed from tomatoes with different ST by e-nose. In this paper, the area between each sensor curve and the line y = 1 is taken as the original data for each sensor. The area is calculated as follows:

$$ s_{\mathrm{area}}=\int_{t=1}^{70}\left| f(t)-1\right| \, dt $$
(2)

where f(t) is the function describing a sensor's response curve, i.e., f(t) represents the change in the G/G0 value during measurement.
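
Because the e-nose signal is sampled once per second, the integral in Eq. (2) has to be approximated numerically. The sketch below uses the trapezoidal rule on the absolute deviations from 1; the paper does not say which integration rule was used, so this choice, the function name `response_area`, and the made-up response curve are assumptions.

```python
def response_area(ratios, dt=1.0):
    """Trapezoidal approximation of the integral of |f(t) - 1|.

    ratios: G/G0 readings taken every dt seconds (here t = 1, ..., 70 s).
    """
    deviations = [abs(r - 1.0) for r in ratios]
    return sum((a + b) / 2.0 * dt for a, b in zip(deviations, deviations[1:]))

# Made-up curve: rises linearly from 1.0 at t = 1 s to 2.0 at t = 70 s.
toy_curve = [1.0 + (t - 1) / 69.0 for t in range(1, 71)]
area = response_area(toy_curve)

# A sensor that never leaves the baseline contributes zero area.
flat_area = response_area([1.0] * 70)
```

For the linear toy curve the exact area is 69/2 = 34.5, which the trapezoidal rule reproduces; sensors whose ratio falls below 1 contribute positive area as well because of the absolute value.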

Fig. 1

Typical e-nose and e-tongue responses of juices squeezed from youbei cherry tomatoes stored for 0 (a and c) and 8 (b and d) days: a and b are e-nose responses; c and d are e-tongue responses

Typical e-tongue responses of juices squeezed from tomatoes stored for 0 and 8 days are presented in Fig. 1c and d, respectively. Each curve represents the potentiometric difference of a sensor against time (seconds). Again, the trend in sensor outputs in both figures is the same: in the first 40 s, the response intensity from sensors BB and JB increased rapidly, while that from sensors GA, BA, ZZ, HA, and CA decreased slowly. From 40 to 120 s, except for the slowly increasing response intensity from sensors JB and BB, the response intensity from the other five sensors hardly changed. All the sensors' responses became stable afterwards and finally reached a dynamic equilibrium. However, compared to Fig. 1c, the response intensity from ZZ, JB, and HA in Fig. 1d is relatively higher, indicating that it is possible to distinguish cherry tomato juices squeezed from tomatoes with different ST by the e-tongue. In this paper, the response values at 120 s from each sensor were extracted and analyzed.

The data vectors obtained from the e-nose and e-tongue were all standardized before further analysis. Standardization was defined as the difference between the original response value of each sensor and the mean value, divided by the standard deviation.
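
The standardization step amounts to a per-sensor z-score, sketched below. The text does not specify whether the sample or the population standard deviation was used; the sample standard deviation is assumed here, and the readings are made up.

```python
from statistics import mean, stdev

def standardize(values):
    """Z-score one sensor's readings: (value - mean) / standard deviation.

    Uses the sample standard deviation (an assumption, see lead-in).
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Made-up readings of one sensor across three samples.
z = standardize([2.0, 4.0, 6.0])
```

After this step every sensor has zero mean and unit spread, so e-nose areas and e-tongue potentials can be combined without one instrument dominating by scale.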

Comparison between e-Nose and e-Tongue Measurements

Intuitive Recognition of Juices Squeezed from Tomatoes with Different ST by PCA and CDA

Plots of the first two PCs and of the first two canonical variables (CV1 and CV2) for cherry tomato juices squeezed from tomatoes with five different ST, obtained using the e-nose (Fig. 2a, b) and e-tongue (Fig. 2c, d), are shown in Fig. 2, where ST0 to ST8 represent day 0 to day 8, respectively. For the e-nose dataset, data points of the ST2, ST4, and ST8 groups are close to each other in the PCA plot (Fig. 2a). This kind of data structure indicates the possibility of misclassification among these groups in the following classification analysis. However, the juices squeezed from the five ST groups could be clearly classified into five groups by CDA (Fig. 2b). This could be explained as follows: though the ST2, ST4, and ST8 groups are close to each other in the 2D PCA plot (Fig. 2a), the three groups may be more separated when more PCs are considered, e.g., if their third PC values are significantly different, they will be more separated from each other. For the e-tongue dataset, data points from the five groups were well discriminated from each other in both the PCA (Fig. 2c) and the CDA (Fig. 2d) plots. Observing the two PCA plots, it is noticeable that data from the verification group are not exactly the same as those from the ST0 group. This may be explained as follows: the cherry tomato samples from the verification group are a little different from the first batch of cherry tomatoes (even though the four quality indices were similar, the volatile gas emitted from the samples might be a little different). Because the sensors in the e-nose and the e-tongue are very sensitive, their responses to the verification group differ from those to the ST0 group from the first time of research. Meanwhile, it should be noted that none of the four plots showed an obvious trend with storage time. This may be explained by the complicated metabolic activities of cherry tomatoes during storage. As discussed before, the four quality indices showed different changing trends during storage, e.g., the firmness declined all the time while the values of the other three quality indices increased first. Thus, the changes in volatile compounds may also not be clearly correlated with storage time. This is also supported by Gómez et al. (2008), in whose work the data distributions of tomatoes stored for 3 and 6 days seem irregular.

Fig. 2

PCA and CDA plots of juices squeezed from tomatoes with different post-harvest storage times based on e-nose (a and b) and e-tongue (c and d) measurements

The above result indicates that a close distribution of data points in a 2D PCA plot does not guarantee that the original data points are close to each other; on the other hand, if data points from different groups are far away from each other in a 2D PCA plot, then the original data points can be well discriminated. Meanwhile, the groups are better discriminated in the CDA plots than in the PCA plots. This is in agreement with previous studies (Hong et al. 2012; Beullens et al. 2006). The reason may be that CDA is a supervised approach that takes account of the distribution of data points within the same group as well as the distance between different groups, while PCA is an unsupervised approach that only takes account of the total variance of the data. Thus, for the later fusion datasets, only the CDA plots are presented for intuitive recognition of the juices.

Discrimination of Juices Squeezed from Tomatoes with Different ST by LVQ and SVM

Comparative discrimination of cherry tomato juices squeezed from tomatoes with different ST by LVQ and Lib-SVM methods based on the e-nose and e-tongue datasets is presented in Table 2. The ultimate objective of classification is to classify unknown data, and a model that fits the training data well may be over-fitted and thus fail to guarantee high testing accuracy; therefore, only the testing and verification sets are analyzed here. In the case of e-nose measurement, the average classification accuracies (%CC) of the testing and verification sets obtained by LVQ are 76 and 86 %, respectively. However, when Lib-SVM is applied, the average %CC of the testing and verification sets rise to 89.48 and 96.8 %, respectively. It is noticeable that the average %CC of the verification set is higher than that of the testing set. This may be explained as follows: as observed in Fig. 2a, the e-nose data points from the ST2, ST4, and ST8 groups are close to each other. The testing set includes data samples from the ST2, ST4, and ST8 groups, which are prone to being confused with one another, while the verification set consists only of juice samples squeezed from tomatoes stored for 0 days. In the case of e-tongue measurement, no matter which classifier is applied, the average %CC of the testing and verification sets are all higher than 96 %. In general, the classifiers succeed in identifying data points from the second time of research. Though the sensors' responses to the verification group are not exactly the same as those to the ST0 group from the first time of research, the verification group is most similar to the ST0 group. Thus, it could be well classified into the ST0 group. Meanwhile, Lib-SVM performs better than LVQ in this study. Previous studies have also demonstrated the superiority of SVM (Pan et al. 2012; Jodas et al. 2013). Thus, for further analysis, only Lib-SVM was chosen for classification.

Table 2 Discrimination of juices squeezed from tomatoes with different post-harvest storage times by LVQ and Lib-SVM methods based on e-nose and e-tongue measurements

PCR Results

As in the case of classification, only the testing and verification results are analyzed in the case of regression. Testing and verification of quality index regression models trained by e-nose and e-tongue measurements using PCR are listed in Table 3. Root mean square error (RMSE) is employed to evaluate prediction performance. For regression models trained by the e-nose dataset, the testing RMSE for pH, SSC, firmness, and VC is 0.025, 0.146 °Brix, 0.782 N, and 1.386 mg/100 g, respectively. Considering the SD of the quality index values (the overall SD for pH, SSC, VC, and firmness is 0.061, 0.347, 2.695, and 1.900, respectively), the RMSE values are generally acceptable. For regression models trained by the e-tongue dataset, the testing RMSE for pH, SSC, firmness, and VC is 0.015, 0.100 °Brix, 0.623 N, and 0.755 mg/100 g, respectively. The small RMSE values demonstrate good regression models based on the e-tongue dataset. Although it may seem surprising that the firmness of the tomato skin, a mechanical measure, could be sensed by either e-nose or e-tongue, and that the three non-volatile quality indices (pH, SSC, and VC) could be sensed by e-nose, such results are meaningful since senescence of tomatoes during storage is a complex process involving changes in the composition of both soluble and volatile compounds. These changes are correlated with the changes in the four quality indices, thus allowing their prediction using e-nose and e-tongue. In other words, the e-nose or e-tongue does not measure the four quality indices directly; it actually measures volatiles or other soluble compounds that are well correlated with the quality indices.

Table 3 Testing and verification of quality index regression models trained by different datasets

However, prediction results for data obtained from the second time of research are not so good, suggesting that the regression models based on the e-nose dataset lack generalization and robustness. In the case of the e-nose dataset, the verification RMSE for pH, SSC, firmness, and VC is 0.099, 0.659 °Brix, 3.954 N, and 1.546 mg/100 g, respectively. In the case of the e-tongue dataset, the verification RMSE for pH, SSC, firmness, and VC is 0.056, 0.157 °Brix, 2.259 N, and 0.846 mg/100 g, respectively. Except for the SSC and VC regression models based on the e-tongue dataset, the regression models produce RMSE values larger than one half of the corresponding SD values. The regression models based on the single usage of e-nose or e-tongue generally fail to correctly predict quality indices. This may be because the verification dataset was obtained from the second time of research. The cherry tomato samples are a little different from the first batch of cherry tomatoes (even though the four quality indices were similar, the volatile gas emitted from the samples might be a little different). Because the sensors in the e-nose and the e-tongue are very sensitive, their responses to the verification group differ from those to the ST0 group from the first time of research.

In general, the performances based on the e-tongue dataset are better than those based on the e-nose dataset. Meanwhile, in view of the results from the classification methods (PCA, CDA, LVQ, and SVM) as well as the quantitative analysis method (PCR), the ability of the e-tongue for qualitative and quantitative analysis is relatively better than that of the e-nose. This may be because, during measurement of a liquid sample, the e-tongue electrodes are immersed in the sample and are consequently more sensitive to changes in quality indices. A previous study also indicated that the e-tongue contributed more than the e-nose when detecting liquid samples (Cole et al. 2011). However, some researchers found the reverse (Cosio et al. 2007). Hence, it is hard to conclude which of the two techniques is better. The answer depends on the characteristics of the detected samples as well as the sensors contained in the two systems.

In the following sections, we discuss whether the simultaneous use of perceptual knowledge from both instruments increases the amount of sample information obtained.

Data Fusion of e-Nose and e-Tongue

Feature Extraction and Selection by PCA, Factor F and Stepwise Selection

When the e-nose and e-tongue are combined, there are 17 original variables (ten e-nose sensors and seven e-tongue sensors) in total for each sample, that is, a 17 × 125 (25 replications × 5 groups) data matrix. To explore the correlation among these 17 variables and to avoid the ‘curse of dimensionality’, six fusion approaches were examined (Table 4), and PCA, factor F, and stepwise selection were employed to select variables for building the fusion datasets.
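As an illustration of how such a fused data matrix can be assembled, the sketch below standardizes each instrument's block and concatenates the two column-wise. It is a sketch under stated assumptions, not the procedure used in this study: samples are stored as rows (so the matrix is 125 × 17, the transpose of the layout given above), the helper name `standardize` is hypothetical, and the sensor readings are random synthetic numbers.

```python
import numpy as np


def standardize(block):
    """Scale each sensor column to zero mean and unit sample variance."""
    mean = block.mean(axis=0)
    std = block.std(axis=0, ddof=1)
    return (block - mean) / std


rng = np.random.default_rng(0)
n_samples = 125  # 25 replications x 5 storage-time groups

# Synthetic stand-ins for the two instruments (assumed scales, not real data).
e_nose = rng.normal(loc=5.0, scale=2.0, size=(n_samples, 10))
e_tongue = rng.normal(loc=-300.0, scale=40.0, size=(n_samples, 7))

# Column-wise concatenation of the standardized blocks.
fused = np.hstack([standardize(e_nose), standardize(e_tongue)])
print(fused.shape)  # (125, 17)
```

Standardizing before concatenation keeps the instrument with the larger raw signal range from dominating the fused matrix.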

Table 4 Descriptions of fusion approaches

As described in Table 4, fusion approach 1 is a simple concatenation of the standardized e-nose and e-tongue datasets. Fusion approach 2 is based on stepwise selection, which starts with the variable that has the largest classification weight (the Fisher weight). During stepwise selection, a variable is entered into the model if the significance level of its F value is less than 0.05 and is removed if the significance level is greater than 0.1. Fusion approaches 3 to 5 are based on factor F. The factor F values calculated for each sensor variable are shown in Fig. 3, and the sensors displaying the highest values of F were chosen to create a reduced sensor array, an approach reported by Ciosek et al. (2004). In this study, three threshold log F values (3, 2.5, and 2) were set, and the sensors with log F values higher than 3, 2.5, and 2 were selected for the construction of fusion approaches 3, 4, and 5, respectively. Fusion approach 6 is based on the first three PCs (total contribution rate higher than 90 %) extracted separately from the e-nose and e-tongue datasets.
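The PCA-based fusion (approach 6) can be sketched as follows. This is a minimal illustration, assuming PCA is computed by singular value decomposition on standardized sensor columns; the function names `leading_pc_scores` and `synthetic_block` are hypothetical, and the data are synthetic blocks driven by two latent factors so that three PCs capture most of the variance.

```python
import numpy as np


def leading_pc_scores(block, n_components=3):
    """Return scores on the first PCs and their cumulative explained variance."""
    z = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, float(explained[:n_components].sum())


def synthetic_block(rng, n_samples, n_sensors):
    """Made-up sensor block: two latent factors plus small noise."""
    latent = rng.normal(size=(n_samples, 2))
    mixing = rng.uniform(0.5, 1.5, size=(2, n_sensors))
    return latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_sensors))


rng = np.random.default_rng(3)
e_nose = synthetic_block(rng, 125, 10)    # synthetic, not measured data
e_tongue = synthetic_block(rng, 125, 7)   # synthetic, not measured data

nose_scores, nose_ratio = leading_pc_scores(e_nose)
tongue_scores, tongue_ratio = leading_pc_scores(e_tongue)

# Fused dataset: three PC scores per instrument, six variables per sample.
fused = np.hstack([nose_scores, tongue_scores])
print(fused.shape, round(nose_ratio, 3), round(tongue_ratio, 3))
```

Running PCA separately on each instrument, rather than once on the concatenated matrix, guarantees that both instruments contribute the same number of variables to the fused dataset.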

Fig. 3 Plot of factor F values calculated for e-nose and e-tongue sensors in the discrimination of juices squeezed from tomatoes with different post-harvest storage times
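Threshold-based sensor selection of this kind can be sketched in a few lines. The sketch assumes that factor F is the ratio of between-group to within-group variance (a one-way ANOVA F statistic) and that log F means the base-10 logarithm; the definition used in this study may differ. The function names `f_ratio` and `select_sensors` are hypothetical, and the three sensor columns are synthetic.

```python
import numpy as np


def f_ratio(values, labels):
    """One-way ANOVA F statistic for a single sensor variable."""
    groups = [values[labels == g] for g in np.unique(labels)]
    grand_mean = values.mean()
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_sensors(data, labels, log_f_threshold):
    """Return column indices whose log10(F) exceeds the threshold."""
    scores = np.array([f_ratio(data[:, j], labels) for j in range(data.shape[1])])
    return np.flatnonzero(np.log10(scores) > log_f_threshold)


rng = np.random.default_rng(1)
labels = np.repeat(np.arange(5), 25)  # 5 storage-time groups x 25 replications

# Synthetic sensors (assumption): strong, moderate, and no group separation.
data = np.column_stack([
    10.0 * labels + rng.normal(size=125),
    3.0 * labels + rng.normal(size=125),
    rng.normal(size=125),
])

for threshold in (3.0, 2.0):
    print(threshold, select_sensors(data, labels, threshold))
```

Lowering the threshold admits progressively weaker discriminators, which is why the three thresholds yield nested sensor subsets of increasing size.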

Qualitative and Quantitative Performances of Fusion Approaches Based on Six Features (Original Variables, PCA, Factor F and Stepwise Selected Variables)

CDA was again used for intuitive discrimination of the cherry tomato juices (squeezed from tomatoes with five different ST) based on the data obtained from fusion approaches 1 to 6. Plots of CV1 and CV2 for the six fusion approaches are presented in Fig. 4, where the intra-class variability is very small while the inter-class variability is very large. This is expected, because the data points in the same class were derived from juice samples with the same storage time, whereas the data points in different classes were derived from juice samples with different storage times. As observed from Table 1, storage time has a significant effect on the quality of the cherry tomatoes. The e-nose and e-tongue sensors were sensitive to these changes; thus, the intra-class variability is small and the inter-class variability is large. Regardless of the fusion approach used, the five storage times are clearly separated from one another, except that the day 0 and day 4 samples lie close together under fusion approach 3 (Fig. 4c).

Fig. 4 CDA results for discrimination of juices squeezed from tomatoes with different post-harvest storage times based on (a) fusion approach 1, (b) fusion approach 2, (c) fusion approach 3, (d) fusion approach 4, (e) fusion approach 5, and (f) fusion approach 6

The data structure (the partitioning of the datasets and the choice of cross-validation) of the six approaches for Lib-SVM and PCR analysis is the same as previously discussed. The classification of cherry tomato juices squeezed from tomatoes with different ST using Lib-SVM is presented in Table 5, where the testing and verification rates are the average classification accuracies for the testing and verification sets, respectively. As shown in Table 5, all six fusion approaches achieved high accuracy (higher than 92 %) on both the testing and verification sets, with fusion approach 2 presenting the highest classification accuracy. A comparison of the fusion datasets with the individual datasets shows that sensor fusion results in better classifiers than the classifier trained on the e-nose dataset alone. However, except for fusion approach 2, the classification accuracy of the fusion approaches is slightly lower than that of the e-tongue dataset. This suggests that sensor fusion is not always better than each individual technique. Data fusion and feature reduction might introduce unnecessary variables and discard useful ones (e.g., variables that are highly correlated with the group labels). Thus, sensor fusion can outperform individual utilization only if proper fusion approaches are used.

Table 5 Use of Lib-SVM based on different datasets for discrimination of juices squeezed from tomatoes with different post-harvest storage times

The testing and verification results of the quality index regression models trained by PCR on the six fusion datasets are listed in Table 3, where all the fusion approaches present good testing results. The testing RMSE values for pH, SSC, VC, and firmness based on these fusion approaches are smaller than those based on the e-nose dataset. However, different fusion approaches produce different RMSE values. For the testing of pH regression models, the model built on fusion approach 2 is better than that built on the e-tongue dataset, while the models built on fusion approaches 3, 4, and 6 are worse. For the testing of SSC regression models, the models built on fusion approaches 2, 1, and 4 are better than that built on the e-tongue dataset, while the models built on fusion approaches 3, 6, and 5 are worse. For the testing of firmness regression models, the models built on fusion approaches 2, 6, 1, and 5 are better than that built on the e-tongue dataset, while the models built on fusion approaches 4 and 3 are worse. For the testing of VC regression models, the model built on fusion approach 5 is better than that built on the e-tongue dataset, while the models built on fusion approaches 3, 4, 1, 2, and 6 are worse. In general, for the prediction of pH, SSC, and firmness using the testing set, fusion approach 2 presents the smallest prediction error (RMSE values for pH, SSC, and firmness are 0.014, 0.078 °Brix, and 0.567 N, respectively). For the prediction of VC using the testing set, fusion approach 5 presents the lowest prediction error (RMSE for VC is 0.753 mg/100 g). A comparison of fusion dataset 2 with fusion dataset 5 shows that both contain all seven e-tongue sensors and several e-nose sensors. After eliminating the shared variables, fusion dataset 2 includes sensors S1, S4, and S5, while fusion dataset 5 includes sensors S3, S9, and S10. This might suggest that sensors S3, S9, and S10 are more strongly correlated with the VC values, while the changing trends of sensors S1, S4, and S5 might be more strongly correlated with the pH, SSC, and firmness indices.
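For readers unfamiliar with PCR, the train-then-test workflow behind these RMSE values can be sketched as below. This shows one common formulation (PCA by singular value decomposition on mean-centered predictors, followed by ordinary least squares on the leading scores) and is not necessarily the implementation used in this study. The function names, the number of components, the train/test split, and the synthetic 17-variable data are all assumptions made for illustration.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Fit PCR: PCA on centered X, then least squares on the leading scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, vt = np.linalg.svd(X - x_mean, full_matrices=False)
    loadings = vt[:n_components].T          # shape (n_features, n_components)
    scores = (X - x_mean) @ loadings        # shape (n_samples, n_components)
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, loadings, coef


def predict_pcr(model, X):
    """Project new samples onto the stored loadings and apply the regression."""
    x_mean, y_mean, loadings, coef = model
    return (X - x_mean) @ loadings @ coef + y_mean


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic fused data: 125 samples x 17 sensors driven by two latent factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(125, 2))
X = latent @ rng.normal(size=(2, 17)) + 0.05 * rng.normal(size=(125, 17))
y = 4.2 + latent @ np.array([1.0, -0.5]) + 0.02 * rng.normal(size=125)

train, test = np.arange(100), np.arange(100, 125)
model = fit_pcr(X[train], y[train], n_components=2)
test_rmse = rmse(y[test], predict_pcr(model, X[test]))
half_sd = 0.5 * y[test].std(ddof=1)
print(f"test RMSE = {test_rmse:.3f}, half SD = {half_sd:.3f}")
```

Because the regression is carried out on a few uncorrelated scores rather than on all 17 correlated sensor signals, the choice of which variables enter the PCA step directly affects the prediction error, which is consistent with the differences among fusion approaches reported above.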

The prediction results for the verification dataset are noteworthy. Except in the case of the VC regression models, the quality regression models built on the fusion datasets present lower prediction errors than those built on the individual e-nose or e-tongue datasets, suggesting that the combination of variables in the fusion datasets is more strongly correlated with the quality indices. For the verification of pH regression models, only the models built on fusion approaches 5, 2, and 1 successfully predict the pH value of the verification dataset. For the verification of SSC regression models, the models built on all six fusion approaches successfully predict the SSC value of the verification dataset. For the verification of firmness regression models, only the models built on fusion approaches 2 and 1 successfully predict the firmness value of the verification dataset. For the verification of VC regression models, only the models built on fusion approaches 3 and 5 successfully predict the VC value of the verification dataset.

A comparison of the qualitative and quantitative performance across the eight datasets shows that all eight produced good classification performance, with the e-tongue and fusion approach 2 performing best. However, for the prediction of quality indices, the performance differs considerably among the datasets. The eight datasets can be sorted in descending order of verification accuracy of the regression models. For the verification of pH regression models, the sequence is fusion approaches 5, 2, 1, 4, 6, 3, e-tongue, and e-nose. For the verification of SSC regression models, the sequence is fusion approaches 1, 5, 2, 4, 6, 3, e-tongue, and e-nose. For the verification of VC regression models, the sequence is fusion approach 2, e-tongue, fusion approaches 5 and 4, e-nose, and fusion approaches 6, 1, and 3. For the verification of firmness regression models, the sequence is fusion approaches 2, 1, 3, 4, 5, 6, e-tongue, and e-nose. Some previous studies found that simple concatenation of sensors was better than individual utilization (Cole et al. 2011; Tudu et al. 2012), while others found that sensor fusion was not necessary (Cosio et al. 2007). This research indicates that sensor fusion is not always better than individual utilization: the feature selection approaches used for building the fusion datasets matter the most.

Conclusions

In this paper, eight datasets (one from an e-nose, one from an e-tongue, and six from sensor fusion approaches combining both instruments) were applied to detect 100 % fresh juices squeezed from tomatoes with different post-harvest storage times. The results indicate that it is possible to track fruit ST/freshness by analyzing the squeezed fruit juice with an e-nose and an e-tongue. The specific findings on the discrimination and prediction of juice quality based on the different datasets are as follows:

  1. The discrimination abilities of CDA and Lib-SVM were found to be better than those of LVQ in this study. No matter which of the eight datasets was applied, cherry tomato juices squeezed from tomatoes with different post-harvest storage times could be well discriminated from each other by CDA and Lib-SVM.

  2. Qualitative and quantitative analyses based on the e-tongue alone were found to be better than those based on the e-nose alone in this study. However, quality regression models trained on either the e-nose or the e-tongue dataset were not robust, i.e., they failed to predict quality indices for a totally new juice sample.

  3. Sensor fusion makes it possible to build more robust prediction models. However, classification and regression performances based on different datasets differed. Sensor fusion is not always better than individual utilization: the feature selection approaches used for building the fusion datasets matter the most. In other words, the simultaneous use of perceptual knowledge from both instruments can outperform individual use of the e-nose or e-tongue only if proper feature selection and data fusion methods are used.