Introduction

Apple production has increased steadily and reached approximately 89 million tons in 2016, indicating that apples play a significant role in the worldwide market [1]. Consumers care greatly about the eating quality of apples, which covers both external and internal properties. External properties such as color, size and defects have been widely researched and their detection has been implemented. Internal properties have gained more attention because they are difficult to determine non-destructively. Soluble solids content (SSC), which describes the sweetness of fruit, is one of the properties of greatest concern [2]. Owing to huge market demand, on-line grading machines are used to sort apples automatically according to the required qualities, which adds commercial value to apples and gives consumers more choices according to their preferences [3]. Being safe, rapid, non-destructive and environmentally friendly, near-infrared (NIR) spectroscopy is an effective tool for determining the internal qualities of fruits in on-line grading systems [4,5,6,7,8].

However, the whole process of NIR detection involves many factors that can degrade prediction performance. First of all, the influence of samples from different orchards, seasons, cultivars, districts and varieties on NIR prediction models cannot be ignored [9,10,11,12,13,14]. Taking these biological variations into account, appropriate samples should be selected. Moreover, the inhomogeneous distribution of SSC within each sample makes it difficult to choose a representative area to measure [15]. Slaughter et al. [16] suggested that the fruit should be oriented to allow measurement at its equator for better prediction accuracy and robustness. Nevertheless, high-speed on-line SSC determination is a dynamic process with more influencing factors than steady-state measurement in the laboratory. In this study, a bicone-roller conveyor was used to deliver apples because it is widely used in systems that evaluate both internal and external quality [17]. The most apparent obstacle it brings is that the measurement positions of apples are completely random because of erratic movement during conveying, so the measurement area and light path differ from one measurement to the next. Therefore, from a practical perspective, it is necessary to clarify the influence of the measurement position of apples on NIR spectra and on the corresponding prediction models for SSC determination.

In the present work, an on-line NIR detection system based on a bicone-roller conveyor was built for SSC prediction of apples. The objectives of this study were (1) to evaluate the influence of the measurement position of apples on spectra and SSC prediction models; (2) to establish the optimal prediction model on the basis of these investigations; and (3) to provide new insights into compensation methods for these influences.

Materials and methods

Sample preparation

‘Fuji’ apple (Malus x domestica Borkh. cv. ‘Red Fuji’) was selected for this research because of its attractive flavor and high economic return [18]. Apples from the same origin, harvested in October 2017, were purchased at a fruit market in China. After manual selection of apples without obvious defects, 150 samples with equatorial diameters between 75 and 85 mm were carefully cleaned and numbered. Before spectral measurement, the samples were stored in the experimental environment at a temperature of 10 °C and a relative humidity of 60% for 12 h to equilibrate [19].

Spectral measurement

An on-line NIR system was established for spectral measurement, consisting mainly of three parts (Fig. 1). The first part was the optical configuration, which included a visible and shortwave near-infrared (VIS-SWNIR) charge coupled device (CCD) spectrometer (Model QE65000, Ocean Optics Inc., USA), two tungsten halogen lamps (150 W, 24 V, Ushio Inc., Japan) with collimating lenses and an optical fiber with a probe. In this interactance mode, the probe was placed at half the height of the apple. The light beams were focused through the angled arrangement of the lamps and collimating lenses in order to use the light energy efficiently. This is the most crucial part, as it translates the internal information of apples into spectra. The second part was the transportation device that conveyed apples during on-line determination. A bicone-roller conveyor was adopted because of its versatility in on-line detection of both internal and external fruit quality [17]. Taking efficiency into account, the conveyor speed was set to three apples per second, and the integration time was accordingly set to 100 ms for each apple. In the front view of Fig. 1, the movement direction was perpendicular to the page, from the near side to the far side. The last part was the data-processing section, in which a computer program collected and recorded the spectra.

Fig. 1

Schematic diagrams of on-line detection system based on bicone-roller transportation. (1) Light source with collimating lens; (2) apple; (3) bicone-roller conveyor; (4) optical fiber

To investigate the influence of measurement position, spectra were acquired at six fixed positions and one random position, as illustrated in Fig. 2. For the six fixed positions, apples were loaded manually so that their stem-calyx axes were either perpendicular or parallel to the movement direction, in six different orientations. The random position, in contrast, arose spontaneously after rolling movement and is typical of practical application. Apart from the random position, each position had a symmetrical counterpart with the stem-calyx axis in the same direction: position 1 and position 2, position 3 and position 4, and position 5 and position 6.

Fig. 2

Seven measurement positions during spectral collection

After the NIR system had stabilized, the dark spectrum and reference spectrum were collected. For each measurement position, three spectra were collected and averaged to reduce random error. In total, 3150 spectra were recorded (150 apples × 7 positions × 3 replicates), from which 1050 average spectra were calculated for the seven measurement positions. Additionally, the total average spectra over all measurement positions were computed for comparison. The spectra were converted into absorbance by the computer program for the following analysis.
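The conversion to absorbance is not spelled out in the text. A minimal sketch is given here, assuming the common definition A = -log10((S - D)/(R - D)) with sample intensity S, dark spectrum D and reference spectrum R, and assuming that replicate spectra are averaged before conversion. The function names and intensity values are illustrative only and are not taken from the acquisition program.

```python
import math

def average_spectra(replicates):
    """Average replicate spectra (equal-length lists) wavelength by wavelength."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]

def to_absorbance(sample, dark, reference):
    """Assumed conversion: A = -log10((S - D) / (R - D)) at each wavelength."""
    return [
        -math.log10((s - d) / (r - d))
        for s, d, r in zip(sample, dark, reference)
    ]

# Made-up detector counts at three wavelengths, three replicates of one apple
replicates = [
    [510.0, 1010.0, 110.0],
    [500.0, 1000.0, 100.0],
    [490.0, 990.0, 90.0],
]
dark = [0.0, 0.0, 0.0]
reference = [5000.0, 10000.0, 1000.0]

mean_spectrum = average_spectra(replicates)
absorbance = to_absorbance(mean_spectrum, dark, reference)
```

In this toy case the averaged intensity is one tenth of the reference at every wavelength, so each absorbance value is 1.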

SSC measurement

As a reference, the actual SSC values of the apples were measured destructively with a digital refractometer (model PR-101, Atago Co., Ltd., Tokyo, Japan). Juice was squeezed from each apple and its SSC value was measured and recorded. The average of three tests was used for each apple in the following analysis to better represent the true quality of the apple [15].

Chemometrics and data analysis

Chemometric analysis was conducted with TQ Analyst v6.2.1 (Thermo Nicolet Corporation, Madison, WI, USA) to correlate spectral information with SSC. Since the NIR spectra of apples have overlapping spectral features, partial least squares (PLS) regression was adopted to quantify the SSC of samples [20]. As a powerful multivariate method, PLS has proved useful in extracting features from spectra by taking both the variable matrix X (spectral data) and the variable matrix Y (the property of interest) into consideration. It is therefore widely used in applications of spectroscopy in the food industry. For preprocessing, the spectra were normalized automatically by TQ Analyst using mean centering. Spectral outliers were also dismissed automatically for each model on the basis of Mahalanobis distance. Model performance was described by the root mean square error of calibration (RMSEC), correlation coefficient of calibration (rc), root mean square error of prediction (RMSEP), correlation coefficient of prediction (rp), root mean square error of cross-validation (RMSECV) and correlation coefficient of cross-validation (rcv). Lower RMSEC and RMSEP together with higher rc and rp indicate better model accuracy. With regard to model robustness, RMSEC and RMSEP should be as close to each other as possible [21,22,23].
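The error and correlation statistics named above can be illustrated with a short sketch. It assumes the plain definitions (root mean square of the residuals with divisor n, and the Pearson correlation between reference and predicted values); commercial packages such as TQ Analyst may apply a degrees-of-freedom correction, so this is not necessarily the exact formula used by the software. The same functions give RMSEC, RMSEP or RMSECV depending on which set of predictions is supplied. The SSC values are made up.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values (divisor n)."""
    n = len(reference)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(reference, predicted)) / n)

def pearson_r(reference, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(reference)
    mean_y = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(reference, predicted))
    var_y = sum((y - mean_y) ** 2 for y in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_y * var_p)

# Made-up reference and predicted SSC values (%)
ssc_reference = [12.0, 13.0, 14.0, 15.0]
ssc_predicted = [12.5, 12.5, 14.5, 14.5]

error = rmse(ssc_reference, ssc_predicted)            # 0.5
correlation = pearson_r(ssc_reference, ssc_predicted)  # about 0.894
```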

In the analysis, cross-validation was conducted using the spectra acquired at each measurement position. RMSECV and rcv were compared to illustrate the influence of measurement position on the prediction models. In addition, to better understand the relations between measurement positions, spectra acquired at one position were used as the calibration set while those acquired at another position were used as the prediction set. When developing the optimal model from a single data collection, samples were randomly divided into a calibration set and a prediction set (3:1).
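The random 3:1 division can be sketched as follows. The text does not state how the split was performed or the resulting set sizes, so the shuffling approach, the fixed seed and the function name are assumptions made only for illustration.

```python
import random

def split_calibration_prediction(sample_ids, ratio=(3, 1), seed=0):
    """Randomly divide sample IDs into calibration and prediction sets.

    `ratio` is (calibration, prediction); the seed makes the split repeatable.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_calibration = len(ids) * ratio[0] // sum(ratio)
    return sorted(ids[:n_calibration]), sorted(ids[n_calibration:])

# 150 numbered apples, as in the sample preparation
calibration_ids, prediction_ids = split_calibration_prediction(range(1, 151))
```

With 150 samples and integer division, this sketch assigns 112 apples to calibration and 38 to prediction. Splitting by sample ID rather than by spectrum keeps all spectra of one apple in the same set.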

Results and discussion

Statistics of samples

The SSC values ranged from 10.2 to 16.3%, with a standard deviation (SD) of 1.05%. The SD accounted for about 26% of the range, which suggests that the variation in SSC values was adequate for analysis [24].

Spectral analysis and comparison

Figure 3 shows the average absorbance spectra of the 150 apples at the different measurement positions. The total average spectrum of the seven measurement positions is also included for comparison and is drawn as a solid black line. Obvious spectral features occurred between 400 and 1000 nm, which formed the major part of the following evaluation. For the different measurement positions, similar peaks can be observed at 530 nm, 670 nm, 760 nm, 830 nm and 980 nm. The spectra nearly overlapped in the ranges 320–400 nm and 670–690 nm, except for position 6. The most obvious differences between spectra acquired at different positions occurred at the peaks at 530 nm, 760 nm and 980 nm. The peak at 670 nm is related to the absorption of chlorophyll in the peel around 660 nm [25]. The peaks at 760 nm and 980 nm are associated with the absorption of water and carbohydrates in apples [20, 23]. The differences between the seven positions appeared mainly in spectral intensity. The symmetrical positions were very close in intensity, except for position 5 and position 6. The absorbance at position 1 and position 2 was highest, while that at position 5 and position 6 was lowest. Compared with the total average spectrum, five spectra were stronger and two were weaker in intensity. The spectrum most similar to the average was obtained at the random measurement position (position 7), indicating that acquisition at the random position was likely to collect sufficient useful information for improving the performance of SSC prediction models. In addition, the spectra acquired at position 3 and position 4 were very close to that acquired at the random position, which can be regarded as a positive sign for good calibration models. The average spectrum at position 6 was far lower than the total average, which might be attributed to the concave shape of the calyx blocking part of the reflected and scattered light. According to these spectral characteristics, the wavelength range of 450–1030 nm was selected for the PLS models in consideration of the signal-to-noise ratio.

Fig. 3

Average absorbance spectra acquired at seven measurement positions and their average absorbance spectrum

Results and comparison of PLS models

Tables 1 and 2 summarize the calibration and prediction results for SSC determination of apples. Spectra acquired at each position were used in turn as the calibration set, with those acquired at the other positions serving as validation sets. Model performance at each individual measurement position was evaluated by cross-validation; these results are marked with a dotted line. The global model contained spectra from all measurement positions and therefore a very large number of spectra, so it was used only as a calibration set.

Table 1 Root mean square error of prediction (RMSEP) for SSC when using a calibration model of a certain position for predicting the other positions
Table 2 Correlative coefficient of prediction (rp) for SSC when using a calibration model of a certain position for predicting the other positions

As can be seen from the tables, models validated internally within the same measurement position performed better than those validated with other measurement positions, apart from the symmetrical position (results marked with a dash-dot line). In addition, prediction models established at symmetrical measurement positions were similar in RMSEP and rp except for position 5 and position 6, which coincides with the earlier observation in the spectral comparison. Because of the geometrical difference between the calyx and stem ends of apples, the light paths at position 5 and position 6 were not necessarily similar, whereas those at the other symmetrical positions were. A possible deduction is that the similarity in light paths at symmetrical measurement positions leads to similar prediction performance.

It can also be observed that the optimal cross-validation model, with the lowest RMSECV and highest rcv, was established with the average spectra of the seven measurement positions. Both the average spectra of the seven positions and the global spectral collection contained information from all seven positions to some extent. However, better prediction models were obtained when the global data were used as the calibration set or the average spectra were used as the prediction set. This suggests that global calibration models help to improve robustness and compensate for variations, while local calibration models achieve better accuracy when predicting a similar population, as also reported in other studies [9, 19, 26].

Moreover, the cross-validation models established at position 3 and position 4, as well as the global model, displayed promising performance, while those established at position 5 and position 6 performed worst. This is reasonable because the calyx-stem axis did not interfere with the light path at position 3 and position 4, which enabled both the probe and the light sources to focus on the equatorial area of the apples. Superior model performance has been reported when measuring in this area [15]. The models at position 5 and position 6 performed poorly because the calyx-stem axis of the apples coincided with the optical axis of the probe at these two positions. Furthermore, the calyx area has a larger concave surface than the stem area, which causes a greater loss of light through scattering and reflection. Hence, the worst cross-validation model was obtained at position 6 rather than position 5.

To better illustrate the influence of measurement position on model performance for SSC determination, the first and second PLS scores of the global model were calculated and plotted in two dimensions (Fig. 4). The distance between points indicates the similarity of samples, taking both spectral features and SSC reference values into account. Symmetrical positions are shown in the same color with different symbols. It can be observed that symmetrical positions clustered together. The most central points represent spectra from position 3 and position 4, which corresponded to the best results for SSC determination. The most obvious deviation belongs to spectra acquired at position 6. These observations accord well with the comparison of model performance described above.

Fig. 4

PLS score plots for spectra obtained at seven measurement positions

Optimal model development and evaluation

As noted above, the models established with average spectra performed better in terms of accuracy, while those established with global spectra performed better in terms of robustness. Therefore, with the aim of building an optimal model that balances accuracy and robustness, both the total average spectra and the global spectra were used to build calibration and prediction models for SSC determination. The scatter plots of the SSC models are displayed in Fig. 5. The model with the best accuracy and robustness was obtained with the average spectra (RMSEC 0.356%; rc 0.947; RMSEP 0.370%; rp 0.906), as shown in Fig. 5a. The improvement may be due to the elimination of accidental errors in individual spectra by averaging. The model built with global spectra had slightly inferior statistics (RMSEC 0.488%; rc 0.893; RMSEP 0.506%; rp 0.851), as shown in Fig. 5b. However, considering the number of spectra in the global model, these results were quite satisfactory compared with other work on on-line SSC measurement of apples [7], and the model also showed good robustness.
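As a quick check against the robustness criterion given in the chemometrics section (RMSEC and RMSEP should be close), the gap between the two errors can be computed from the statistics reported above. Only the four RMSE values come from the text; the dictionary layout and the helper name are illustrative.

```python
# RMSE values as reported in the text (% SSC)
models = {
    "average spectra": {"RMSEC": 0.356, "RMSEP": 0.370},
    "global spectra": {"RMSEC": 0.488, "RMSEP": 0.506},
}

def calibration_prediction_gap(stats):
    """Absolute difference between RMSEP and RMSEC, rounded to 3 decimals."""
    return round(abs(stats["RMSEP"] - stats["RMSEC"]), 3)

gaps = {name: calibration_prediction_gap(stats) for name, stats in models.items()}
# gaps: average spectra -> 0.014, global spectra -> 0.018
```

Both gaps are below 0.02% SSC, and the gap for the average-spectra model is the smaller of the two.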

Fig. 5

Scatter plots of SSC calibration and prediction models for apples using average spectra (a) and global spectra (b)

Suggestions on compensation methods

From the above, it is important to compensate for the influence of variation in measurement position during on-line SSC determination. Several compensation methods applicable during spectral acquisition are proposed here. Firstly, multiple spectra can be acquired at different measurement positions and averaged to obtain representative information for each apple. Secondly, when only a specific measurement position is of interest, a local calibration and prediction model should be established. Thirdly, acquiring spectra at a random position is an easy and universal way to collect information, giving models of moderate accuracy and robustness. Last but not least, measurement position 5 and position 6 should be avoided if possible, for both calibration and prediction purposes. These methods can be applied flexibly according to practical circumstances.

Conclusions

In the present work, an on-line NIR detection system was established based on bicone-roller transportation. Given that apple positions are uncertain after rolling, seven measurement positions, comprising six fixed positions and one random position, were compared and their influences on the prediction models were demonstrated. Among the seven positions, better prediction results were obtained at position 3 and position 4, where both the probe and the light sources faced the equatorial area of the apple. Interference of the concave surfaces of the apple with the light path harmed the robustness and accuracy of the models. The results also indicated that symmetrical measurement positions yield similar spectral information and model performance. Moreover, calibration using the average spectra of the seven positions had a more positive effect on model performance than the global model using spectra from all measurement positions. The optimal model (RMSEC 0.356%; rc 0.947; RMSEP 0.370%; rp 0.906) was established using the average spectra of the seven measurement positions. These results compare favorably with previous studies on on-line NIR measurement of SSC in apples. Finally, it is recommended to control the measurement position of samples or to choose an appropriate compensation method during on-line SSC determination, which is helpful in practical application. This work not only revealed the influence of measurement position on the basis of experimental results but also proposed compensation methods, which suggests potential improvements in on-line SSC determination of apples based on roller transportation.