Introduction

The apple enjoys great popularity around the world for its rich mineral and vitamin content. Its sweet and refreshing taste is another essential attraction and is one of the factors that consumers most often take into consideration. With the improvement of living standards, there has been an emerging focus on the quality of fruits. Grading apples can therefore add to their commercial value and provide consumers with diverse choices, helping to meet the developing requirements of the fruit market. Because time and human resources are limited, on-line detection of the internal and external qualities of apples is the first step toward efficient, automated industrial grading. While external quality can be observed by the naked eye, consumers are more interested in internal quality.

Being rapid, safe, non-destructive, and environmentally friendly, near-infrared (NIR) spectroscopy has been widely applied as an analytical tool for detecting the internal quality of fruit. Soluble solids content (SSC) is one of the most important internal attributes influencing consumers’ purchasing decisions (Ariana and Lu 2002). Many studies have demonstrated the ability of NIR spectroscopy to determine the SSC values of fruits, including apple (Vega et al. 2014; Kumar et al. 2015; Mendoza et al. 2014; Luo et al. 2018), pear (Xu et al. 2012; Sun 2009; Wang et al. 2017), orange (Wang et al. 2014; Liu et al. 2010), kiwifruit (Schaare and Fraser 2000; Mcglone et al. 2002; Liu et al. 2017), strawberry (Amodio et al. 2017), mango (Neto et al. 2017), and cherry (Escribano et al. 2017), all with satisfactory results.

There are also many studies on on-line SSC determination using NIR spectroscopy (Xie et al. 2016), which provide theoretical and statistical foundations for fruit grading machines. For example, Golic and Walsh (2006) used NIR spectroscopy to assess the internal qualities of stone fruits such as peaches, nectarines, and plums. The SSC calibration model performance was encouraging (determination coefficient of cross-validation, \( {R}_{CV}^2 \) > 0.88; root mean square error of cross-validation, RMSECV 0.53–0.88%), which demonstrated the feasibility of non-invasive on-line SSC determination for such fruits. Similarly, Paz et al. (2008) used on-line NIR spectroscopy to establish SSC calibration models with 720 intact plums of different varieties. The standard error of cross-validation (SECV) and \( {R}_{CV}^2 \) for SSC were 0.77% and 0.83, respectively, which suggested that fruits can be classified by SSC with the help of NIR spectroscopy. Furthermore, Fu et al. (2012) demonstrated the influence of fruit shape on the NIR spectra and the modeling results using an on-line NIR detection system for oranges. The calibration model for spherical fruit yielded the best prediction results, with the lowest root mean square error of prediction (RMSEP) of 0.56–0.63%. Beyond these small round fruits, Jie et al. (2014) developed an on-line system to detect the SSC of watermelons using diffuse transmittance Vis/NIR spectroscopy. The best model had a correlation coefficient of prediction (rp) of 0.70 and an RMSEP of 0.33%. As illustrated above, both market demand and the maturity of the technology favor the development of on-line grading of apples.

On the other hand, large-scale industrial grading systems usually transport fruits on bicone rollers rather than trays, since rollers can be used in both internal and external quality detection systems (Xu 2003). However, on-line detection involves more complex factors during dynamic transportation. For example, bicone roller transportation leaves apples in random positions after rolling, so the region detected on each apple is also random when internal quality is assessed using NIR spectroscopy. Furthermore, the external structure and internal tissue of each apple are not homogeneous in terms of color (Guo et al. 2016), size (Vaudelle and L’Huillier 2015), and SSC distribution (Fan et al. 2016). Studies using NIR hyperspectral imaging (Mo et al. 2017; Ma et al. 2018; Fan 2016) have shown that SSC is spatially inhomogeneous in apples. To determine the influence of this inhomogeneity on NIR-based SSC prediction models, Fan et al. (2016) compared models established with different groups of NIR spectra collected around the stem, equator, and calyx positions of each apple. They found that the spectrum measurement position affected the prediction accuracy of SSC. Therefore, single-region detection may not acquire sufficiently representative information about SSC in apples, especially in the interactance or reflectance detection mode. It can consequently be inferred that detecting a random region each time would degrade the accuracy and robustness of SSC determination models. The surface scanning measurement method of Schmutzler and Huck (2014) gave better results than the point measurement method. However, surface scanning was too time-consuming (54 s per apple) to be suitable for high-speed on-line grading.
Thus, it was hypothesized that detecting two regions of the apple at the same time may improve model performance. To test this hypothesis, a binary-branch optical fiber was proposed to simultaneously detect light emitted from two regions on opposite sides of the apple.

In this paper, “Fuji” apple (Malus × domestica Borkh. cv. “Red Fuji”) was chosen as the subject, since it has earned considerable attention because of its delightful flavor and high economic return (Guo et al. 2016). An on-line NIR detection system based on the roller transportation (Xu et al. 2018) was used for the determination of the SSC of apples. The objectives of this study were (1) to compare four configurations differing in detection positions and detection regions, (2) to evaluate the influences of detection positions and double detection regions of the apple on SSC prediction models, and (3) to obtain the optimal prediction model based on the above investigations.

Materials and Methods

Preparation of Apple Samples

“Fuji” apple (Malus × domestica Borkh. cv. “Red Fuji”) was used in this work. The samples were harvested in October 2017 in Shanxi, China, and purchased at a fruit trading market in Hangzhou, China. A total of 150 samples without any obvious bruises or defects were manually selected from more than 200 apples. Each sample was then carefully cleaned and numbered in preparation for spectral acquisition. To minimize the variation in size, the equatorial diameters of the selected samples ranged from 75 to 85 mm. After purchase, this batch of apples was stored for 12 h to equilibrate under the same conditions as those used for spectral acquisition, at a constant temperature of 10 °C and a relative humidity of 60% (Yao et al. 2013).

Spectral Measurement

The on-line NIR grading system (Figs. 1 and 2) used here was based on the work of Xu et al. (2018) and was composed of a NIR detection section, a transportation section, a grading operation section, and a calculation and control section. The light sources were two 150-W tungsten halogen lamps, and the integration time was 100 ms. The Vis/NIR spectrometer covered a wavelength range of 200–1100 nm, of which 350–1030 nm was used in further analysis. The moving speed was three apples per second. Other configuration details can be found in the previous work of Xu et al. (2018). In addition, a binary-branch optical fiber was adopted in this work (Fig. 2) to compare the influence of single and double detection regions. The diameter of the mono-branch optical fiber was 1000 μm, and that of each branch of the binary-branch optical fiber was 500 μm. All other configurations and subsequent procedures used the same parameters to facilitate comparison. In the calculation and control section, a self-developed program collected and recorded the NIR spectra on the computer with the help of optical triggers and a control unit. The optical triggers, placed symmetrically on the two sides of the rollers under the detectors, initiated spectrum acquisition. The bottom part of each roller was designed to match the trigger setup so that one spectrum was acquired at each roller. However, only spectra with appropriate intensity were recorded, so that rollers carrying no apple were excluded. The minimum threshold was determined by the spectrum of an unloaded roller and the maximum threshold by the reference spectrum. A two-dimensional (2 × 1044) array containing the wavelengths and the corresponding absorbance values was recorded for each apple to represent its spectrum.
The absorbance was automatically calculated by a computer program according to the intensity of the spectrum.
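The text does not give the formula used by the acquisition program. A minimal sketch, assuming the conventional definition A = log10((R − D) / (S − D)) with sample intensity S, reference intensity R, and dark intensity D, is shown below. The function names and the use of mean intensity for the recording thresholds are illustrative assumptions, not details of the authors' software.

```python
import numpy as np

def absorbance(sample, reference, dark):
    """Apparent absorbance per wavelength: log10((R - D) / (S - D)).

    Assumed conventional formula; the paper does not state its own.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dark = np.asarray(dark, dtype=float)
    eps = 1e-9  # keeps the logarithm defined if a channel reads at dark level
    s = np.clip(sample - dark, eps, None)
    r = np.clip(reference - dark, eps, None)
    return np.log10(r / s)

def should_record(sample, empty_roller, reference):
    """Hypothetical gate: keep a spectrum only if its mean intensity lies
    between the unloaded-roller level and the reference level."""
    level = float(np.mean(sample))
    return float(np.mean(empty_roller)) < level < float(np.mean(reference))

def to_record(wavelengths, absorbance_values):
    """Stack wavelengths and absorbance into a 2 x N array, one per apple."""
    return np.vstack([wavelengths, absorbance_values])
```

With 1044 spectrometer channels, `to_record` returns an array of shape (2, 1044), matching the layout described above.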

Fig. 1 Workflow of on-line grading system

Fig. 2 Schematic diagrams of on-line near-infrared detection section with binary-branch optical fiber. (1) Light source with collimating lens, (2) apple, (3) bicone roller transportation, (4) binary-branch optical fiber, (5) control unit, (6) spectrometer, (7) computer (arrows indicate the direction of movement)

In preparation, the dark spectrum and reference spectrum were collected after the light source had thoroughly stabilized and before the sample spectra were collected. In addition to the number of detection regions, two detection positions of the apple were studied: the specific position and the random position (Fig. 3). In the specific position, the apple was manually placed with its stem-calyx axis horizontal and along the direction of movement, so that both the detectors and the light sources focused on the equator of the apple. In the random position, the apple’s orientation was entirely random after rotation on the roller transportation. For each position, three spectra were collected and averaged into one for analysis. A total of 1800 spectra were collected in this way, yielding 600 averaged spectra across the four configurations.
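The averaging step can be sketched as follows. This is an illustration only and assumes that the three repeat spectra of one apple in one configuration are stored in adjacent rows. The function name is hypothetical.

```python
import numpy as np

def average_repeats(spectra, n_repeats=3):
    """Average each consecutive group of n_repeats rows into one spectrum.

    Assumes repeats of the same apple and configuration are adjacent rows.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_rows, n_points = spectra.shape
    if n_rows % n_repeats != 0:
        raise ValueError("row count must be a multiple of n_repeats")
    grouped = spectra.reshape(n_rows // n_repeats, n_repeats, n_points)
    return grouped.mean(axis=1)
```

For 150 apples in four configurations with three repeats each, the input has 1800 rows and the output has 600 rows, consistent with the counts given in the text.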

Fig. 3 Two positions of detection during spectral measurement including specific position (a) and random position (b) (arrows indicate the direction of movement)

SSC Measurement

As a reference, the SSC values of the apples were measured destructively with a digital refractometer (model PR-101, Atago Co., Ltd., Tokyo, Japan). Three slices (about 1 cm thick), equally distributed around the equator of each apple, were cut off and squeezed into juice. The SSC value was measured and recorded for each slice. The average of the three slices was used for each apple in the following analysis to better represent the true quality of the apple (Fan et al. 2016).

Chemometrics and Data Analysis

After the NIR spectra and SSC reference values had been acquired, analysis was conducted with TQ Analyst v6.2.1 (Thermo Nicolet Corporation, Madison, WI, USA) to establish the correlation between the spectra and sample molecular concentration (Jamshidi et al. 2014; Liu et al. 2008). As one of the most common modeling algorithms, the stepwise multiple linear regression (SMLR) technique was adopted for quantitative analysis; it expresses concentration as a function of absorbance at multiple wavelengths (Jie et al. 2014; Varmuza and Filzmoser 2009). Before calibration, spectrum outliers were identified and removed from the dataset. In addition, to remove uninformative variation, pre-processing methods such as multiplicative scatter correction (MSC) and standard normal variate (SNV) were applied (Chia et al. 2013). During the modeling process, the samples were randomly divided into a calibration set and a prediction set at a ratio between 2:1 and 3:1. To evaluate model performance, several attributes were taken into consideration, including the root mean square error of calibration (RMSEC), determination coefficient of calibration (\( {R}_C^2 \)), root mean square error of prediction (RMSEP), determination coefficient of prediction (\( {R}_P^2 \)), and the difference between RMSEC and RMSEP (D). Lower RMSEC and RMSEP together with higher \( {R}_C^2 \) and \( {R}_P^2 \) indicate better accuracy of calibration models. Lower D indicates better robustness of calibration models (Xie et al. 2011; Williams and Sobering 1993; Yan 2005).
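The modeling and evaluation workflow described above can be illustrated with a short sketch. It is not the TQ Analyst implementation, whose stepwise criteria are not given in the text. It assumes a simple greedy forward selection of wavelengths by calibration error, a 70:30 random split (within the stated 2:1 to 3:1 range), and D taken as the absolute difference between RMSEP and RMSEC. All function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + X @ b; intercept is coef[0]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def forward_select(X, y, max_terms=5):
    """Greedy forward selection: at each step add the wavelength column
    that gives the lowest calibration RMSE (assumed criterion)."""
    selected = []
    for _ in range(min(max_terms, X.shape[1])):
        best_j, best_err = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            err = rmse(y, predict_mlr(fit_mlr(X[:, cols], y), X[:, cols]))
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected

def evaluate(X, y, cal_fraction=0.7, max_terms=5, seed=0):
    """Random calibration/prediction split, then RMSEC, RMSEP, R2 and D."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_cal = int(round(cal_fraction * len(y)))
    cal, pred = idx[:n_cal], idx[n_cal:]
    cols = forward_select(X[cal], y[cal], max_terms)
    coef = fit_mlr(X[cal][:, cols], y[cal])
    y_cal_hat = predict_mlr(coef, X[cal][:, cols])
    y_pred_hat = predict_mlr(coef, X[pred][:, cols])
    rmsec, rmsep = rmse(y[cal], y_cal_hat), rmse(y[pred], y_pred_hat)
    return {
        "wavelength_idx": cols,
        "RMSEC": rmsec, "R2C": r2(y[cal], y_cal_hat),
        "RMSEP": rmsep, "R2P": r2(y[pred], y_pred_hat),
        "D": abs(rmsep - rmsec),  # robustness gap, assumed absolute difference
    }
```

Here `X` would hold one averaged absorbance spectrum per row and `y` the reference SSC values. Selecting on calibration error alone keeps the sketch short; a production stepwise procedure would normally add a statistical entry or removal test.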

Results and Discussion

Statistics of the Samples

Figure 4 shows the frequency distribution histogram and distribution curve of the SSC values of the apple samples. The SSC values in this collection of samples approximately follow a normal distribution. The SSC values ranged from 10.23 to 16.33%, and the standard deviation (SD) was 1.059%. The SD thus amounts to approximately 17.3% of the 6.10% span between the minimum and maximum values, which indicates that the variation in SSC was large enough to be analyzed (Savenije et al. 2006).

Fig. 4 Frequency distribution histogram and distribution curve of SSC values for apple samples (n = 150)

Spectral Characteristics and Comparison

Figure 5 shows the total average absorbance spectra of the 150 apple samples in four configurations: mono-branch optical fiber at the specific position (a1, b1), mono-branch optical fiber at the random position (a1, b2), binary-branch optical fiber at the specific position (a2, b1), and binary-branch optical fiber at the random position (a2, b2). First, clear peaks and valleys were observed in the wavelength range of 500–1000 nm, which accounts for a major part of the following analysis. In all configurations, the absorption peaks occur at similar wavelengths of 530 nm, 670 nm, 760 nm, 830 nm, and 980 nm. The sharpest peak, at 670 nm, is associated with the absorption of peel chlorophyll at approximately 660 nm (Merzlyak et al. 2003). The mild absorption peak at 760 nm is mainly associated with the third overtone of the O–H bond, and the small absorption peak at 830 nm with the third overtone of the C–H bond (Wang et al. 2014). The slight peak at 980 nm is attributed to absorption by water and carbohydrates (Nicolaï et al. 2007). In apples, the SSC value refers to the content of soluble sugars, mainly sucrose, glucose, and fructose. These sugars differ in their O–H and C–H bond composition, and consequently differences in their contents influence the NIR spectra. However, no specific peak directly related to the SSC value can be observed. Therefore, multivariate analytical methods were used to extract useful information for establishing the calibration and prediction models of SSC. In the following analysis, the wavelength range of 350–1030 nm, which includes the visible region (Wang et al. 2014), was adopted in the SMLR models on account of the signal-to-noise ratio.
Deleting possible concentration outliers, as suggested by the leverage and Studentized residuals calculated by TQ Analyst, brought no obvious improvement in the RMSECV of the models, which suggests that the measured concentrations were accurate. The “spectrum outlier” function of TQ Analyst was then applied. After removal of the spectrum outliers, 147, 146, 145, and 150 spectra remained for further analysis in the four configurations, respectively.
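The leverage and Studentized-residual diagnostics mentioned here can be sketched for an ordinary least-squares model as follows. The exact procedure and cut-offs used by TQ Analyst are not given in the text. The limits below (|t| > 3 and leverage above three times the mean leverage) are common rules of thumb adopted as assumptions, and the function names are hypothetical.

```python
import numpy as np

def leverage_and_studentized(X, y):
    """Leverage h_ii and internally Studentized residuals of an OLS fit
    of y on X (X is 2-D; an intercept column is added)."""
    A = np.column_stack([np.ones(len(X)), X])
    # Hat matrix H = A (A^T A)^-1 A^T, computed via the pseudo-inverse
    H = A @ np.linalg.pinv(A)
    h = np.diag(H)
    resid = y - H @ y
    dof = len(y) - np.linalg.matrix_rank(A)
    s2 = np.sum(resid ** 2) / dof
    t = resid / np.sqrt(s2 * (1.0 - h))
    return h, t

def flag_outliers(X, y, t_limit=3.0, h_factor=3.0):
    """Indices of samples exceeding the assumed residual or leverage limits."""
    h, t = leverage_and_studentized(X, y)
    n_params = X.shape[1] + 1
    mean_leverage = n_params / len(y)
    return np.where((np.abs(t) > t_limit) | (h > h_factor * mean_leverage))[0]
```

A sample flagged by the residual test has a reference value that the model fits poorly, while a sample flagged by leverage has unusual predictor values; removing either kind is only justified if model error improves, which the text reports was not the case for the concentration outliers.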

Fig. 5 Total absorbance spectra of 150 apples at four configurations. a1 mono-branch optical fiber, a2 binary-branch optical fiber, b1 specific position, b2 random position

Apart from slight variations in spectral intensity, no marked difference was observed in Fig. 5. However, this does not necessarily mean that the configuration has little influence on the prediction models for SSC determination, which is examined in the following analysis.

Influence of Different Detection Positions on the SMLR Models

SSC calibration and prediction results for the spectra acquired in the four configurations are listed in Table 1. Pre-processing methods applied before modeling brought no apparent improvement. Therefore, only the results of the SMLR models established with the original data are listed in the table. The largest difference in RMSEC between any two of the four configurations was 0.102%, while the smallest was 0.018%.
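For reference, the two pre-processing methods named in the methods section, SNV and MSC, can be sketched in their textbook form as below. This is a generic illustration with hypothetical function names and may differ in detail from the TQ Analyst routines.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row)
    by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction: regress each spectrum on a
    reference (default: the mean spectrum) and remove the fitted
    offset and slope."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        ref = spectra.mean(axis=0)
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ slope * ref + intercept
        corrected[i] = (row - intercept) / slope
    return corrected
```

Both methods target additive and multiplicative scatter effects. If such effects are small relative to the chemical signal in a given data set, little change in model error would be expected after correction.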

Table 1 SSC calibration and prediction results of apples by SMLR models

It was expected that the model performance at the specific position would be superior to that at the random position, and this is supported by the calibration and prediction results in Table 1. With the selected wavelengths (SW) kept at the same level, the models obtained at the specific position performed slightly better with both optical fibers, especially in terms of accuracy (lower RMSEC and RMSEP). Several reasons account for this. The first is the interactance light propagation path at the specific position: light emitted from the light source illuminates the equator of the apple and penetrates through one-half of the apple, so the light received by the detectors placed around the equator carries information on internal quality. Second, the equatorial area is highly representative of the whole apple. Lastly, the geometry of this area is simpler than that of the calyx and stem areas. The sharp variation in shape around the calyx and stem makes the light path unstable during detection and leads to uninformative light, including reflected and scattered light, reaching the detector. In addition, detection at the random position means that a different region is detected each time, which also degrades the performance of the calibration and prediction models.

Influence of Double Detection Regions on SMLR Models

Because the detection position influenced the performance of the prediction models, this work sought to improve performance by modifying the system configuration to detect additional regions simultaneously. At both detection positions, the models established with spectra acquired with the mono-branch optical fiber were slightly superior to those acquired with the binary-branch optical fiber in terms of accuracy (lower RMSEC and RMSEP). However, at the specific position, the models established with the binary-branch optical fiber were clearly more robust than those established with the mono-branch optical fiber, with a reduction of more than 50% in the D value. This result suggests that detecting additional regions at the same time improves the robustness of the prediction models more than their accuracy. The results of the calibration and prediction models are susceptible to the heterogeneous conditions of the apples, and adding detection regions was aimed at reducing the influence of the inhomogeneous SSC distribution in apples. Adopting the binary-branch optical fiber may be equivalent to acquiring more effective information from different regions of the apple, which plays a significant role in improving model robustness.

Model Evaluation

From the above discussion, the most robust model was obtained from the original spectra acquired at the specific position in the binary-branch optical fiber configuration (RMSEC 0.586%, \( {R}_C^2 \) 0.7319, RMSEP 0.610%, \( {R}_P^2 \) 0.6295). The most accurate model was obtained from the original spectra acquired at the specific position in the mono-branch optical fiber configuration (RMSEC 0.508%, \( {R}_C^2 \) 0.8073, RMSEP 0.601%, \( {R}_P^2 \) 0.6293). The scatter plots of the calibration and prediction models for these two configurations are shown in Fig. 6a and b, respectively. For practical application of the on-line grading system, however, robustness plays a more significant part in model performance than accuracy. Since the accuracy of the two models is relatively close, the former is considered the optimal model in the present work. Moreover, given that the standard deviation of the SSC values is 1.059%, the RMSEP of the optimal model indicates satisfactory predictive ability, which supports the feasibility of on-line SSC determination of apples.
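The robustness comparison can be checked with a few lines of arithmetic on the values reported above. The sketch assumes that D is the absolute difference between RMSEP and RMSEC, and it uses the SD of the reference SSC values (1.059%) from the sample statistics. The variable and function names are illustrative.

```python
# Reported values for the two specific-position models (in % SSC)
models = {
    "binary-branch": {"RMSEC": 0.586, "RMSEP": 0.610},
    "mono-branch": {"RMSEC": 0.508, "RMSEP": 0.601},
}
sd_reference = 1.059  # SD of the reference SSC values

def robustness_gap(model):
    """D, taken here as |RMSEP - RMSEC| (assumed definition)."""
    return abs(model["RMSEP"] - model["RMSEC"])

d_binary = robustness_gap(models["binary-branch"])
d_mono = robustness_gap(models["mono-branch"])
reduction = 1.0 - d_binary / d_mono  # relative reduction in D
rmsep_to_sd = models["binary-branch"]["RMSEP"] / sd_reference

print(f"D binary-branch: {d_binary:.3f}")
print(f"D mono-branch:   {d_mono:.3f}")
print(f"Reduction in D:  {reduction:.1%}")
print(f"RMSEP / SD:      {rmsep_to_sd:.2f}")
```

Under these assumptions D is about 0.024% for the binary-branch model and 0.093% for the mono-branch model, a reduction of roughly 74%, which is consistent with the stated reduction of more than 50%.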

Fig. 6 Scatter plots of SSC calibration and prediction models for apples, adopting the binary-branch optical fiber (a) and the mono-branch optical fiber (b) at the specific position

Sun et al. (2009) compared the SSC detection of apples by portable and on-line systems. In the on-line system, the optimal model gave RMSEC 0.69%, \( {R}_C^2 \) 0.8464, RMSEP 1.04%, and \( {R}_P^2 \) 0.6084 after processing the effective wavelengths, results that were inferior to those in this work in both accuracy and robustness. Moreover, the surface scanning measurement of Schmutzler and Huck (2014), in which each apple was measured at approximately 200 points, yielded a root mean square error of cross-validation (RMSECV) of 0.46% (RMSECV was 0.69% in the optimal model here). Considering the time consumed by surface scanning, the on-line detection system described here is more advantageous for practical application.

Conclusions

In the present work, an on-line NIR detection system based on bicone roller transportation was used. Given that apple positions are uncertain after rolling, two detection positions, a specific position and a random position, were compared to illustrate their influence on the prediction models. At the same time, two kinds of optical fibers, the mono-branch optical fiber and the binary-branch optical fiber, were adopted to acquire the spectra of the samples at single and double detection regions, respectively. Better prediction results were obtained at the specific position than at the random position. In practical application, however, the random position is preferred for its efficiency. Therefore, modifications of the detection regions were proposed to compensate for this difference. Improved robustness was observed in the binary-branch optical fiber configuration (two detection regions), while improved accuracy was observed with the mono-branch optical fiber configuration (one detection region). It was concluded that detecting two regions simultaneously may help to acquire representative information from the samples. Taking both model robustness and accuracy into account, the optimal model was established using spectra acquired at the specific position with the binary-branch optical fiber. Finally, in addition to modifying the system setup, algorithms that compensate for the variation in detection position should be investigated in future work.