Introduction

Potato (Solanum tuberosum L.) is an important food crop and the fourth largest staple crop in China after rice, wheat and maize (Wang et al., 2011). Its higher yield per unit area and nutritional value have led to an increase in potato production over the past years (Wang et al., 2015). In the past, the majority of potatoes in China were consumed in Chinese cuisine, freshly fried or roasted, and the incorporation of potatoes into food production and processing is still in its infancy. However, potatoes can provide humans with a large proportion of their daily dietary energy intake compared with other crops, and potato is a good supplier of many nutrients, including carbohydrates, protein, minerals and vitamins (Tian et al., 2016). In 2015, recognizing the importance of the potato industry, the Ministry of Agriculture of the People’s Republic of China implemented a plan to boost potato production and make the tuber one of the nation’s staple foods. According to the plan, China will have more than 6.67 million hectares of potato planting area, and at least 30% could be processed into staple foods. Thus, the government and research institutes have been designing and planning potato plantings in accordance with local resources, diversifying the variety of potato-incorporated staple foods (PISFs) based on market demand, stimulating consumer demand for PISFs, and modernizing the processing and manufacturing of PISFs.

Potato flour has been incorporated into wheat and rice flour-based foods for nutrition fortification because potato flour has a higher nutritional value than either wheat or rice flour (Please see Supplementary Material 1). The first generation of PISFs, including noodles and Chinese steamed bread (CSB), are already on the market. These products were prepared with a high content of potato flour (> 20%). Generally, the higher the potato flour content, the higher the price of the PISF. However, there are no significant chemical differences between potato and wheat or rice flours. They all share the same major chemical constituent, namely starch, which makes it difficult to determine the real potato flour contents in PISFs. Substituting a lower potato flour content for a higher one, without disclosure, can be very lucrative for the manufacturer or supplier of raw materials because their profit margins can be greatly increased. Therefore, it is imperative to develop a rapid determination method for the content of potato flour in PISFs to prevent major economic fraud and maintain market order.

Near-infrared (NIR) spectroscopy has been frequently used for the analysis of agricultural products and other foods for the past several years (Rady and Guyer, 2015). NIR covers a broad range of the electromagnetic spectrum between 780 and 2500 nm and is associated with the overtones and combinations of the fundamental vibrations of O–H, C–H and N–H bonds, which are the primary structural components of organic molecules, such as protein, starch and water (Futami et al., 2016). However, overlapping and broad absorption peaks make it difficult to complete quantitative analyses using NIR spectra. Modern multivariate statistical techniques, including partial least-squares (PLS) regression and principal component analysis (PCA), aid in rapidly analyzing the large number of data values acquired using different analytical techniques. The combination of NIR and chemometric methods has been successfully used in the classification and prediction of potato properties (Ni et al., 2011), assessment of potato dry matter (Bernhard et al., 2016; Helgerud et al., 2012; 2015), phytochemical determination and classification in purple- and red-fleshed potato tubers (Tierno et al., 2015), detection of potato resistant starch content (Ayvaz and Rodriguez-Saona, 2015), screening of acrylamide content in commercial potato chips, discrimination of purple sweet potatoes from samples adulterated with white sweet potato flour (Ding et al., 2015), quality control of potato chips (Shiroma and Rodriguez-Saona, 2009) and on-line monitoring of fat, dry matter and acrylamide contents in potato chips (Adedipe et al., 2016; Pedreschi et al., 2010). However, there have been limited reports on methods for determining the potato flour contents of PISFs. Thus, this study aimed to investigate the practicality of using NIR spectroscopy as a rapid method for the quantitative analysis of the potato flour content in a PISF, using CSB as the model.

Materials and methods

Materials

Wheat flour with a high gluten strength (Wudeli Flour Co., Ltd., Hebei, China and Luwang Co., Ltd., Shandong, China) and potato flour (Linkage Potato Co., Ltd., Inner Mongolia, China; Yanbei Potato Co., Ltd., Zhangjiakou, China and Sanlai Food Co., Ltd., Shanxi, China) were purchased. A commercial instant dried yeast (Angel Yeast Co., Ltd., Hubei, China), flour improving agent (Angel Yeast Co., Ltd., Hubei, China), food-grade sucrose and salt were purchased from a local supermarket.

Preparation of CSB

CSB was prepared with 0–42% potato flour. The basic constituents of the potato flour-incorporated dough were as follows: wheat flour (58–100%), potato flour (0–42%) and water (31–37%) were combined to form a dough according to the relative ratios. Then, yeast (0.9 g), sucrose (2.5 g), salt (0.3 g) and a steamed bread flour improving agent (0.6 g) were added to the doughs having different ratios of potato flour. After all of the ingredients were mixed and kneaded, the dough was fermented, sheeted, rolled, divided, proofed, steamed and cooled (Shiau et al., 2015).

Instrumental analysis

The NIR spectra of CSB and potato flour-incorporated CSB (24 h after CSB preparation) were measured on an NIR spectrometer (Antaris II, Thermo Fisher Scientific Inc., Massachusetts, USA). An internal gold background was used as the reference, and all of the spectra were measured using an InGaAs detector. The background was corrected every hour to eliminate the influence of baseline shifts. The NIR spectra of samples were recorded at 4 cm−1 intervals over a spectral range from 4000 to 10,000 cm−1 for the multivariate analysis. To improve the signal-to-noise ratio, each spectrum was recorded as the average of 64 scans. The temperature during the test was maintained at room temperature (25 ± 1 °C), and the humidity was kept at a stable level by a dehumidifier.
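As a quick arithmetic check, the wavenumber range quoted above can be related to the wavelength range given in the Introduction. The sketch below is illustrative only; the function name is hypothetical, and the point count assumes exactly one data point per 4 cm−1 step, which may differ from the instrument's actual data spacing.

```python
def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1

# Spectral range and step reported for the measurements
start_cm1, stop_cm1, step_cm1 = 4000, 10000, 4

# Assumption: one data point per 4 cm^-1 step, both end points included
n_points = (stop_cm1 - start_cm1) // step_cm1 + 1

print(wavenumber_to_nm(start_cm1), "nm at", start_cm1, "cm^-1")
print(wavenumber_to_nm(stop_cm1), "nm at", stop_cm1, "cm^-1")
print(n_points, "points per spectrum under the stated assumption")
```

Under these assumptions the measured range corresponds to 1000–2500 nm, which lies inside the 780–2500 nm NIR region.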

Chemometric software for data processing and statistical techniques

The data were processed using the Unscrambler (Version 9.7; Camo Inc., Trondheim, Norway). The chemometric procedures and test arrangements are discussed below.

Calibration and prediction sample sets

The potato flour (potato granules) content in the CSB ranged from 0 to 42%. The potato flour contents in the 132 samples of the calibration set were 0%, 2%, 4%, 6%, 8%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 32%, 34%, 36%, 38%, 40% and 42% (w/w). The calibration set was divided into two groups, wheat flour (Wudeli Flour Co., Ltd.) with potato flour (Linkage Potato Co., Ltd.) incorporated and wheat flour (Wudeli Flour Co., Ltd.) with potato flour (Sanlai Food Co., Ltd.) incorporated. The potato flour content in 66 samples from the validation set also ranged from 0 to 42% (w/w) in 2% intervals. The wheat flour (Luwang Co., Ltd.) in the external validation was incorporated with potato flour (Yanbei Potato Co., Ltd.). In addition, the samples in the calibration and external validation sets were prepared by different members of the laboratory to ensure that there was no association between the calibration and external validation sets.

The confirmation of measuring position

To determine the best position for acquiring a stable and credible spectrum of the CSB, three parts (the top, inner core and the bottom) of the CSB were evaluated by comparing the root mean square error of calibration (RMSEC) of the calibration model and the coefficient of determination between the predicted and the measured values in the calibration (R²c). The potato flour contents in 30 samples (6 samples for each potato flour content) were set as 0%, 5%, 10%, 15% and 20%. These CSBs were prepared using wheat flour (Wudeli Flour Co., Ltd.) and potato flour (Linkage Potato Co., Ltd.).

Calibration step: cross validation

In the present study, linear prediction models between the spectral data and the potato flour content of the CSB were constructed using PLS regression with leave-one-out cross validation, in which each sample in the calibration set is left out once and predicted by a model built from the remaining samples (Porfire et al., 2012). First, one sample was chosen by the software and removed from the calibration set, while the rest of the spectra were used to construct a PLS model that predicted the removed sample. Then, this procedure was repeated until every sample in the calibration set had been left out once.
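The leave-one-out procedure can be sketched in a few lines. The example below is not the Unscrambler implementation used in the study; it is a minimal, dependency-free PLS1 (NIPALS) sketch with hypothetical function names (`pls1_fit`, `pls1_predict`, `rmsecv_loo`) and a small synthetic data set standing in for spectra and potato flour contents.

```python
import math

def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model with NIPALS; X is a list of rows."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                 # centred y
    factors = []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # residual y carries no further information
            break
        w = [v / norm for v in w]                               # weight vector
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next factor
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors

def pls1_predict(model, x):
    """Predict one sample by repeating the deflation steps of the fit."""
    x_mean, y_mean, factors = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in factors:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat

def rmsecv_loo(X, y, n_factors):
    """Leave each sample out once, predict it, and pool the squared errors."""
    squared_error = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        squared_error += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(squared_error / len(X))

# Synthetic stand-in data: three "spectral" variables, noise-free linear response
X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0], [4.0, 3.0, 2.5],
     [5.0, 6.0, 2.0], [6.0, 5.0, 3.5], [7.0, 9.0, 3.0]]
y = [10 + 2 * a - b + 0.5 * c for a, b, c in X]

for k in (1, 2, 3):
    print(k, "factor(s): RMSECV =", rmsecv_loo(X, y, k))
```

Because the synthetic response is an exact linear function of three variables, the cross-validated error falls to essentially zero once three factors are used; real spectra are noisy and collinear, so the RMSECV curve is expected to flatten rather than vanish.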

When the model calculation was complete, the leverage was computed and used to detect outlying samples in the original data set. Outliers are samples with high leverage and studentized residuals, which compromise the precision of the model (Adedipe et al., 2016). Parameters such as the RMSEC, the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP), R²c and the coefficient of determination of cross-validation (R²cv) were used to evaluate the performance of the PLS models. In addition, the ratio to performance deviation (RPD) was introduced to measure the precision of the PLS model based on the root mean square values (Williams and Sobering, 1993):

$${\text{RPD}} = \delta /{\text{RMSEP}}$$
(1)

where δ represents the standard deviation of the validation set. An RPD value of < 1 indicates that the established model has no predictive power; 1 ≤ RPD < 2 indicates that the established model can only separate the lower from the higher values; 2 ≤ RPD < 3 indicates that a reasonable model has been established; and RPD ≥ 3 indicates that an excellent model has been established (Landsberg and Waring, 1997).
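Equation (1) and the interpretation bands can be illustrated with a short sketch. The function names and the data are hypothetical; computing δ as the sample standard deviation (n − 1 denominator) of the reference values, and treating a value of exactly 3 as "excellent", are assumptions made here.

```python
import math
import statistics

def rmsep(measured, predicted):
    """Root mean square error of prediction."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def rpd(measured, predicted):
    """Ratio to performance deviation: SD of the reference values / RMSEP."""
    return statistics.stdev(measured) / rmsep(measured, predicted)

def interpret_rpd(value):
    """Map an RPD value onto the qualitative bands described in the text."""
    if value < 1:
        return "no predictive power"
    if value < 2:
        return "separates low from high values only"
    if value < 3:
        return "reasonable model"
    return "excellent model"

# Hypothetical potato flour contents (%) and model predictions
measured = [0.0, 10.0, 20.0, 30.0, 40.0]
predicted = [1.0, 9.0, 21.0, 29.0, 41.0]

value = rpd(measured, predicted)
print(f"RMSEP = {rmsep(measured, predicted):.2f}, RPD = {value:.2f}: {interpret_rpd(value)}")
```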

To improve the reliability, accuracy and stability of the calibration models, data were pre-processed before analyses. The steps used are as follows (Krepper et al., 2017; Sunoj et al., 2016):

  1. Automatic baseline correction, automatic smoothing (Savitzky–Golay, three data points) and normalization;
  2. First derivative (Savitzky–Golay, polynomial = 2), calculated with different data point gaps (5, 9 and 13 pts); and
  3. Second derivative (Savitzky–Golay, polynomial = 2), calculated with different data point gaps (5, 9 and 13 pts).
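The derivative steps in the list above can be sketched as follows. This is not the Unscrambler routine; it is a minimal illustration of Savitzky–Golay differentiation with a second-order polynomial, assuming that the "data point gap" corresponds to the window length in points. Function names are hypothetical, and edge points that lack a full window are simply dropped.

```python
def sg_derivative_weights(window, order):
    """Savitzky-Golay weights for a quadratic fit; order is 1 or 2."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    half = window // 2
    idx = range(-half, half + 1)
    if order == 1:
        # Slope of the least-squares quadratic at the window centre
        denom = sum(i * i for i in idx)
        return [i / denom for i in idx]
    if order == 2:
        # Twice the quadratic coefficient of the least-squares fit
        mean_sq = sum(i * i for i in idx) / window
        centred = [i * i - mean_sq for i in idx]
        denom = sum(c * c for c in centred)
        return [2 * c / denom for c in centred]
    raise ValueError("order must be 1 or 2")

def sg_derivative(y, window, order, spacing=1.0):
    """Apply the weights to every full window; returns len(y) - window + 1 values."""
    weights = sg_derivative_weights(window, order)
    half = window // 2
    result = []
    for k in range(half, len(y) - half):
        acc = sum(weights[j] * y[k - half + j] for j in range(window))
        result.append(acc / spacing ** order)
    return result

# A pure quadratic is differentiated exactly by a second-order fit
signal = [float(x * x) for x in range(10)]
print(sg_derivative(signal, window=5, order=1))  # slopes at x = 2..7
print(sg_derivative(signal, window=5, order=2))  # curvature at x = 2..7
```

Wider windows (9 or 13 points) smooth more strongly at the cost of broadening narrow bands, which is one reason several window sizes are usually compared.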

The optimum number of PLS factors was determined by plotting the RMSECV against the number of factors and determining the minimum of the plot (Winkler-Moser et al., 2015). Additionally, the best model for the determination of the potato flour content should have the highest RPD (Mantanus et al., 2010).

External validation step

The external validation of the method was carried out by applying the optimal model to the NIR spectra of 198 new samples (potato flour content: 2–40%) (Bin et al., 2016). These samples were prepared with wheat flour and potato flour from different sources (different brands). Finally, the RMSEP, the coefficient of determination (R²) for the validation and the RPD of the model were evaluated.

Analytical characteristics test

The analytical characteristics of this method were evaluated according to our previous report (Liu et al., 2013).

$${\text{Recovery}}\left( \% \right) = \left( {{\text{P-NIR}}/{\text{P-SS}}} \right) \times 100\%$$
(2)

where P-NIR represents the proportion of potato flour in the sample detected by NIR spectroscopy, and P-SS represents the proportion of potato flour in the sample as spiked by laboratory personnel. The intra-assay relative standard deviation (RSD) was determined for each proportion in three replicates. The inter-assay RSD was evaluated for each proportion prepared by 20 laboratory personnel.
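Equation (2) and the RSD can be illustrated with a short sketch. The function names and the numbers are hypothetical, and the RSD is computed here with the sample standard deviation (n − 1 denominator), which is an assumption.

```python
import statistics

def recovery_percent(p_nir, p_ss):
    """Recovery (%) = NIR-predicted proportion / spiked proportion x 100."""
    return p_nir / p_ss * 100.0

def rsd_percent(values):
    """Relative standard deviation (%) of replicate determinations."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Hypothetical example: a sample spiked at 20% and three replicate NIR predictions
spiked = 20.0
replicates = [19.0, 20.0, 21.0]

for value in replicates:
    print(f"predicted {value:.1f}% -> recovery {recovery_percent(value, spiked):.1f}%")
print(f"intra-assay RSD = {rsd_percent(replicates):.1f}%")
```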

PCA analysis

PCA analysis was carried out using the Unscrambler (Version 9.7, Camo Inc., Trondheim, Norway). To determine the number of principal components that are statistically significant in capturing the underlying features in data sets, a leave-one-out cross validation method (Wold et al., 2001) was performed.

Results and discussion

NIR spectra of potato and wheat flours

The raw NIR spectra of potato and wheat flours are presented in Fig. 1A. The peak at approximately 4015 cm−1 belongs to a combination of the C–H and C–C stretching vibrations (Xu et al., 2015). The peak at 4308 cm−1 could also be ascribed to a combination of the C–H and C–C stretching vibrations (Xu et al., 2015). The peaks at approximately 4747 cm−1 and 5168 cm−1 represented the combination of the C–O stretch and O–H deformation and a combination of the baseband of the O–H stretch and the first overtone of C–O deformation (Li et al., 2012), respectively. A few bands between 5418–5931 cm−1 and 5982–6811 cm−1 belonged to the first overtones of the C–H stretch of various groups and the first overtone of the O–H stretch or N–H stretch, respectively. The last peak at approximately 8320 cm−1 represented the second overtones of the C–H stretches of various groups (Vagnini et al., 2009). The raw NIR spectra of the potato and wheat flours showed only minor differences, which meant that the potato flour content in CSB could not be determined without the aid of chemometrics.

Fig. 1 NIR spectra of potato and wheat flour (A); RMSECV (root-mean-square error of cross-validation), R²cv (coefficient of determination of cross-validation) and RPD (the ratio to performance deviation) plotted as a function of the number of factors used in the PLS (partial least squares) model (B)

The confirmation of measuring position

Owing to different water evaporation rates, three positions, the top, inner core and bottom layer of CSB, exhibited different surface layer textures. Thus, it was necessary to determine the part that was most suitable for acquiring the NIR spectra. For each position, the PLS model was calculated, and the results are summarized in Table 1. According to the observed RMSECV and R²cv values, the NIR spectra from the top layer and inner core showed good performance. Additionally, the inner core produced the most accurate determination of potato flour in CSB. Thus, the inner core of CSB was chosen as the position for acquiring the NIR spectra.

Table 1 Results of calibration and cross-validation of the PLS models at the different measuring positions of CSB

Calibration and validation of the PCA and PLS procedures

Optimization of the calibration model

In the present study, the full spectral range of 4000–10,000 cm−1 was used to determine the potato flour content in CSB. The spectral pre-treatments and the number of PLS factors were investigated to determine their effects. Outliers should be removed before calibration; therefore, the leverage values of the PCA were carefully checked. After the removal of the outliers, the PLS models were recalculated for each pre-treatment. Various spectral pre-treatments for the determination of the potato flour content in CSB are summarized in Table 2.

Table 2 Results of calibration and validation of the PLS models on the raw spectra and spectra with various pretreatments

Prediction results based on the spectral data sets after treatments with different pre-processing methods were compared with the ones using raw spectra. The worst prediction, with R²cv = 0.794 and RMSECV = 5.93, was obtained by directly using the raw spectra. However, the calibration results from different spectral treatments were very close. Positive correlations were found between the values predicted with the NIR models and the proportion of potato flour additions, which produced desirable R²cv and RMSECV values of 0.8592–0.8955 and 4.22–5.09, respectively. Regression models based on the first and second derivatives of the NIR spectra resulted in improved PLS prediction model performances. In general, the NIR-reflectance measurement of a sample measures the diffusely reflected (the primary source of information in the NIR spectra) and specularly reflected (mirror-like reflections that do not contain any chemical information) radiation. The selected pre-processing methods can remove some, but not all, of the undesired scatter or particle-size information in the spectra. The most basic method for differentiation is finite differences, in which the first derivative is estimated as the difference between two subsequent spectral measurement points. The second-order derivative is then estimated by calculating the difference between two successive points of the first-order derivative spectra. Both differentiation techniques can improve the prediction behavior of the model by increasing the signal-to-noise ratio. Thus, the prediction behavior of the model was improved after the pre-treatment (Li et al., 2010).

The first derivative (5 pts) was selected as the spectral pre-treatment method because it resulted in the best prediction performance, having the lowest RMSECV of 4.22 and the highest R²cv of 0.8955, for the PLS prediction model. Furthermore, up to 20 PLS factors were introduced into the PLS model to optimize the number of PLS factors. The RMSECV, R²cv and RPD values for determining the potato flour content in CSB using the first derivative (5 pts) were recorded as functions of the number of PLS factors and are presented in Fig. 1B. The RMSECV value decreased sharply within the initial 12 PLS factors. Adding more PLS factors to the model improved its linearity, resulting in R²cv and RPD increasing and RMSECV decreasing as the number of PLS factors increased (Rinnan et al., 2009). However, too many PLS factors (> 8) led to the model running the risk of over-fitting (Feng et al., 2015; Hacisalihoglu et al., 2016). The RPD values were relatively high (3.65–6.77) when there were 8–20 PLS factors. To balance the advantages and disadvantages, in this study, the optimal number of PLS factors was set as eight.

PCA

To produce a clear 2D map, not all of the samples were used in the PCA. The score plot of the first two PCs was obtained by analyzing 36 samples (0%, 6%, 10%, 20%, 30% and 40%) with the pre-treatment of the first derivative (5 pts), as can be seen in Fig. 2A. PC1 and PC2 account for 96.54% and 2.84% of the total variation among these samples, respectively. CSBs with different potato flour contents were located at different parts of the 2D map. The whole wheat CSB lay on the positive side of both PCs (top right corner), and the 40% potato flour-incorporated CSB lay mostly along the negative value area of PC1 and positive value area of PC2 (top left corner). However, the others did not produce apparent regional distributions. The X-loadings plot indicates how well a variable was taken into account by the model components, and it was used to understand how much each variable contributed to meaningful variation in the data and to interpret variable relationships (Wójcicki et al., 2015). The PCA loading spectra of the first two factors are shown in Fig. 2B. They accounted for 99.38% of the total variation and revealed characteristics among the samples. The highest negative loading in PC1 was found at approximately 5299 cm−1, as presented in Fig. 2B. Positive loadings on PC2 were found in the ranges of 4169–5546 cm−1 and 6807–7262 cm−1. These regions could help us to understand the differences between the potato and wheat flours, which require further intensive study.
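What it means for a PC to "account for" a share of the total variation can be illustrated with a small sketch. This is not the Unscrambler computation; it extracts only the first PC by power iteration on the covariance matrix, with a hypothetical function name and hypothetical two-variable data.

```python
import math

def first_pc(X, n_iter=200):
    """Return (loadings, scores, explained fraction) for PC1 of X (list of rows)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - mean[j] for j in range(p)] for row in X]  # mean-centred data
    # Sample covariance matrix (p x p)
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    # Power iteration for the dominant eigenvector; the start vector is arbitrary
    # (it must not be orthogonal to the first PC, otherwise the iteration fails)
    v = [float(j + 1) for j in range(p)]
    for _ in range(n_iter):
        u = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    eigenvalue = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
    total_variance = sum(C[a][a] for a in range(p))
    scores = [sum(Z[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores, eigenvalue / total_variance

# Hypothetical two-variable data lying close to a straight line
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
loadings, scores, explained = first_pc(data)
print(f"PC1 explains {100 * explained:.2f}% of the total variance")
```

The loadings returned for each PC are what a loading plot displays against wavenumber, and the scores are the coordinates of each sample in the score plot.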

Fig. 2 Two-dimensional score plot (A) and loading plots (B) of the PCA results (36 samples in total; 0%, 6%, 10%, 20%, 30% and 40%), and NIR-predicted value vs. actual potato flour content (C) in the calibration set (open square) and external validation set (filled circle)

External validation

An external validation was carried out to check the predictive ability of the optimal PLS model by applying it to the NIR spectra of the 198 samples in the external validation set, and the results are shown in Fig. 2C. The optimal PLS prediction model used eight PLS factors and produced satisfactory R² and RPD values of 0.8865 and 3.07, respectively, in the external validation. NIR technology thus provides a potential analysis tool for the prediction of the potato flour content in CSB.

Analytical characteristics of the PLS method

To better understand the practicality of using NIR spectra for determining the potato flour content in CSB, the recovery and both the intra- and inter-assay RSDs of this model were evaluated. As demonstrated in Table 3, recoveries of between 93.1 and 102.5% were obtained in tests between the actual potato flour contents and NIR-predicted values. The intra-assay RSD for each proportion, prepared in triplicate, ranged from 3.3 to 8.3%, and the inter-assay RSD for each proportion, prepared by different laboratory personnel, ranged from 7.6 to 17.2%. Thus, the NIR technology exhibited satisfactory sensitivity and repeatability levels for samples with potato flour contents greater than 20%, thereby providing a feasible way of distinguishing CSBs with high (> 20%) potato flour contents. Table 4 shows the effect of moisture content on the potato flour content in CSB predicted by the NIR model. Samples with potato flour contents ranging from 13.5 to 38.2% were thermally dried to three levels of moisture content. As can be seen from the table, the values predicted by the NIR model decreased as the moisture content decreased. The moisture content had a significant influence on the NIR models for the prediction of the composition of materials. This may be ascribed to changes in the intensity and position of the –OH absorbance caused by the interaction between water and other components (Williams, 2009).

Table 3 Analytical characteristics of the proposed method
Table 4 Influence of moisture content on the stability of the proposed method

The ability of NIR technology combined with PLS regression to determine the potato flour content in CSB was investigated. The ideal calibration model with eight PLS factors (R²cv = 0.9256, RPD = 3.65) for the determination of potato flour content in CSB was developed using NIR spectra from 4000 to 10,000 cm−1 and a pre-processing combination of baseline correction, smoothing, normalization, and a first derivative with five data point gaps. CSBs prepared using potato and wheat flours of different brands were used for the external validation, and the optimal calibration showed a good prediction performance (R² = 0.8865, RPD = 3.07). This is the first application of NIR spectroscopy combined with a chemometric method to determine the potato content in a PISF. Moreover, this work described a quick and non-destructive analysis method compared with tedious routine analyses, such as gel electrophoresis and PCR.

To make the application of this NIR methodology more representative and practical, it is advisable to consider more factors, such as the storage time and method (refrigeration and freezing) of the CSB, the raw material form (mashed potato and potato flour prepared using different processing methods), the brand (the more brands that are considered in the preparation, the more accurate the calibration) and other factors (moisture, starch retrogradation, and adulteration with potato starch). In addition, more research should be performed to confirm the general applicability of NIR for potato flour content determination in other staple foods.