Introduction

Fish is a highly perishable product, and maintaining its quality is critical to the industry. Therefore, techniques such as cooling (Hu and Sun 2000; Sun and Hu 2003; Mc Donald and Sun 2001; Wang and Sun 2002a, b, 2004; Sun and Wang 2000; Sun 1997; Sun and Brosnan 1999; Zheng and Sun 2004), freezing (Kiani et al. 2011) and drying (Cui et al. 2008; Sun and Woods 1994) are often used to preserve its quality. The development of novel detection methods is also very useful for the industry. For chilled or fresh fish muscle, rapid evaluation of freshness is of crucial significance in the fishery sector. Protein is one of the most important nutritional and chemical components of fish muscle. However, the proteins in fish and fish fillets are highly susceptible to the activities of endogenous enzymes and to microbial spoilage (Alishahi and Aïder 2012; Ruiz-Capillas and Moral 2001). As a result, chemical and biochemical reactions take place and protein degradation occurs, which strongly affects the quality and safety of fish muscle (Pacquit et al. 2006; Costa et al. 2011). During protein degradation, undesirable breakdown products such as trimethylamine, dimethylamine and ammonia are generated; these nitrogen-containing compounds are collectively called total volatile basic nitrogen (TVB-N) (Orban et al. 2011). As an important chemical indicator, the TVB-N value is commonly used, together with rejection levels, to evaluate fish protein degradation and freshness loss during cold storage. For example, in China, the rejection limit of TVB-N content is 30 mg N/100 g for marine fish, whereas for freshwater fish the TVB-N content cannot exceed 15 mg N/100 g (Zhang et al. 2011).

The conventional method for measuring TVB-N levels in fish tissue consists of extracting the volatile bases with a perchloric acid solution, followed by steam distillation of the extract, which is then collected in boric acid and titrated against standard hydrochloric acid (Bhadra et al. 2015). However, this method is not only destructive, time-consuming and inefficient, but also incompatible with modern industrial processing and detection needs (Cai et al. 2011; Chen et al. 2013; Huang et al. 2015; Cheng et al. 2014a, b, c). Therefore, it is imperative to look for rapid, accurate, non-destructive and objective methods of detecting the TVB-N level for the assessment of protein degradation. By integrating spectroscopy with imaging or computer vision (Wu and Sun 2013a; Jackman et al. 2009; Du and Sun 2005) into one system, the hyperspectral imaging technique combined with multivariate data analysis has gradually been developed as an alternative, rapid and non-destructive tool for inspecting the quality and safety of various food products (Barbin et al. 2012, 2013; Kamruzzaman et al. 2012, 2013; Elmasry et al. 2012, 2013; Feng and Sun 2012, 2013; Wu and Sun 2013b; Liu et al. 2014; Feng et al. 2013; ElMasry et al. 2013), including fish muscle (Cheng et al. 2013; Cheng and Sun 2014; Cheng et al. 2014a, b, c). A typical hyperspectral imaging system generates a spatial map of the spectral variation of a sample (Sun 2010). The hyperspectral image is a three-dimensional (3D) dataset called a hypercube, I(x, y, λ), which has two spatial dimensions and one spectral dimension. In addition, to make full use of hyperspectral imaging, chemometric methods such as partial least squares regression (PLSR) and least squares support vector machines (LS-SVM) are required (Abdi 2010; Martens et al. 2005; Lorente et al. 2012). As to TVB-N prediction, Cheng et al. (2014b) used a VIS-NIR hyperspectral imaging system coupled with a back propagation artificial neural network (BP-ANN) for pork TVB-N content prediction, with a root mean square error of prediction (RMSEP) of 6.34 mg/100 g and \( R_P \) of 0.83. Meanwhile, Cheng et al. (2014c) selected nine optimal wavelengths from visible and near-infrared (VIS-NIR) hyperspectral images of fish using the successive projections algorithm (SPA), and the resulting LS-SVM model showed good performance for TVB-N prediction, with \( R_P^2 \) of 0.90 and RMSEP of 2.78 mg/100 g. Also, Huang et al. (2015) developed an NIR multispectral imaging system with a BP-adaptive boosting (AdaBoost) algorithm for prediction of the TVB-N level in pork meat; the RMSEP was 6.94 mg/100 g and \( R_P \) was 0.83. Although these studies (Chen et al. 2013; Huang et al. 2015; Cheng et al. 2014a, b, c) have confirmed the potential of hyperspectral imaging with chemometric analysis for prediction of the TVB-N level, the accuracy and reliability of the reported models are still low, and the models cannot yet be recommended for online and real-time detection of chemical changes in meat muscle. Therefore, it is necessary to look for new variable selection algorithms that improve TVB-N prediction. Models built on a few feature wavelengths selected using principal component analysis (PCA) and a genetic algorithm (GA) achieved results similar to those of the full-wavelength models, which is promising for rapid measurement of the TVB-N value (Chen et al. 2013; Huang et al. 2015). Additionally, other common wavelength selection methods such as SPA and competitive adaptive reweighted sampling (CARS) have been extensively used over the last several years to select feature wavelengths for the design of multispectral imaging systems (Cheng et al. 2014a, b, c).

In this study, the core idea of the Physarum network (PN) algorithm is to find a representative wavelength in every sub-spectral range in order to reduce the high dimensionality. However, if a sub-spectral range is not useful for predicting the TVB-N values, its candidate wavelength will not be informative either. It is therefore better to combine PN with a genetic algorithm (GA) for variable selection. Accordingly, a novel algorithm using PN combined with GA was developed to select the most informative wavelengths from the hyperspectral images, with multivariate data analysis, for enhancing the measurement of the TVB-N level and the evaluation of fish freshness in a rapid and non-destructive manner.

Materials and Methods

Fish Fillet Sample Preparation

A total of 20 grass carp (Ctenopharyngodon idella) from the same batch, each weighing approximately 1.5 kg, were purchased from a local aquatic products market in Guangzhou, China, and transported alive in water to the laboratory within 15 min. Upon arrival, the fish were stunned by a sharp blow to the head with a wooden stick and then killed by gill cutting. The internal organs were removed through the belly while the fish were bled. The fish were then immediately beheaded, skinned, filleted and washed with cold water. As a result, 140 subsamples were obtained from different locations of the fish fillets. In order to acquire a wider range of TVB-N values for better prediction results, all the subsamples were labeled, packaged in sealed plastic bags and divided into five groups (28 subsamples in each group) stored at 4 ± 1 °C for 0, 2, 4, 6, and 8 days. Among them, 100 samples were used as the calibration set and the remaining 40 samples were regarded as the prediction set. The relevant statistics of the TVB-N values are given in Table 1.

Table 1 Reference measured TVB-N values (mg N/100 g) for calibration and prediction purposes

Reference TVB-N Value Determination

TVB-N was determined by the traditional steam distillation method according to Cheng et al. (2014a, b, c). The TVB-N level was expressed as milligrams of N per 100 g of fish muscle (mg N/100 g). Each analysis was performed in triplicate.

Spectral Variable Selection

Hyperspectral Imaging System

A reflectance hyperspectral imaging system was used to acquire the hyperspectral images of grass carp fillets. This system includes a line-scan imaging spectrograph (Imspector V10E, Spectral Imaging Ltd., Oulu, Finland) with a spectral range of 308–1105 nm, a high-performance CCD camera (DL-604M, Andor, Ireland) with an effective resolution of 1004 × 1002 pixels, a camera lens (OLE23, Schneider, Germany), a light source system including two 150-W halogen lamps (2900-ER, Illumination Technologies Inc., New York, USA) with a fiber optical line light positioned at an angle of 45° to illuminate the moving platform operated by a stepping motor (IRCP0076-1COMB, Isuzu Optics Corp., Taiwan, China), and a computer control system for imaging data acquisition. Because of the low signal-to-noise ratio at the two ends of the 308–1105 nm spectral range, the effective wavelength range is 400–1000 nm with a spectral increment of about 1.5 nm between contiguous bands, giving a total of 381 bands.

Image Acquisition and Calibration

For each group, the 28 samples were taken from the plastic bags, placed on the moving platform and transferred into the field of view of the camera, where they were scanned line by line with an appropriate motor speed and computer control parameters to acquire the hyperspectral images of the fish fillets. Consequently, 140 hyperspectral images were created, recorded, and stored in a raw format before being processed. The raw images were calibrated into the reflectance mode for further analysis using the following equation.

$$ {R}_c=\frac{R_0-B}{W-B}\times 100\% $$
(1)

where \( R_0 \) is the raw image, B is the black reference image with 0 % reflectance obtained by fully covering the camera lens with its black cap, W is the standard white reference image with 100 % reflectance acquired using a uniform Teflon white tile, and \( R_c \) is the calibrated image.
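As an illustration of Eq. (1), the calibration can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' Matlab code: the function name `calibrate_reflectance`, the `eps` guard for pixels where the white and black references coincide, and the toy array values are all assumptions made for the example.

```python
import numpy as np

def calibrate_reflectance(raw, black, white, eps=1e-12):
    """Apply Eq. (1): R_c = (R_0 - B) / (W - B) * 100 %."""
    raw = np.asarray(raw, dtype=float)
    black = np.asarray(black, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = white - black
    # Assumption: pixels where W == B carry no usable signal and become NaN.
    safe = np.where(np.abs(denom) > eps, denom, np.nan)
    return (raw - black) / safe * 100.0

# Toy hypercube with 2 x 2 pixels and 3 bands (made-up intensity counts).
raw = np.full((2, 2, 3), 600.0)
black = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 1100.0)
reflectance = calibrate_reflectance(raw, black, white)
```

In this toy case every pixel sits halfway between the black and white references, so the calibrated reflectance is 50 % in every band.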

Spectral Data Extraction

After image acquisition and reflectance calibration, elliptical regions of interest (ROIs) were selected within the hyperspectral images at the locations of the grass carp fillets that were used for the reference TVB-N measurement. The spectral data within the ROI of each sample were then extracted and averaged using the software ENVI v4.8 (ITT Visual Information Solutions, Boulder, CO, USA).
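The ROI averaging step was done in ENVI; the same idea can be sketched in NumPy as follows. The helper names `ellipse_mask` and `mean_roi_spectrum`, the ROI position and size, and the synthetic hypercube are hypothetical and serve only to show how one mean spectrum per sample is obtained.

```python
import numpy as np

def ellipse_mask(shape, center, semi_axes):
    """Boolean mask of an axis-aligned ellipse on a (rows, cols) grid."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    cy, cx = center
    ay, ax = semi_axes
    return ((rows - cy) / ay) ** 2 + ((cols - cx) / ax) ** 2 <= 1.0

def mean_roi_spectrum(cube, mask):
    """Average the spectra of all pixels inside the ROI of a (rows, cols, bands) cube."""
    return cube[mask].mean(axis=0)

# Synthetic calibrated hypercube: every pixel has the same 4-band spectrum.
band_values = np.array([0.1, 0.2, 0.3, 0.4])
cube = np.ones((50, 60, 1)) * band_values
mask = ellipse_mask(cube.shape[:2], center=(25, 30), semi_axes=(10, 20))
spectrum = mean_roi_spectrum(cube, mask)
```

Each sample then contributes a single averaged spectrum (381 values in the study) to the calibration or prediction set.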

Candidate Wavelengths Selection

The acquired hyperspectral images are usually characterized by high dimensionality, with redundancy and multicollinearity among contiguous wavelength bands, which makes the calibration process time-consuming and slows down the computation involved in processing the images. Thus, it is of interest to find the minimal number of wavelengths carrying the most valuable information, which may be as efficient as, or more efficient than, the full wavelength range and still provide satisfactory prediction results. In this study, the GA and PN algorithms were used to select the most informative spectral variables in order to eliminate irrelevant information and improve the prediction accuracy of TVB-N during multivariate analysis. GA is very useful for solving complex optimization problems; it employs a probabilistic, non-local search process based on the principle of natural genetic selection (Leardi and Gonzalez 1998). GA aims to reach the global optimum of a problem by sifting the 'best' individuals in the population using mutation and cross-over operations. The randomness inherent in GA leads to variation in the selected wavelengths between runs; therefore, the GA programs were executed repeatedly to select the initial wavelength candidates (Höskuldsson 2001). A detailed description of GA has been reported elsewhere (Höskuldsson 2001; Leardi and Gonzalez 1998). The key GA parameters adopted in this study, as set in the Matlab implementation, were as follows: population size of 30 chromosomes, 100 runs, a maximum of 30 variables selected in the same chromosome, probability of mutation of 1 %, 5 deletion groups, probability of cross-over of 0.5 and window width for smoothing of 3.

The PN algorithm builds on prior knowledge about Physarum polycephalum. P. polycephalum is a slime mold whose vegetative phase is called the plasmodium. The plasmodium is an amoeba-like organism whose body forms a dendritic network of tubular components (Liu et al. 2015; Bao et al. 2014). Nakagaki et al. conducted an interesting experiment in which a plasmodium was put in a maze with two food sources, one at the entrance and the other at the exit. The plasmodium changed its body shape to connect the two food sources; moreover, it always connected the two points using the shortest length of tubes, i.e., it found the shortest route through the maze. On this basis, the Physarum network model has been developed for solving maze problems (Tero et al. 2007). Its mathematical model, derived from experiments, was described in detail in a previous publication (Chen et al. 2016). The scheme for selecting candidate wavelengths with the least correlation using the PN algorithm is illustrated in Fig. 1. The whole spectrum is divided into M sub-spectral ranges (SR). Only one wavelength is selected in every sub-spectral range, so that M important wavelengths with the least correlation are finally selected. If there are P wavelengths in each sub-spectral range, there are \( P^M \) possible combinations of wavelengths, e.g., SR1W2-SR2W4-SR3W7- ... -SRMW5, SR1W3-SR2W5-SR3W6- ... -SRMW4, etc. It is infeasible to find the combination with the least correlation by enumeration if M and P are large. Therefore, the problem of finding the wavelengths with the least correlation is transformed into the problem of finding the shortest route in a maze by making the following assumptions: (1) each SR is a node in a maze; (2) neighboring nodes are connected by tubes, and every wavelength in a node affords a possible tube for connecting the node to its adjacent nodes, so that if each of two contiguous nodes has P wavelengths, there are \( P^2 \) tubes connecting the two nodes; (3) the length of each tube is the correlation coefficient between the wavelengths at the two ends of the tube; (4) there are two virtual nodes, one acting as the source node and the other as the sink node; and (5) the lengths of the tubes connecting the source node to its adjacent node and of the tubes connecting the sink node to its adjacent node are set to 1 (Chen et al. 2016). With these assumptions, the steps for using the PN algorithm combined with GA to choose the optimal wavelengths are described below.

Fig. 1 Using PN to select wavelengths with the least correlation. The red tubes indicate the shortest route

(1) Dividing the whole spectrum into M SRs and considering each SR as a node in the maze. The number of wavelengths included in each SR is required to be similar.

(2) Setting the conductivity of every tube at the same original value.

(3) Using the following expressions to compute the correlation coefficient \( R_{ij} \) between wavelength i and wavelength j, which is taken as the tube length (\( R_{ij}=L_{ij} \)).

$$ {R}_{ij}={L}_{ij}=\frac{\sum_{k=1}^N\left({f}_{ik}-{\bar{f}}_i\right)\left({f}_{jk}-{\bar{f}}_j\right)}{\sqrt{\sum_{k=1}^N{\left({f}_{ik}-{\bar{f}}_i\right)}^2\sum_{k=1}^N{\left({f}_{jk}-{\bar{f}}_j\right)}^2}} $$
(2)

where \( R_{ij} \) is the sample correlation coefficient between wavelength i and wavelength j; N is the number of samples; \( f_{ik} \) and \( f_{jk} \) are the spectral responses of the k-th sample at wavelengths i and j, respectively; and \( {\bar{f}}_i \) and \( {\bar{f}}_j \) are the average responses of all samples at wavelengths i and j, respectively.

Simultaneously,

$$ {L}_{ij}=\frac{D_{ij}\left({P}_i-{P}_j\right)}{Q_{ij}} $$
(3)
$$ {\displaystyle {Q}_{ij}}=\frac{\pi {\displaystyle {r}_{ij}^4}}{8\eta {\displaystyle {L}_{ij}}}\left({\displaystyle {P}_i}-{\displaystyle {P}_j}\right) $$
(4)
$$ {D}_{ij}=\frac{\pi {r}_{ij}^4}{8\eta } $$
(5)

In Eqs. (3) to (5), \( D_{ij} \) is the conductivity of the tube; \( Q_{ij} \) is the nutrient flux (nutrient flow per unit area) from node i (\( N_i \)) to node j (\( N_j \)) in the maze; \( P_i \) is the pressure at node \( N_i \); η is the viscosity coefficient of the flow; and \( r_{ij} \) and \( L_{ij} \) are the radius and length of the tube connecting \( N_i \) to \( N_j \), respectively.

Except for the source node (\( N_1 \)) and the sink node (\( N_2 \)), each node is defined as having zero capacity. According to the conservation law of flow, the sum of the flux at each node can be expressed as:

$$ {\displaystyle \sum_i}{Q}_{ij}=0,j\ne 1,2 $$
(6)

For the source node (\( N_1 \)) and the sink node (\( N_2 \)), the flux equations are:

$$ {\displaystyle \sum_j}{Q}_{1j}-{I}_0=0 $$
(7)

and:

$$ {\displaystyle \sum_j}{Q}_{2j}+{I}_0=0 $$
(8)

where \( I_0 \) is the flux from the source node to the sink node, which is assumed to be constant in the model.

Substituting \( Q_{ij}=D_{ij}\left(P_i-P_j\right)/L_{ij} \) from Eq. (3) into Eqs. (6) to (8) gives:

$$ \sum_j\frac{D_{ij}}{L_{ij}}\left({P}_i-{P}_j\right)=\left\{\begin{array}{ll}{I}_0, & i=1\\ {}-{I}_0, & i=2\\ {}0, & \mathrm{otherwise}\end{array}\right. $$
(9)

The conductivity \( D_{ij} \) is assumed to adapt to the flux \( Q_{ij} \). Moreover, tubes with zero conductivity disappear. The evolution of \( D_{ij} \) is expressed as:

$$ \frac{dD_{ij}}{dt}=f\left(\left|{Q}_{ij}\right|\right)-{D}_{ij} $$
(10)

(4) Setting an initial value for the flux from the source node and setting the pressure at the sink node as 0.

(5) Calculating the pressure \( P_i \) at each node using Eq. (9).

(6) Calculating the flux \( Q_{ij} \) in each tube using Eq. (3).

(7) Updating the conductivity \( D_{ij} \) of each tube using Eqs. (11) and (12).

$$ {D}_{ij}^{n+1}-{D}_{ij}^n=f\left(\left|{Q}_{ij}\right|\right)-{D}_{ij}^{n+1} $$
(11)
$$ f\left(\left|{Q}_{ij}\right|\right)=\frac{\left(1+\alpha \right){\left|{Q}_{ij}\right|}^{\mu }}{\left(1+\alpha {\left|{Q}_{ij}\right|}^{\mu}\right)} $$
(12)

where n is the current moment, n + 1 is the next moment, α is the number of sub-spectral ranges, and μ is the wavelength increment.

(8) If all the conductivity changes given by Eq. (11) are smaller than a preset threshold, or a preset number of iterations has been reached, go to the next step; otherwise, return to step (5).

(9) Selecting the path with the largest flux and choosing the wavelengths on that path as the candidate wavelengths. The GA and PN-GA algorithms were implemented in Matlab 2010a software (The Mathworks Inc., MA, USA).
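Steps (1) to (9) can be sketched in NumPy as shown below. This is an illustrative sketch under several assumptions, not the authors' Matlab implementation: each wavelength is treated as its own graph node inside its SR layer, so that the route passes through exactly one wavelength per SR; the tube length is the absolute correlation, floored at a small positive value; the pressure system is written as net outflow at the source equal to \( I_0 \); a small floor on the conductivity keeps the linear system solvable; and the defaults `alpha=0`, `mu=1` reduce Eq. (12) to f(|Q|) = |Q|, a setting generally reported to converge to the shortest route. The function name `physarum_select` and the toy correlation values are hypothetical, and the GA stage is not included.

```python
import numpy as np

def physarum_select(corr, groups, i0=1.0, alpha=0.0, mu=1.0,
                    tol=1e-8, max_iter=500):
    """Pick one wavelength per sub-spectral range (SR) on a layered tube network.

    corr   : (W, W) matrix of correlation coefficients between wavelengths
    groups : list of wavelength-index lists, one per SR, in spectral order
    """
    corr = np.abs(np.asarray(corr, dtype=float))
    n_wl = corr.shape[0]
    source, sink = n_wl, n_wl + 1                 # two virtual nodes
    edges, lengths = [], []
    for a in groups[0]:                           # source -> first SR, length 1
        edges.append((source, a)); lengths.append(1.0)
    for left, right in zip(groups[:-1], groups[1:]):
        for a in left:
            for b in right:                       # tube length = |correlation|
                edges.append((a, b)); lengths.append(max(corr[a, b], 1e-6))
    for b in groups[-1]:                          # last SR -> sink, length 1
        edges.append((b, sink)); lengths.append(1.0)
    edges, lengths = np.array(edges), np.array(lengths)
    head, tail = edges[:, 0], edges[:, 1]
    n_nodes = n_wl + 2
    keep = np.arange(n_nodes) != sink             # sink pressure is fixed at 0
    rhs = np.zeros(n_nodes); rhs[source] = i0     # net outflow I0 at the source
    cond = np.ones(len(edges))                    # step (2): equal conductivity
    flux = np.zeros(len(edges))
    for _ in range(max_iter):
        g = cond / lengths
        lap = np.zeros((n_nodes, n_nodes))        # weighted graph Laplacian
        np.add.at(lap, (head, head), g)
        np.add.at(lap, (tail, tail), g)
        np.add.at(lap, (head, tail), -g)
        np.add.at(lap, (tail, head), -g)
        pressure = np.zeros(n_nodes)              # step (5): solve for pressures
        pressure[keep] = np.linalg.solve(lap[np.ix_(keep, keep)], rhs[keep])
        flux = g * (pressure[head] - pressure[tail])          # step (6)
        q = np.abs(flux) ** mu
        f = (1 + alpha) * q / (1 + alpha * q)                 # Eq. (12)
        new_cond = np.maximum(0.5 * (cond + f), 1e-12)        # Eq. (11) solved for D^{n+1}
        converged = np.max(np.abs(new_cond - cond)) < tol     # step (8)
        cond = new_cond
        if converged:
            break
    inflow = np.zeros(n_nodes)                    # step (9): flux entering each node
    np.add.at(inflow, tail, np.abs(flux))
    return [int(grp[np.argmax(inflow[np.asarray(grp)])]) for grp in groups]

# Toy example: 4 wavelengths in 2 SRs with made-up correlation values.
toy_corr = np.eye(4)
for (a, b), r in {(0, 2): 0.9, (0, 3): 0.8, (1, 2): 0.2, (1, 3): 0.7}.items():
    toy_corr[a, b] = toy_corr[b, a] = r
selected = physarum_select(toy_corr, groups=[[0, 1], [2, 3]])
```

In the toy network the route through wavelengths 1 and 2 is the shortest (total length 1 + 0.2 + 1), so those two indices are selected. With real spectra stored as an N × 381 array, the correlation matrix of Eq. (2) could be obtained with `np.corrcoef(spectra, rowvar=False)` and the SRs with `np.array_split(np.arange(381), M)`.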

Image Texture Variable Extraction

Hyperspectral image texture information also plays an important role in reflecting the quality changes of fish muscle during cold storage. In this study, the gray-level gradient co-occurrence matrix (GLGCM) algorithm was used to extract the image texture information describing the changes of the TVB-N level. GLGCM is a texture analysis technique that captures the second-order statistics of gray-level gradients, i.e., the spatial relationships between two basic elements of an image, gray level and gradient (Chan et al. 2007; Liu et al. 2014). Based on the changes of the gradient of the gray levels, the image texture features can be effectively described. In this study, GLGCM was applied to the first principal component (PC1) score image, which explained 97.8 % of the variance, to mine the image texture variables. Consequently, a total of 13 second-order statistical textural variables (small grads dominance (T 1), big grads dominance (T 2), energy (T 3), inertia (T 4), gray entropy (T 5), grads entropy (T 6), hybrid entropy (T 7), gray mean (μ 1), grads mean (μ 2), gray standard deviation (∂1), grads standard deviation (∂2), correlation (T 8) and inverse difference moment (T 9)) were extracted from the PC1 image; their calculation formulas are given below. The texture variable extraction was implemented in Matlab 2010a software (The Mathworks Inc., MA, USA).

$$ {T}_1=\left[{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n\frac{H\left(i,j\right)}{j^2}}}\right]/\left[{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^nH\left(i,j\right)}}\right] $$
(13)
$$ {T}_2=\left[{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n{j}^2H\left(i,j\right)}}\right]/\left[{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^nH\left(i,j\right)}}\right] $$
(14)
$$ {T}_3={\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n{\left[P\left(i,j\right)\right]}^2}} $$
(15)
$$ {T}_4={\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n{\left(i-j\right)}^2\cdot P\left(i,j\right)}} $$
(16)
$$ {T}_5=-\left\{{\displaystyle \sum_{i=1}^n\left[{\displaystyle \sum_{j=1}^nP\left(i,j\right)}\right]\cdot \log \left[{\displaystyle \sum_{j=1}^nP\left(i,j\right)}\right]}\right\} $$
(17)
$$ {T}_6=-\left\{{\displaystyle \sum_{j=1}^n\left[{\displaystyle \sum_{i=1}^nP\left(i,j\right)}\right]\cdot \log \left[{\displaystyle \sum_{i=1}^nP\left(i,j\right)}\right]}\right\} $$
(18)
$$ {T}_7=-{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^nP\left(i,j\right)\cdot \log P\left(i,j\right)}} $$
(19)
$$ {\mu}_1={\displaystyle \sum_{i=1}^ni\cdot}\left[{\displaystyle \sum_{j=1}^nP\left(i,j\right)}\right] $$
(20)
$$ {\mu}_2={\displaystyle \sum_{j=1}^nj\cdot}\left[{\displaystyle \sum_{i=1}^nP\left(i,j\right)}\right] $$
(21)
$$ {\partial}_1={\left\{{\displaystyle \sum_{i=1}^n{\left(i-{\mu}_1\right)}^2}\left[{\displaystyle \sum_{j=1}^nP\left(i,j\right)}\right]\right\}}^{1/2} $$
(22)
$$ {\partial}_2={\left\{{\displaystyle \sum_{j=1}^n{\left(j-{\mu}_2\right)}^2}\left[{\displaystyle \sum_{i=1}^nP\left(i,j\right)}\right]\right\}}^{1/2} $$
(23)
$$ {T}_8=\frac{1}{\partial {}_1\partial_2}{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n\left(i-{\mu}_1\right)\left(j-{\mu}_2\right)P\left(i,j\right)}} $$
(24)
$$ {T}_9={\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n\frac{1}{1+{\left(i-j\right)}^2}}}P\left(i,j\right) $$
(25)
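Eqs. (13) to (25) can be evaluated with a short NumPy routine, sketched below. Several points are assumptions, since the text does not specify them: H(i, j) is taken as the number of pixels with quantized gray level i and quantized gradient level j, P(i, j) as H(i, j) divided by the total count, the gradient is computed with `np.gradient` (a Sobel operator is a common alternative), both images are quantized to 16 levels, and natural logarithms are used. The function name `glgcm_features` and the random test image are hypothetical.

```python
import numpy as np

def glgcm_features(image, levels=16):
    """Compute the 13 GLGCM texture variables of Eqs. (13)-(25) for a 2-D image."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                     # assumed gradient operator
    grad = np.hypot(gx, gy)

    def quantize(a):
        span = a.max() - a.min()
        if span == 0:
            return np.zeros(a.shape, dtype=int)
        return np.minimum((a - a.min()) / span * levels, levels - 1).astype(int)

    H = np.zeros((levels, levels))                # co-occurrence counts H(i, j)
    np.add.at(H, (quantize(img).ravel(), quantize(grad).ravel()), 1.0)
    P = H / H.sum()                               # normalized matrix P(i, j)
    i = np.arange(1, levels + 1)[:, None]         # gray index, 1-based
    j = np.arange(1, levels + 1)[None, :]         # gradient index, 1-based
    p_gray, p_grad = P.sum(axis=1), P.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                              # skip empty cells: 0 * log 0 = 0
        return -np.sum(p * np.log(p))

    mu1 = np.sum(i.ravel() * p_gray)
    mu2 = np.sum(j.ravel() * p_grad)
    d1 = np.sqrt(np.sum((i.ravel() - mu1) ** 2 * p_gray))
    d2 = np.sqrt(np.sum((j.ravel() - mu2) ** 2 * p_grad))
    cov = np.sum((i - mu1) * (j - mu2) * P)
    return {
        "T1_small_grads_dominance": np.sum(H / j ** 2) / H.sum(),
        "T2_big_grads_dominance": np.sum(H * j ** 2) / H.sum(),
        "T3_energy": np.sum(P ** 2),
        "T4_inertia": np.sum((i - j) ** 2 * P),
        "T5_gray_entropy": entropy(p_gray),
        "T6_grads_entropy": entropy(p_grad),
        "T7_hybrid_entropy": entropy(P),
        "mu1_gray_mean": mu1,
        "mu2_grads_mean": mu2,
        "d1_gray_std": d1,
        "d2_grads_std": d2,
        "T8_correlation": cov / (d1 * d2) if d1 * d2 > 0 else 0.0,
        "T9_inverse_difference_moment": np.sum(P / (1 + (i - j) ** 2)),
    }

# Random stand-in for a PC1 score image.
rng = np.random.default_rng(0)
features = glgcm_features(rng.random((32, 32)))
```

The returned dictionary holds one value per texture variable, which can be appended to the selected wavelengths of a sample to form the combined input vector.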

Multivariate Data Analysis

Before quantitative determination of the TVB-N value using the modeling algorithms, the obtained spectra were preprocessed using the multiplicative scatter correction (MSC) method, which reduces or eliminates undesirable scattering effects and optical interference (Maleki et al. 2007). After the spectral data pretreatment, two highly efficient modeling methods, PLSR and LS-SVM, were applied to establish quantitative calibration models between the candidate variables extracted and selected from the hyperspectral images and the reference TVB-N values measured in grass carp fillets during cold storage. PLSR is an important and classical linear multivariate data analysis tool that has been extensively used for establishing quantitative mathematical models (Abdi 2010; Martens et al. 2005). LS-SVM is an effective nonlinear modeling method. This methodology uses equality instead of inequality constraints, works with a least squares cost function, maps the features into a high-dimensional space through a nonlinear mapping function, and adopts Lagrange multipliers, setting the partial derivatives with respect to each variable to zero, to obtain the optimal solution (Cawley and Talbot 2002; Suykens et al. 2001). PLSR and LS-SVM were implemented in Matlab 2010a software (The Mathworks Inc., MA, USA).
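The MSC preprocessing step can be sketched as follows. This is a minimal sketch of the standard formulation, in which each spectrum is regressed on a reference spectrum and then corrected by the fitted offset and slope; using the mean spectrum of the set as the reference is an assumption, and the function name `msc` and the toy spectra are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction of an (n_samples, n_bands) array."""
    spectra = np.asarray(spectra, dtype=float)
    # Assumption: the mean spectrum of the set serves as the reference.
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for k, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)   # fit s ~= intercept + slope * ref
        corrected[k] = (s - intercept) / slope     # remove additive and multiplicative effects
    return corrected, ref

# Toy spectra: one base shape distorted by made-up scatter offsets and gains.
base = np.array([0.20, 0.40, 0.35, 0.50, 0.30])
spectra = np.vstack([1.0 * base, 1.2 * base + 0.05, 0.8 * base - 0.02])
corrected, ref = msc(spectra)
```

Because the toy spectra differ only by an offset and a gain, all three corrected spectra collapse onto the reference; real spectra retain the chemical differences that the regression cannot explain. When new samples are predicted, the reference stored from the calibration set would normally be passed in through `reference`.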

According to the variable selection method used (GA or PN-GA), the calibration models established using the selected spectral and image texture variables were named GA-PLSR, GA-LS-SVM, PN-GA-PLSR and PN-GA-LS-SVM. Full cross-validation, also called leave-one-out cross-validation, was employed to validate the established PLSR and LS-SVM models: one sample at a time was removed from the calibration set, and the corresponding model was established using the remaining calibration samples. Finally, the TVB-N values of the new samples in the prediction set were predicted by the established models in order to verify their predictive ability. The reliability and accuracy of the models were evaluated by the coefficients of determination (\( R_C^2 \) and \( R_P^2 \)) and the root mean square errors of calibration and prediction (RMSEC and RMSEP). Generally, a good prediction model has high values of \( R_C^2 \) and \( R_P^2 \) and low values of RMSEC and RMSEP, with a small difference between the calibration and prediction values. Figure 2 shows the main steps of the determination of the TVB-N level in grass carp fillet using hyperspectral imaging.

Fig. 2 Main steps of determination of TVB-N level in grass carp fillet using hyperspectral imaging
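The evaluation indicators and the leave-one-out procedure described above can be sketched as follows. The coefficient of determination is computed here as 1 minus the ratio of the residual to the total sum of squares, which is an assumption because the text does not give its formula. An ordinary least squares fit is used purely as a stand-in model to keep the sketch self-contained; it is not PLSR or LS-SVM. All function names and the toy data are hypothetical.

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """Return (coefficient of determination, root mean square error)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, rmse

def fit_ols(X, y):
    """Stand-in model: least squares fit with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def loo_predictions(X, y, fit, predict):
    """Leave-one-out: refit without sample k, then predict sample k."""
    preds = np.empty(len(y))
    for k in range(len(y)):
        train = np.arange(len(y)) != k
        preds[k] = predict(fit(X[train], y[train]), X[k:k + 1])[0]
    return preds

# Toy data with an exact linear relation, so cross-validation is error-free.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
r2_cv, rmse_cv = r2_rmse(y, loo_predictions(X, y, fit_ols, predict_ols))
r2_demo, rmse_demo = r2_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

In the second call the single residual of 1 gives an RMSE of about 0.577 and a coefficient of determination of 0.5, which shows how one poorly predicted sample lowers both indicators.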

Results and Discussion

Reference TVB-N Level and Spectral Features

Figure 3a shows the variation of the averaged reference TVB-N values measured by the traditional analysis method in grass carp fillets during cold storage at 4 °C. With increasing storage time, the TVB-N level increased from 7.85 mg N/100 g (0 days) to 21.81 mg N/100 g (8 days), which provided a wide range of chemical variation for enhancing the prediction performance and indicated the protein degradation of the fish muscle. Figure 3b presents the average spectra of fillets with different TVB-N levels. The reflectance of grass carp fillets with an acceptable TVB-N level (TVB-N <15 mg N/100 g) was higher than that of fillets with an unacceptable TVB-N level (TVB-N >15 mg N/100 g). Grass carp fillets with different TVB-N levels thus showed different spectral features, which makes it meaningful to seek important wavelengths that reflect this difference. The reason is that, during cold storage, the fish muscle protein degraded into many nitrogen-containing compounds (Fig. 3c) through enzymatic reactions and microbial spoilage, and these degradation products strongly influenced the absorption of the fish muscle. In this study, GA and PN-GA were carried out to select, from the full spectral range, the most informative wavelengths related to the TVB-N level of fish muscle. As a result, 12 relevant wavelengths (419, 442, 445, 515, 560, 601, 660, 690, 730, 780, 850 and 971 nm) were selected by GA. The PN-GA algorithm identified six candidate wavelengths (428, 550, 601, 655, 775 and 980 nm), which are illustrated in Fig. 3d. Because the future aim is to develop an efficient multispectral imaging system with fewer than ten optimal wavelengths, the six key variables obtained by PN-GA are the focus of the following discussion. The selected wavelengths were mainly located in the visible spectral range, which means that the TVB-N changes were related to the color or texture of the fish muscle. In fact, protein degradation and oxidation in muscle tissue have been shown to influence color components and texture properties (Lund et al. 2011; Vongsvivut et al. 2014). The wavelength at 428 nm was probably related to the absorption of specific proteins such as pigment proteins in fish muscle. The wavelengths around 550, 601, and 655 nm were attributed to hemoglobin and myoglobin oxidation (Zhu et al. 2013; Sivertsen et al. 2011). The variables at 775 and 980 nm were possibly ascribed to the vibrations of chemical bonds such as C–H, O–H and N–H shown in Fig. 3c.

Fig. 3 Reference TVB-N level and spectral features of grass carp fish fillets. a the TVB-N level changes during cold storage days, b the spectra difference of acceptable and unacceptable TVB-N level, c the protein degradation products caused by enzymatic activity and microbial spoilage, and d the selected optimal wavelengths by PN-GA method

Prediction of TVB-N Value Using the Selected Wavelengths

Using the GA and PN-GA algorithms, 12 and 6 critical wavelengths, respectively, were selected from the whole spectral range of 381 wavelengths to establish simplified PLSR and LS-SVM models for predicting the TVB-N value; the corresponding prediction performance is presented in Table 2. With the 12 optimal wavelengths selected by GA, the established GA-PLSR and GA-LS-SVM models showed satisfactory performance for TVB-N prediction, with \( R_P^2 \) of more than 0.916. Compared with the GA-PLSR model (\( R_P^2 \) = 0.917 and RMSEP = 2.347 mg N/100 g), the GA-LS-SVM model presented better predictive accuracy, with an increase of 0.006 in \( R_P^2 \) and a decrease of 0.098 mg N/100 g in RMSEP. This demonstrates that GA is useful for selecting effective wavelengths for modeling TVB-N.

Table 2 TVB-N prediction performance based on PLSR and LS-SVM using selected wavelengths from fish hyperspectral images

However, with more than ten selected wavelength variables, it is still difficult to develop a simple and reliable multispectral imaging system for online application. Therefore, a novel algorithm combining PN with GA, termed the PN-GA method, was developed to filter the most informative wavelengths for rapid and online detection of the TVB-N value in order to evaluate the protein degradation and freshness of grass carp fillets. As a result, six candidate wavelengths were selected, and the established PN-GA-PLSR and PN-GA-LS-SVM models showed better prediction results for the TVB-N value than the GA-related models, as shown in Table 3. In addition, Fig. 4 shows the predicted and measured TVB-N values for both the PLSR and LS-SVM models. The PN-GA-PLSR model performed slightly better than the PN-GA-LS-SVM model, with higher \( R_C^2 \) of 0.968 and \( R_P^2 \) of 0.956 and lower RMSEC of 1.726 mg N/100 g and RMSEP of 1.737 mg N/100 g, compared with \( R_C^2 \) of 0.965, \( R_P^2 \) of 0.947, RMSEC of 1.732 mg N/100 g and RMSEP of 1.846 mg N/100 g for the PN-GA-LS-SVM model.

Table 3 TVB-N prediction performance based on PLSR and LS-SVM using combined variables from fish hyperspectral images

Fig. 4 Predicted and measured TVB-N values for both PLSR (a) and LS-SVM (b) models

Prediction of TVB-N Value Using the Combined Variables

Hyperspectral image information also contributes greatly to indicating the quality deterioration of fish muscle during cold storage. Thus, in this study, the GLGCM algorithm was used to extract 13 image texture variables (energy, correlation, hybrid entropy, inertia, gray mean, grads mean, gray entropy, grads entropy, gray standard deviation, grads standard deviation, inverse difference moment, and small and big grads dominance) to reflect the TVB-N changes of fish muscle more fully and to enhance the robustness of model prediction. Table 3 shows the TVB-N prediction performance based on PLSR and LS-SVM using the combined variables (the selected candidate wavelengths and the image texture variables) from the fish hyperspectral images. The GA-related models showed prediction results similar to those of the models established using only the selected wavelengths (PLSR: \( R_P^2 \) = 0.917 and 0.927; LS-SVM: \( R_P^2 \) = 0.929 and 0.923), although the number of variables was increased from 12 to 25. For the PN-GA-related models, combining the full spectrum with the texture variables would give 394 input variables (381 wavelengths plus 13 texture variables), whereas the combined PN-GA input comprised only 19 variables (6 wavelengths plus 13 texture variables); the relevant models performed very well, with \( R_P^2 \) of more than 0.950. The PN-GA-PLSR model using the combined variables exhibited the best prediction performance among all the established models, with the highest \( R_P^2 \) of 0.981 and the lowest RMSEP of 1.435 mg N/100 g. This analysis demonstrates that the image texture information has, to some extent, a positive influence on the prediction of TVB-N values. Combining spectral and image texture information through data fusion can improve the reliability and accuracy of the prediction models, which was also reported by Huang et al. (2014), who integrated the information of NIR spectroscopy, computer vision, and electronic nose techniques for the measurement of TVB-N in pork meat; their BP-ANN model yielded a good result, with \( R_P^2 \) of 0.953 and RMSEP of 2.730 mg N/100 g. In a previous study, a simplified PLSR model using nine important predictive wavelengths (575, 600, 615, 705, 765, 825, 885, 915, and 935 nm) selected by analysis of the regression coefficients was developed for prediction of the TVB-N levels of pork meat (Wang et al. 2013). That model had \( R_{CV}^2 \) of 0.890 and RMSECV of 1.940 mg N/100 g, a poorer performance than that of the model used in this study. Similarly, Li et al. (2015) developed a novel back propagation adaptive boosting (BP-AdaBoost) algorithm for data fusion and modeling of TVB-N content in pork meat; however, an inferior result was obtained, with \( R_P^2 \) of 0.869. This indicates that the performance of a prediction model depends on the variable selection method used. Moreover, in this study the number of wavelength variables to be processed was reduced from 381 to 6, saving about 98.4 % of the working time. These results indicate that the novel PN-GA variable selection method developed in this study is highly effective for screening the most valuable and least correlated wavelengths for further analysis. More importantly, based on the selected key variables, it is feasible and practical to develop a multispectral imaging system with higher reliability and accuracy for online detection.
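The 98.4 % figure coincides with the relative reduction in the number of wavelength variables, which suggests that the working time was assumed to scale in proportion to the number of wavelengths processed:

$$ \frac{381-6}{381}\times 100\%=\frac{375}{381}\times 100\%\approx 98.4\% $$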

In summary, the PN-GA algorithm was highly effective for selecting the most informative wavelength variables, and PN-GA-PLSR combining the spectral and image texture information was the best model for predicting the TVB-N values in order to evaluate the protein degradation and freshness loss of fish muscle.

Conclusions

The feasibility of using PN combined with GA to select candidate variables from hyperspectral images for rapid and non-destructive determination of the TVB-N level and monitoring of protein changes in grass carp fillets was investigated. The six optimal wavelengths (428, 550, 601, 655, 775 and 980 nm) selected using the PN-GA method were used to construct PLSR and LS-SVM calibration models for predicting the TVB-N value. Of the two, the PN-GA-PLSR model showed the better prediction performance, with \( R_P^2 \) of 0.956 and RMSEP of 1.737 mg N/100 g. In addition, the GLGCM algorithm was used to extract 13 image texture variables to enhance the prediction results. A higher \( R_P^2 \) of 0.981 and a lower RMSEP of 1.435 mg N/100 g were obtained using the PN-GA-PLSR model with the combined variables. The results showed that the PN-GA algorithm is a very useful variable selection method and that adding the image texture information improves the performance of TVB-N prediction. In the future, a reliable multispectral imaging system can be developed and used to determine the TVB-N level for the evaluation of protein degradation and chemical changes in fish muscle.