1 Introduction

Regarded as one kind of important material of manufacture and fuel, wood is used by human over a long period of time [1,2,3]. The physical and chemical properties of different wood species are variant, since different kinds of wood contain distinguished compositions [4, 5]. These properties will affect the costs, implementations, and cycle period of wooden production in practice [6]. In the field of architecture manufacturing, archeology, biodiversity, and environmental conservation, the wood species is essential information [7,8,9,10]. Therefore, classification of wood species is a very important research.

The traditional classification method is to recognize wood species by observing visible anatomy features of wood sections with microscope or macroscopic [11]. The accuracy of the traditional method depends on the operator’s experience and attention [12]. To solve this problem, researchers proposed some improvements based on the traditional methods. The machine vision technology with computer digital image algorithm, such as gray-level co-occurrence matrices (GLCM) and edge detection technique, can improve the accuracy and speed of analysis methods [7, 13, 14]. Another kind of imaging method is based on multispectral imaging technique [15]. The analysis accuracy of this method is high due to more information of multispectral image. However, the imaging method is not suitable for analysis of in-suit application, because it requires sample preparation and microscope image of wood sections. Besides the methods based on imaging process, DNA extraction technique was also used in wood sample classification [16, 17]. This method is more accurate; however, DNA extraction requires high precision laboratory environment and complicated sample preparation. Expect the techniques mentioned above, near-infrared spectroscopy (NIRS), one kind of non-destructive spectroscopy analysis method, was used to analyze wood and paper [18, 19]. However, the NIRS method also requires sample preparation, which increases the complexity of the measurement [20]. In addition, the peaks of different wood chemical composition are overlap and weak in the near-infrared range and the NIRS is susceptible interfered by environmental factors [21]. The literatures mentioned above show that some techniques have been applied in wood species classification, but still have some disadvantages.

As well known, compared with the other material analysis technology, laser-induced breakdown spectroscopy (LIBS), an atomic spectroscopy analysis technology, has some unique advantages such as rapid analysis, few sample preparation, in-suit, and remote measurement, etc. [22, 23]. Therefore, LIBS has been widely applied in the field of material classification and identification such as metallurgy, exploration and safety, etc. [24,25,26]. Some researchers have analyzed the wood samples using LIBS. For example, wood-slice substrate was selected as a water absorber for detection of trace amounts of heavy metal in aqueous solutions using LIBS [27, 28]. Regarded as waste materials, wood samples were analyzed by LIBS to detect the elements information of contaminant, such as chemical warfare agents (CWA) and preservatives [29,30,31]. However, these studies, in which the wood samples were used as substrate materials, are not for wood classification. In this paper, the aim of the research is to classify the wood species by LIBS.

In the field of recognition analysis, supervised learning, one kind of machine learning methods, is combined with the LIBS technology to discriminate materials frequently. In supervised learning method, the data of samples with label are used to build a classification model and the data of unknown samples are identified by imported into the classification model. The generalization performance of model is evaluated by the correct classification rate (CCR) for test set [32], which is the ratio of the number of samples classification correctly to the number of all unknown samples. The common supervised learning methods include artificial neural network (ANN), partial least-squares discriminant analysis (PLS-DA), and soft independent modeling of class analogy (SIMCA) [33], etc. The previous research works demonstrated that these multivariate analysis techniques could be applied in LIBS spectra classification [34]. PLS-DA maximizes the inter-class variance while minimizing the intra-class variance, which raises the recognition ability of the model [35]. However, PLS-DA is a binary classification model and the complexity of PLS-DA model will increase for multiclassification task [36]. For the SIMCA, principal components analysis (PCA) is developed on each sample class, and the unknown samples are classified by evaluating the distance between the unknown sample and the center of each class. In some applications, the classification results of SIMCA for some samples were correct, while other samples were recognized incorrectly [37]. ANN, as a prevalent model used in pattern recognition, the accuracy, and repetition of ANN, has demonstrated improvement in the discrimination capability compared to the other models [38]. Considering the advantages and disadvantages of these techniques, ultimately, we selected the ANN model to classify the spectra data of wood.

In this paper, the spectra of four species of wood were measured. The feature spectral data of wood selected by PCA were input into ANN model. The relationship between the parameters setting and the CCRs of ANN models was investigated in detail. In addition, the CCRs of ANN, PLS-DA, KNN, and SIMCA classification models combined with LIBS data in wood classification were compared.

2 Materials and methods

2.1 Experimental setup

The experimental setup is same as the LIBS system used in the previous work [34]. A schematic of the system is shown in Fig. 1 for reference. A Q-switch flash lamp pumped Nd:YAG laser operating at 1064 nm with 1 Hz repetition rate, pulse duration of 10 ns, beam diameter of ∅6 mm, and single pulse energy of 50 mJ was used as the excitation source. The laser beam was guided through three reflectors (M1, M2, and M3) and focused onto sample surface by a convex lens (L1) with focal length 100 mm. The wood sample was put on a 3D translation stage. The distance between the convex lens and sample surface was adjusted by z-axis translation device of the 3D translation stage. The diameter of spot size was less than ∅ 1 mm. The sample was moved by x- and y-axis translation device of the 3D translation stage to ensure that each laser pulse ablates a fresh location. The laser was preheated for half an hour before measurement to stabilize the laser energy.

Fig. 1
figure 1

LIBS experimental setup for wood species analysis

Plasma emission generated by ablated sample was focused by another convex lens (L2) with 30 mm focal length and ∅ 25 mm diameter into a fiber optic bundle (two-fiber, each of 600 μm aperture). The angle between the normal direction of fiber bundle and incidence direction of laser was 45° to ensure spectrometer collected plasma emission stably. The fiber bundle connected with a two-channel gated charge-coupled device (CCD) spectrometer (AvaSpec 2048-2-USB2, Avantes). The spectral interval range of the spectrometer covers from 200 to 950 nm with a convertible spectral resolution of 0.20–0.30 nm. The acquisition parameters of spectrometer were set by software (AvaSpec 7.6 provided by AVANTES). In this experiment, the flash lamp was triggered by spectrometer. The delay time, which is the time interval between the flash lamp trigger and the spectrometer acquisition, was 376 μs and the integration time of CCD was 2 ms. The experiment was implemented under ambient condition.

2.2 Materials

The wood samples were purchased at local furniture market in China. The latin names of wood samples (Africa rosewood, Brazil bubinga, Myanmar padauk, and Pterocarpus erinaceus, respectively) and the corresponding sample ID listed in Table 1.

Table 1 Latin names and sample ID of wood samples

The samples are shown in Fig. 2. The hardness of four species of wood is similar to each other. All wood samples were manufactured into 80 mm × 140 mm × 5 mm block and cleaned with alcohol. For each type of wood, the sample was divided into 100 sub-samples and 3 laser shots ablated on different positions in each sub-sample using a 3D translation stage (Fig. 1). Then, these three spectra were averaged into one spectrum. As the samples were naturally heterogeneous, each sub-sample can be regarded as one sample (Fig. 2b). Ultimately, the spectra of 100 samples were obtained for each type of wood (400 samples in total). From each type of sample, 70 spectra were selected randomly as training set (totally 280 spectra) and the rest spectra were used as test set. The training set was imported into a trainer of ANN to build a classification model. The test set was used to assess the performance (i.e., recognition ability) of model for unknown spectra.

Fig. 2
figure 2

a The surface of experimental samples and b a method to improve the number of samples. The red dots were laser shot in each sub-sample

2.3 Spectral data

In the process of model building, we can use pixel intensity (the intensity of each single wavelength) or line intensity (the area of the single spectral line) as the input variable. Therefore, it is worth to investigate which type of input variable is suitable for wood classification. Figure 3 describes the difference between pixel intensity and line intensity (using peak of H element as an example here). In this paper, the recognition ability of ANN models established on pixel intensity and line intensity was compared. In the case using pixel intensity as input variable, the intensities of 3466 pixels, the all pixels of a whole LIBS spectrum in our case, were selected as spectral data. In the case using line intensity as input variable, the intensities of 188 spectral lines, which were selected from spectral lines of all wood types, were used as spectral data for analysis. It is worth mentioning that the data analysis procedures for these two different types of input variables were the same.

Fig. 3
figure 3

The difference between pixel intensity and line intensity. Using H element as an example, the pixel intensity is the intensity of a single wavelength, such as 655.90 nm or 655.13 nm. The line intensity is the area of the spectral line of H element from 649.96 to 664.14 nm

3 Data analysis

The data analysis procedures used in this work include feature spectral data selecting, data pre-processing, and classification modeling. A method based on PCA model was used to select the features, normalization (using the sum of all intensities to normalize the spectral data) was selected as pre-processing method, and ANN was used as classification model. Feature spectral data selecting and data pre-processing were implemented based on MATLAB version 2016a (MathWorks, Natick, MA). The modeling process and parameters optimization of ANN models were investigated using Automated Neural Network Toolbox, which was integrated in STATISTICA (version 10.0.1011.0, StatSoft, Inc.).

3.1 Feature spectral data selection

To improve the performance and interpretability of the model, the spectral feature data should be extracted [39]. PCA, a statistical method of dimensionality reduction [40], was used to extract feature variables of LIBS spectral data in this work. PCA is one kind of orthogonal transformation method. The high-dimensional data are projected down into a lower dimensional subspace, of which the variance of projection in each direction is maximum value. Therefore, the scores of principal components (PCs) contain the information of the high-dimensional data in the new subspace as much as possible. An important result of PCA algorithm is the eigenvector of the covariance matrix of high-dimensional data, namely loadings. The 1/n of the maximum absolute value of loadings for PCx (x = 1, 2) was regarded as the threshold to select the feature variables. That is, the spectral pixel or line whose absolute value of loading was above the threshold value was selected as the feature variable data. The optimal value of n was searched based on classification results. The method can be referenced in the literature [41].

3.2 Data pre-processing

Normally, there are some spectral fluctuations between the measurement of each pulse due to inhomogeneity of sample surface, interference of ambient, and fluctuation of laser energy. Suitable data pre-processing method can reduce those data fluctuations [42]. Referenced the previous studies about pre-processing methods [43,44,45], normalization method was selected as the pre-processing method ultimately:

$$x_{{ij}}^{*}=\frac{{{x_{ij}}}}{{\sum\nolimits_{{j=1}}^{n} {{x_{ij}}} }},$$
(1)

where xij is the original data in the ith row and jth column of spectral matrix, and x*ij is the data after normalization in the ith row and jth column of spectral matrix.

3.3 Classification modeling

In the present work, we used the artificial neural network (ANN) model as the classification model. Feed-forward networks were used as ANN networks due to their excellent ability of self-learning and self-adaptive. There are two popular and principal network types of feed-forward networks, multilayer perceptron (MLP), and radial basis function (RBF).

The basic structure of MLP is a three-layer network, as shown in Fig. 4, which includes input layer, hidden layer, and output layer. The number of input layer neurons, n, equals to the number of input variables. The number of hidden neurons, l, may affect the performance of ANN and should be set based on specific application. The number of output-layer neurons was one. The different outputs yo of the neuron represent different prediction classes.

Fig. 4
figure 4

Network structure of ANN model

Every neuron in hidden layer and output layer represents an activation function, which is often a nonlinear function. The output of jth neuron in hidden layer is calculated by the following:

$$y_{j}^{{\text{H}}}={f_{\text{H}}}\left( {\sum\limits_{{i=1}}^{n} {\omega _{{ij}}^{{\text{H}}}} \cdot {x_i} \pm {b_{\text{H}}}} \right),\quad j=1,2, \ldots ,l,$$
(2)

where fH and bH are activation function and bias of hidden layer, respectively. ωijH (i = 1, 2, …, n; j = 1, 2, …, l) is the connection weight between the ith neuron in input layer and jth neuron in hidden layer; xi (i = 1, 2, …, n) is the ith input variable. The output of the model is determined as follows:

$${y^{\text{O}}}={f_{\text{O}}}\left( {\sum\limits_{{j=1}}^{l} {\omega _{j}^{{\text{O}}}} \cdot y_{j}^{{\text{H}}} \pm {b_{\text{O}}}} \right),$$
(3)

where fO and bO are activation function and bias of output layer, respectively. ωjO (j = 1, 2, …, l) is the connection weight between the jth neuron in hidden layer and the neuron in output layer.

Iterative algorithm is one factor related to recognition ability of MLP network. Iterative algorithm is used to determine the connection weights, which decide the input of neuron in next layer. The optimal connection weights searched by different iterative algorithms are different, which lead to the difference of model recognition ability. The prevalent and basic iterative algorithms for training neural networks include the Broyden–Fletcher–Goldfarb–Shanno (BFGS), conjugate gradient and gradient descent algorithm. It is necessary to determine the suitable iterative algorithm of MLP network for a specific application.

Another type of feed-forward networks is RBF whose structure is similar to MLP. The difference between RBF and MLP is that the activation function of hidden layer in RBF network is a radial basis function, which is a non-negative nonlinear attenuated function and radial symmetric around the prototype vector. The radial basis function is a Gauss function often. For the RBF network, the prototype vector of each hidden layer neuron is determined by input data directly. Therefore, there is no necessary to optimize the prototype vector. In this paper, the classification ability of the ANN models based on MLP and RBF was compared.

4 Results and discussion

4.1 Spectra of wood samples

The LIBS spectra of four species of wood samples are shown in Fig. 5.

Fig. 5
figure 5

Spectra of four species of wood samples

Emission lines from Ca (392.788 nm, 396.47 nm, 422.41 nm, 445.11 nm, 611.1 nm, and 615.2 nm), Na (588.95 nm and 589.55 nm), O (777.3 nm), H (656.1 nm), CN molecular violet bands (386–389 nm), and N (742.364 nm, 744.306 nm and 766.523 nm) can be observed in the spectra of all four species of wood samples. However, the intensities of those spectral peaks are clearly different in the spectra of the four wood species. For example, the intensity of O (777.21 nm) in the spectrum of Africa rosewood and the intensity of Fe (392.59 nm) in spectrum of Brazil bubinga are weaker compared to the other wood species. The ratio of O to Fe (O/Fe) in spectrum of Myanmar padauk is 1.73 and the value of O/Fe in Pterocarpus erinaceus is 0.90, approximately. It is worth noting that the intensities of such spectral data and the ratio between them are key factors in distinguishing the wood specie.

4.2 Feature data selection

We performed the PCA model with the training set (280 spectra). The 1/n of the maximum absolute value of loadings for PCx (x = 1, 2) was regarded as the threshold to select the feature variables as the inputs of the ANN model. The CCRs of models established using pixel intensity and line intensity as input variables for test set are shown in Fig. 6 when n was selected from 1 to 10. The results demonstrated that CCRs achieved maximum when n was set to be 4 for the both cases of using pixel intensity (99.17%) and line intensity (100%) as input variables. For line intensity, the CCRs were all 100% when n was more than 4. However, more input variables will increase the complexity of model. Therefore, the optimal value of n was set to be 4 for both cases.

Fig. 6
figure 6

The CCRs of ANN model when n was set to be 1–10

In the case of using pixel intensity as input variables, the loadings of the first two PCs are shown in Fig. 7. The dotted lines in the figure represented the thresholds on each principal component (0.07808 and 0.05044 on PC1 and PC2, respectively) and the element symbols were attached on the side of pixels whose loadings were exceeded the thresholds. As a result, we selected 49 pixels that belong to 17 elements listed in Table 2 as features.

Fig. 7
figure 7

Loadings of PCx (x = 1, 2) in the case of using pixel intensity as input data, the thresholds values of feature spectral data selection were 0.07808 and 0.05044 on PC1 and PC2, respectively

Table 2 The elements and wavelengths of features selected by PCA

In the case of using line intensity as input variables, the loadings of the first two PCs are shown in Fig. 8. The dotted lines in the figure represented the thresholds on each principal component (0.16495 and 0.10265 on PC1 and PC2, respectively) and the element symbols were attached on the side of lines whose loadings were exceeded the thresholds. Consequently, 21 lines were selected from 188 spectral lines as features and listed in Table 2.

Fig. 8
figure 8

Loadings of PCx (x = 1, 2) in the case of using line intensity as input variables; the thresholds values of feature spectral data selection were 0.16495 and 0.10264 on PC1 and PC2, respectively

The ratios between the intensities of features can be used to classify the wood species. We use Bayesian statistics to determine the classification thresholds. The ratio corresponding to the maximum CCR in training set was selected to classify the test set (as shown in Fig. 9, the red lines represent the thresholds). In the case of using pixel intensity as input data, Fig. 9a, b, using the ratio between O 777.21 nm and Na 588.95 nm can obtain the best result. The CCRs for training set and test set were 97.86% and 96.67%, respectively. In the case of using line intensity as input data, Fig. 9c, d, the maximum CCR can be got when using ratio between N 821.63 nm and Na 588.95 nm. The CCRs for training set and test set were 95% and 94.17%, respectively. The experimental results indicated that the maximum CCR could not achieve 100% when using a ratio between the intensities of features to classify the species of wood. Therefore, it is necessary to investigate using multivariate analysis methods to classify the wood species, such as ANN model in our case.

Fig. 9
figure 9

The thresholds for recognizing the types of wood species and classification result in the case of using pixel intensity as input data (a, b), and in the case of using line intensity as input data (c, d), respectively

4.3 Optimization of neural network type and structure

We established the ANN models based on MLP and RBF network integrated in Automated Neural Network Toolbox, STATISTICA. Considering for parameter setting rules of ANN model, the number of input neurons and output neurons should be equal to the number of input variables and output variables, respectively. The number of hidden layer neurons, l, is calculated according to the formula:

$$l=\sqrt {n+m} +\alpha ,$$
(4)

where n is the number of neurons in input layer; m is the number of class; α is a constant, which can be selected from 1 to 10. For a certain number of input- and output-layer neurons, the number of hidden layer neurons will be chosen in a closed numerical interval. Hence, we investigated the relationship between the number of hidden layer neurons and recognition ability of model. In the case using intensity of pixel as input variable, the number of input layer neurons of ANN models and the number of output classes were 49 and 4, respectively. In the case using intensity of line as input variable, the number of input layer neurons of ANN models and the number of output classes were 21 and 4, respectively. According to Eq. (4), the number of hidden layer neurons was changed from 8 to 18 for pixel intensity and from 6 to 15 for line intensity, respectively.

For a certain number of hidden layer neurons, the activation function of the hidden layer and output layer and other parameters of ANN model were searched by STATISTICA automatically. It is necessary to emphasize that the CCRs mentioned in this section were all for test set.

4.3.1 Optimization of iterative algorithms of MLP network

The CCRs of different models based on MLP network with different number of hidden layer neurons optimized by BFGS, conjugate gradient, and gradient descent iterative algorithms are shown in Fig. 10 and listed in Table 3.

Fig. 10
figure 10

CCRs of ANN model generated by different iterative algorithms using a pixel intensity and b line intensity as input variable

Table 3 The CCRs of model optimized by different iterative algorithms

The CCRs of models optimized by BFGS and conjugate gradient were all above 97% and the recognition ability of model optimized by BFGS was better than the model established by conjugate gradient. The CCR of model optimized by gradient descent was the lowest, only 60.23%. For BFGS and conjugate gradient, the number of hidden layer neurons had a little influence on the recognition ability of model and the CCRs of model were from 97.50 to 100%. For gradient descent, the CCRs of model were from 25 to 95% for different number of hidden layer neurons. For BFGS and conjugate gradient, the averaged CCR of model established using line intensity (100%) was higher than that of using pixel intensity (99.32%) as input variable with less inputs (21 for line intensity while 49 for pixel intensity). For gradient descent, however, the mean value of CCR of model established using pixel intensity (64.76%) was higher than that of using line intensity (55.25%) as input variable, but both CCRs of them were lower than 65%.

The different recognition ability of models optimized by three iterative algorithms is owing to the difference of the optimal weights calculated by different iterative algorithms. The search direction of gradient descent and conjugate gradient algorithm is the present gradient of loss function. Conjugate gradient algorithm analyzes the present and previous iteration gradient information of the loss function in the optimization process. As a result, the iteration accuracy of conjugate gradient algorithm is higher than gradient descent algorithm. BFGS algorithm analyzes the two-order derivative information of loss function and obtains the optimal weights directly. Hence, the performance of BFGS is the best in these three algorithms.

4.3.2 Optimization of network type

The CCRs of ANN models based on two different types of network, RBF and MLP (iterated by BFGS algorithm), with different number of neurons in hidden layer are shown in Fig. 11, and the statistical results of CCRs of model established by different network type are listed in Table 4.

Fig. 11
figure 11

a The CCRs of two models with different network types using pixel intensity as input data and b the CCRs of two different network types’ model line intensity as inputs with different number of hidden layer neurons

Table 4 The CCR of model established based on different networks

The results show that the CCRs of models established based on MLP network were higher than that of RBF network. The averages of CCRs were 99.64% and 95.95% for MLP and RBF, respectively. In addition, the fluctuation of the CCRs of model based on RBF was higher than that of MLP. The CCRs were from 99.17 to 100% for MLP and from 88.3 to 99.17% for RBF, respectively.

The difference of CCRs for MLP and RBF models may be due to the different propagation from input layer to hidden layer. For RBF network, spectral data are propagated from input layer to hidden layer by calculating the Euclidean distance between the input variables and the prototype vector of each hidden layer neuron. For MLP, however, the spectral data in input layer are propagated into the hidden layer by multiplying the weight coefficients, which are optimized automatically through iterative algorithm. Compared to the propagation of RBF, the propagation of MLP can retain the specificity between different types of wood samples better. Therefore, the model established based on MLP network may be more suitable for the classification of LIBS spectra compared to RBF in this work.

In the case using intensity of pixel as input variable, the CCRs of models based on MLP fluctuated from 99.17 to 100% and the average was 99.32%. For RBF, the CCRs were from 88.3 to 97.5% and the average was 94.01%. In the case using intensity of line as input variable, the CCRs were 100% for all number of hidden layer neurons for MLP. For RBF, the CCR achieved the maximum value, 99.17%, when the number of hidden layer neurons was 8 or 11 and the average was 98.08%. The results showed that using line intensity as the input variable may be better than using pixel intensity as the input variable for ANN classification model.

Moreover, the experimental results show that there was no significant correlation between CCR and the number of hidden neurons in our case. However, the number of hidden layer neuron is directly proportional to the training time of model. Therefore, the number of hidden layer neurons could be set as 6 considering for reducing the time complexity.

In addition, the classification ability among ANN model with the other well-known chemometric methods (PLS-DA, KNN, and SIMCA) was also evaluated. The optimized CCRs of different models for test set are shown in Fig. 12. The optimal CCRs of ANN, KNN, PLS-DA, and SIMCA are 100%, 95.83%, 82.5%, and 51.67%, respectively. The results demonstrated that the recognition ability of ANN model was the best among KNN, PLS-DA, and SIMCA.

Fig. 12
figure 12

The CCRs of ANN, KNN, PLS-DA, and SIMCA

5 Conclusions

A novel method used to classify the species of wood samples based on ANN model combined with the LIBS spectral data was introduced in this paper. The spectral data (pixel and line, respectively) whose absolute value of loading was above the 1/4 of the maximum absolute value of loadings were selected as the feature. The spectral data were normalized by sum of all the feature spectral data.

The CCRs of ANN models established based on different network types with different number of hidden layer neurons and iterative algorithms were compared. The experimental results show that the most suitable network type and iterative algorithm of ANN model were MLP and BFGS, respectively, and there is no significant correlation between the recognition ability and number of hidden neurons.

We also compared the CCRs of models established using pixel intensity and line intensity as input variable for test set, respectively. The results showed that the recognition ability of model established using line intensity as input variable is better than using pixel intensity as input variable.

The optimal CCRs of ANN and other well-known chemometric models (PLS-DA, KNN and SIMCA) for test set were also evaluated. Only the CCR of the ANN model achieved 100%, which demonstrated that the successful application of LIBS combined with ANN in recognition of wood species. In our future research, LIBS will be applied to analyzing and investigating the original region of wood.