Introduction

Correct identification of wood species is important since it is related to various properties and cost of materials. Examination of cross sections and surfaces of the material of interest is usually employed to identify wood species (Labati et al. 2009).

Because of overexploitation, many forest tree species are on the official list of endangered plants in Brazil, including Imbuia/Brazilian walnut (Ocotea porosa), Brazilian sassafras (Ocotea odorifera (Vellozo) Rohwer) and black cinnamon (Ocotea catharinensis Mez), along with other species of the same family (MMA 2008). Species with similar morphological and anatomical features are very difficult to classify (Banerjee et al. 2008). Fast techniques are therefore needed that provide more information on the intrinsic characteristics of wood and enable reliable species identification, particularly when little information is available about anatomy or chemical composition. One alternative is to combine an artificial neural network (ANN) with near-infrared spectroscopy.

Artificial neural networks were inspired by biological neural networks and consist of many neurons that process information based on nonlinear and multivariable relationships between process parameters (Ozsahin 2012). Their application utilizes their ability to learn from prior examples and to perform generalized identification or recognition of previously unseen patterns. Their limitations depend on the quantity, validity and accuracy of training data (Clark 2003).

In species identification, taxa that are only doubtfully distinct from each other are difficult for any identification system to distinguish. Although a formal botanical key can be better, it often takes a long time to create. Furthermore, a higher level of expertise is required for construction of an effective identification system such as that demonstrated here using the neural network approach (Clark 2003).

Artificial neural networks have shown good potential for identifying species of fish (Simmonds et al. 1996), butterflies (Kang et al. 2012), mosquitoes (Banerjee et al. 2008; Lorenz et al. 2015), jujube tree pathogens (Zhang et al. 2013), wood (Esteban et al. 2009; Ma et al. 2012) and plants from leaf characteristics (Kattmah and Azim 2013). In wood technology, neural networks have been applied to classify images of wood veneer (Packianather and Drake 2000), to identify wood defects (Pham et al. 2006; Mu et al. 2015), to monitor MDF milling (Zbiéc 2011), to predict tensile index and brightness in pulp (Okan et al. 2015) and to classify wood based on image characteristics (Sundaram et al. 2015).

In this context, some studies have applied infrared spectroscopy for species discrimination based on solid or powdered samples (Adedipe et al. 2008; Russ et al. 2009; Casale et al. 2010; Braga et al. 2011; Pastore et al. 2011; Sandak et al. 2011; Nisgoski et al. 2015). Studies of wood identification applying artificial neural networks combined with near-infrared spectroscopy are scarce. Ma et al. (2012) demonstrated the potential of combining these two techniques. The objective of this paper is to test an artificial neural network (ANN) in comparison with SIMCA classification to identify some Brazilian wood species based on near-infrared spectra.

Materials and methods

Wood samples of Ocotea porosa, Ocotea odorifera, Nectandra sp. (Lauraceae) and Eucalyptus sp. (Myrtaceae), with dimensions of 2 × 2 × 5 cm, were obtained from the collection of the Wood Anatomy and Quality Laboratory of Parana Federal University (UFPR). Material was selected based on the number of samples available and to cover different situations for testing the potential of ANN and SIMCA in wood discrimination: species on the endangered list (Ocotea porosa, O. odorifera), a wood from the Lauraceae family that is very similar to them and difficult to discriminate (Nectandra sp.) and a wood with no commercial restrictions that is visually and anatomically distant from the other samples (Eucalyptus sp.). Sixty (60) physical samples of each species, collected from different boards, were used; age and position within the tree were not known.

Near-infrared spectra were acquired using a Bruker Tensor 37 spectrometer (Bruker Optics, Ettlingen, Germany, www.bruker.com) equipped with an integrating sphere operating in reflectance mode. A total of 64 scans were acquired for each spectrum with a resolution of 4 cm−1 over a spectral range of 10,000–4000 cm−1, giving a set of 1500 wavenumbers. The samples were placed on top of the integrating sphere, and four spectra were obtained from different points on each face (transverse, radial and tangential), resulting in a total of 12 separate spectra for each physical sample. All spectra were identified by sample name and used for further analysis without averaging.
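The acquisition figures can be checked with a few lines of arithmetic. The sketch below assumes an idealized, evenly spaced grid with one data point every 4 cm−1 (the instrument's actual point spacing may differ slightly) and uses the 60 samples per species described for the material; all names are illustrative.

```python
# Idealized acquisition arithmetic; the even 4 cm-1 grid is an assumption.
HIGH_CM1 = 10_000          # upper limit of the spectral range (cm-1)
LOW_CM1 = 4_000            # lower limit of the spectral range (cm-1)
STEP_CM1 = 4               # assumed spacing between consecutive points (cm-1)

FACES = ("transverse", "radial", "tangential")
SPECTRA_PER_FACE = 4
SAMPLES_PER_SPECIES = 60
N_SPECIES = 4


def wavenumber_axis(high=HIGH_CM1, low=LOW_CM1, step=STEP_CM1):
    """Return wavenumbers from `high` down towards `low` (lower limit excluded)."""
    return list(range(high, low, -step))


axis = wavenumber_axis()
spectra_per_sample = len(FACES) * SPECTRA_PER_FACE
total_spectra = spectra_per_sample * SAMPLES_PER_SPECIES * N_SPECIES
```

Under these assumptions the axis holds 1500 points, each sample contributes 12 spectra, and the full data set would contain 2880 spectra.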

The ANN used was developed with MATLAB (version 7.10 with Neural Network Toolbox 6, from MathWorks, Natick, MA, www.mathworks.com). The architecture consisted of an artificial neural network with one hidden layer, using back propagation learning with the Levenberg–Marquardt algorithm for training and the gradient descent method with momentum to update the weights and biases. The performance index was the mean square error (MSE), with a target of less than or equal to 10^−10, and an error tolerance of ±2% was applied to the network output.
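To make the architecture concrete, the following is a minimal sketch of the forward pass of a one-hidden-layer network and of the MSE performance index. It assumes a tanh (tan-sigmoid) hidden activation and a single linear output neuron; the weight initialization and all function names are illustrative, and the Levenberg–Marquardt training itself is not reproduced here.

```python
import math
import random


def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights for one hidden layer and one linear output neuron."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def forward(net, spectrum):
    """tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, spectrum)) + bias)
        for weights, bias in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


def mean_squared_error(outputs, targets):
    """Performance index: mean of the squared output errors."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)
```

For a spectrum of 1500 absorbance values, `forward(init_network(1500, 8), spectrum)` returns one real number that is later compared with the reference value of each species.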

The input layer was composed of an array Pij of size 1500 × 1 belonging to the set of samples, whose elements represent the absorbance at each of the 1500 wavenumbers acquired in each spectrum. The input values, absorbance by wavenumber, lie in the interval [0, 1] as a result of the Fourier transform processing, a feature of FTIR. All data were presented to the neural network in random order (Fig. 1), with the goal of distributing any noise and optimizing the results. The network output was adjusted by a linear activation function to the discrete values 1, 2, 3 and 4, chosen for convenience. Each of the four numbers refers to a type of wood, as shown in Table 1. Reference values indicate the desired output of the network for a given species. For example, when the spectra of a wood sample are analyzed and the network provides a value close to 1, within a tolerance of ±2%, the species analyzed is probably Nectandra sp.
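The decision rule on the network output can be sketched as follows. Two assumptions are made here: the ±2% tolerance is taken relative to each reference value, and only the assignment of 1 to Nectandra sp. comes from the text, so the other labels are placeholders standing in for Table 1.

```python
# Hypothetical label table: only the value 1 for Nectandra sp. is stated in
# the text; the remaining entries are placeholders for Table 1.
REFERENCE_LABELS = {
    1: "Nectandra sp.",
    2: "class 2",
    3: "class 3",
    4: "class 4",
}


def decode_output(value, tolerance=0.02, labels=REFERENCE_LABELS):
    """Return the label whose reference value lies within the relative
    tolerance of `value`, or None when the spectrum stays unclassified."""
    for reference, label in labels.items():
        if abs(value - reference) <= tolerance * reference:
            return label
    return None
```

An output of 1.01 would be read as Nectandra sp., whereas an output of 1.5 falls outside every tolerance band and is counted as not classified.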

Fig. 1 Representation of architecture applied to network

Table 1 Samples used for neural network analysis and SIMCA classification

As shown in Table 1, the initial ANN model was developed with approximately 60% of the available samples. Of this subset, 70% was used as the ANN training set, 20% for validation via cross validation and 10% as the test set. The remaining 40% of the samples were used exclusively for testing the fully trained ANN.
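The partition described here can be sketched as a two-stage random split. The sketch assumes the split is made over individual spectra after shuffling; the function name, the seed and the use of integer percentages are illustrative choices rather than details from the study.

```python
import random


def split_indices(n_spectra, seed=0):
    """Shuffle spectrum indices, keep 60% for modelling (split 70/20/10 into
    training, validation and test) and hold out the remaining 40%."""
    rng = random.Random(seed)
    indices = list(range(n_spectra))
    rng.shuffle(indices)

    n_model = n_spectra * 60 // 100
    model, holdout = indices[:n_model], indices[n_model:]

    n_train = n_model * 70 // 100
    n_val = n_model * 20 // 100
    return {
        "train": model[:n_train],
        "validation": model[n_train:n_train + n_val],
        "test": model[n_train + n_val:],
        "holdout": holdout,
    }
```

With 1000 spectra this gives 420 training, 120 validation, 60 test and 400 held-out spectra. Splitting by physical sample instead of by spectrum would keep the 12 spectra of one sample in the same subset, which is a design choice the text does not specify.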

In search of the best model, the number of neurons in the hidden layer was varied empirically from one to fifteen. For each number of hidden neurons, the network was trained ten times, each time starting from randomly initialized weights and biases, in order to find the best result.
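The search over hidden-layer sizes amounts to a small grid search with random restarts. In the sketch below, `train_once` is a hypothetical callback that trains one network with the given number of hidden neurons and random seed and returns its error; the actual training routine is not part of the sketch.

```python
def search_hidden_size(train_once, sizes=range(1, 16), restarts=10):
    """Try every hidden-layer size with several random restarts and return
    (best_error, best_size, best_seed)."""
    best = None
    for n_hidden in sizes:
        for restart in range(restarts):
            # Each restart uses a different seed for weights and biases.
            error = train_once(n_hidden, seed=restart)
            if best is None or error < best[0]:
                best = (error, n_hidden, restart)
    return best
```

With the defaults, the callback is invoked 15 × 10 = 150 times, and the lowest error over all runs decides the architecture.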

The Unscrambler X (version 10.1, from CAMO Software AS, Oslo, Norway, www.camo.com) multivariate analysis program was used to develop the SIMCA model. Exploratory modeling was done by analyzing the score and loading graphs obtained by principal component analysis (PCA), and SIMCA classification was performed with the sample numbers listed in Table 1.

For ANN and SIMCA, data were analyzed in raw form, and two pretreatments were also tested: Savitzky–Golay second derivative (polynomial order = 2, smoothing points = 3) and multiplicative scatter correction (MSC).
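Both pretreatments can be sketched in plain Python. The sketch assumes that "smoothing points = 3" means a window of three points in total; for a second-order polynomial this makes the Savitzky–Golay second derivative equal to the central second difference. The MSC routine regresses each spectrum on a reference (by default the mean spectrum) and removes the fitted offset and slope. Function names are illustrative, and the software used in the study may handle window edges differently.

```python
def mean_spectrum(spectra):
    """Point-by-point mean of a list of equally long spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = reference if reference is not None else mean_spectrum(spectra)
    ref_mean = sum(ref) / len(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for spectrum in spectra:
        spec_mean = sum(spectrum) / len(spectrum)
        # Least-squares fit: spectrum ~ offset + slope * reference
        slope = sum((r - ref_mean) * (x - spec_mean)
                    for r, x in zip(ref, spectrum)) / ref_ss
        offset = spec_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in spectrum])
    return corrected


def second_derivative_3pt(spectrum, step=1.0):
    """Savitzky-Golay second derivative for a 3-point window and polynomial
    order 2; the two end points are dropped."""
    return [
        (spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]) / step ** 2
        for i in range(1, len(spectrum) - 1)
    ]
```

A spectrum that is only a shifted and scaled copy of the reference is mapped back onto the reference by `msc`, which is the scatter effect the correction is meant to remove.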

Results and discussion

The mean spectra of the four woods studied (Fig. 2) and the principal component analysis performed on the individual spectra (Fig. 3) show their similarity. Informative wavenumbers are correlated with the presence of polysaccharides, lipids and protein, which are related to cell structure, and are summarized in Table 2.

Fig. 2 Mean spectra of raw data from studied material

Fig. 3 Principal component analysis of raw data

Table 2 Informative wavenumbers from wood spectra (Tsuchikawa and Siesler 2003; Yonenobu and Tsuchikawa 2003; Schwanninger et al. 2011)

In all species, along PC1 (Fig. 3), two groups can be distinguished, because spectral data were acquired from the transverse and the radial–tangential faces. In Ocotea porosa this is much more evident and may be the result of wood anatomical characteristics, with more distinct growth rings, vessel frequency and oil cells. Spectra collected on the transverse section were more often correctly classified in SIMCA analysis than spectra collected from the radial–tangential faces. This result is similar to anatomical identification, where the transverse section provides the most important information for species identification. In the Lauraceae family, the distinction between Ocotea and Nectandra based on wood anatomy is difficult. The characteristic odor of Ocotea porosa and Ocotea odorifera is important for correct identification, but in some Nectandra species only the crystal type (raphides, prismatic or acicular crystals) and its occurrence can provide correct identification.

The original spectra and the spectra pretreated with second derivative or multiplicative scatter correction (MSC) were analyzed by both ANN and SIMCA techniques. For the SIMCA models, the first two PCs and the number of training samples for each species in Table 1 were used. Individual models were based on the NIPALS algorithm and validated with leverage correction. In the ANN analysis, the number of neurons in the hidden layer and the training error are given in Table 3. One important detail is the number of neurons in the hidden layer: smaller numbers indicate faster learning by the ANN.
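For readers unfamiliar with NIPALS, the following is a generic textbook sketch of how the algorithm extracts the first principal components one at a time. It is not the implementation of the software used in the study, leverage correction is not shown, and the starting vector, tolerance and function name are illustrative choices. The sketch assumes the data have at least as many independent directions as components requested.

```python
def nipals(rows, n_components=2, tol=1e-10, max_iter=500):
    """Return (scores, loadings) of the first principal components."""
    n, m = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    X = [[x - mu for x, mu in zip(row, means)] for row in rows]  # centred copy
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest sum of squares.
        start = max(range(m), key=lambda j: sum(row[j] ** 2 for row in X))
        t = [row[start] for row in X]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project the columns of X onto the current scores.
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Scores: project the rows of X onto the normalized loading.
            t_new = [sum(x * w for x, w in zip(row, p)) for row in X]
            converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
            t = t_new
            if converged:
                break
        # Deflate: remove the extracted component before the next one.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings
```

In a SIMCA setting, one such two-component model would be fitted separately to the training spectra of each species.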

Table 3 Training error and number of neurons in the hidden layer in ANN analysis

Classification tables were constructed for ANN and SIMCA using either the complete spectral range (Table 4) or the range divided into regions (Tables 5, 6, 7, 8, 9). In SIMCA classification, class models overlapped, so some samples were not uniquely classified and individual samples may have been assigned to two or more species. The tables therefore present, for each species, the numbers of both correctly and incorrectly classified samples.
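The possible outcomes of a SIMCA classification can be illustrated with a small helper. It assumes that each class model yields a sample-to-model distance and a critical limit and that a sample is accepted by every class whose limit it does not exceed; the function name, the dictionary inputs and the four outcome labels are hypothetical and only mirror the categories discussed in the text.

```python
def simca_outcome(distances, limits, true_class):
    """Categorize one spectrum from its distance to each class model.

    `distances` and `limits` map a class name to the sample-to-model
    distance and to the critical limit of that model, respectively."""
    accepted = {name for name, d in distances.items() if d <= limits[name]}
    if not accepted:
        return "not classified"
    if len(accepted) > 1:
        return "non-unique"
    return "correct" if true_class in accepted else "wrong"
```

Because every class model is tested independently, a spectrum can fall inside several models at once or inside none, which is why the SIMCA tables need more categories than a simple right or wrong count.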

Table 4 Classification table for ANN and SIMCA for the complete spectral region (4000–10,000 cm−1)
Table 5 Classification table for ANN and SIMCA with band 1 (9995–8498 cm−1)
Table 6 Classification table for ANN and SIMCA with band 2 (8494–6997 cm−1)
Table 7 Classification table for ANN and SIMCA with band 3 (6993–5497 cm−1)
Table 8 Classification table for ANN and SIMCA with band 4 (5493–3996 cm−1)
Table 9 Classification table for ANN and SIMCA with band 2–4 (8494–3996 cm−1)
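Restricting the analysis to one of these bands is a matter of keeping only the absorbance values whose wavenumber falls inside the band limits. The sketch below takes the limits from the table captions; the function name and the list-based data layout are illustrative.

```python
# Band limits (cm-1) as given in the captions of Tables 5 to 9.
BANDS_CM1 = {
    "band 1": (9995, 8498),
    "band 2": (8494, 6997),
    "band 3": (6993, 5497),
    "band 4": (5493, 3996),
    "band 2-4": (8494, 3996),
}


def select_band(wavenumbers, spectrum, band):
    """Keep the absorbance values whose wavenumber lies inside `band`."""
    high, low = BANDS_CM1[band]
    return [a for w, a in zip(wavenumbers, spectrum) if low <= w <= high]
```

The reduced spectrum then replaces the full 1500-point vector as input to the ANN or to the SIMCA models.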

When the analysis was performed with the complete spectral range (Table 4), ANN correctly classified all the samples within an error tolerance of ±2% when using raw spectral data and MSC pretreatment. With the second derivative, a total of 24% of samples were not classified. SIMCA classification using raw spectral data performed best for Ocotea porosa and Ocotea odorifera, but many samples were incorrectly identified. For Nectandra sp., the best result was observed after applying an MSC pretreatment, with correct identification of 95.5% of samples. For Eucalyptus sp., the best result was obtained with second derivative pretreatment, with correct identification of 46% of samples. It is interesting that ANN obtained very satisfactory results using the raw spectrum, while SIMCA required prior treatment for its best results.

When the analysis was performed using only wavenumbers from 9995 to 8498 cm−1 (Table 5), ANN produced the best classification with MSC pretreatment. However, this region of the near-infrared spectrum does not provide relevant information about the species’ chemical composition and consists mostly of noise, which can explain the high rate of unidentified samples in Ocotea porosa and O. odorifera. In the SIMCA classification, the best result was for Nectandra sp. with MSC pretreatment, with 95.1% of samples correctly classified. In this spectral region, 21% (MSC) to 65% (raw) of samples were not classified by ANN, and in SIMCA many samples were not uniquely classified. The best results for both ANN and SIMCA were obtained by applying the MSC filter. Although this spectral band has no great relevance, the MSC filter accentuated the most significant absorbance variations. Both identification methods are sensitive to these variations, and the ANN proved to be the more sensitive of the two.

When the analysis was performed with wavenumbers from 8494 to 6997 cm−1 (Table 6), some wavenumbers associated with cellulose, lignin and extractives were selected and the ANN performed better. In this case, the best network performance occurred with MSC-pretreated data, and only 1.5% of samples were unclassified. For SIMCA with raw data, there was a great deal of confusion between species, including Eucalyptus, a species from the Myrtaceae family, which is anatomically different from species of the Lauraceae family. The classification of some Ocotea odorifera samples as Nectandra sp. by SIMCA was an interesting result, as spectra acquired from the transverse plane were not classified this way. The best result was achieved using MSC for Nectandra sp., second derivative for Eucalyptus sp. and raw data for Ocotea porosa and Ocotea odorifera. In this spectral band, the ANN was more sensitive to the absorbance variations than in band 1. It can be concluded that this band contains more meaningful information for the distinction of the woods. It is difficult to identify which wavenumber is most significant for the distinction, because this information is processed within the black box of the ANN (Sussillo and Barak 2013).

When the analysis was performed with wavenumbers from 6993 to 5497 cm−1 (Table 7), some wavenumbers associated with water, cellulose, lignin and extractives were selected by the model and the ANN produced a better classification with MSC pretreatment for all species, with only 1.5% of samples not identified. This result reflects the greater amount of significant information for the distinction in this spectral range. However, it was still necessary to use a filter to highlight the variations of absorbance. For SIMCA, significant confusion between species occurred, including the classification of some species from the Lauraceae family as Eucalyptus. Samples of Ocotea porosa classified as Nectandra sp. and/or Eucalyptus sp. were from radial and tangential sections, indicating that some characteristics of the surface influence the spectra. The best result used MSC pretreatment for Nectandra sp., second derivative for Eucalyptus sp. and Ocotea porosa and raw data for Ocotea odorifera. Many samples were non-uniquely classified or not classified by SIMCA in this band.

When the analysis was performed with wavenumbers from 5493 to 3996 cm−1 (Table 8), some wavenumbers associated with water, cellulose, lignin and extractives were selected and the classification by ANN was similar for all spectra. There was no significant difference between the results obtained with filtered data and those obtained with raw data, apart from O. odorifera, which performed better with MSC. There was slight confusion in four samples of Ocotea odorifera using raw spectra, which is to be expected and can be explained by some irregularity on the surface. SIMCA classification produced substantial confusion for raw data. The spectra of samples of Ocotea porosa classified as Nectandra sp. were not from the transverse face. In this band, the best result for SIMCA was achieved with preprocessed data, with MSC for Nectandra sp. and second derivative for the other species.

When the analysis was performed with wavenumbers from 8494 to 3996 cm−1 (Table 9), most of the information about wood composition was present. The classification by ANN was similar for all spectra, and MSC pretreatment resulted in all samples being correctly classified. In the SIMCA classification, substantial confusion was still observed, and the best classification with raw data was for Ocotea porosa. The spectra of samples of Ocotea odorifera classified as Nectandra sp. were not from the transverse face. Pretreatment increased the classification performance, with MSC being the best pretreatment for Nectandra sp. and second derivative for the other species.

In near-infrared analysis of solid material, a larger number of samples is recommended because surface, shape and particle size can influence the results in discrimination of species (Brunner et al. 1996; Hein et al. 2010; Nisgoski et al. 2015). The literature reports the efficient use of second derivative pretreatment in wood discrimination (Sandak et al. 2011; Zhang et al. 2014; Horikawa et al. 2015; Hwang et al. 2016; Muñiz et al. 2016) and also the division of the spectral range into regions: the region from 4249 to 6100 cm−1 distinguished wood species similar to mahogany (Pastore et al. 2011), the region from 4000 to 6200 cm−1 gave adequate results in discriminating six provenances of Sugi in south Brazil (Nisgoski et al. 2016), and 4000–5000 cm−1 plus 5500–6200 cm−1 separated wood and charcoal from the “Angelim” Brazilian group (Muñiz et al. 2016).

Ma et al. (2012) reported correct identification rates of 97–99% of wood samples with ANN and near-infrared spectra. Other studies on discrimination of biological material with infrared spectroscopy showed the potential of ANN, but in the mid-infrared region (400–4000 cm−1) and with powder samples. In Fusarium species identification, Nie et al. (2007) trained an ANN with the first ten principal component scores from the second derivative of FTIR spectra as input and obtained an R² of 0.99. In that study, PCA analysis alone failed because the main variations in the spectra were not related to variation between species. For Ephedra species discrimination, based on near-infrared spectra of powder samples, ANN reached 95% prediction accuracy. When the analysis evaluated different habitats of E. sinica and samples collected at different times of day, ANN reached 100 and 93.3% prediction accuracy, respectively (Fan et al. 2010).

The division of mid-infrared spectra into regions with more influence on species discrimination has also been reported. For Campylobacter species identification, Mouwen et al. (2006) obtained adequate results with a four-layer ANN for discrimination of genotypes. Spectral windows between different wavenumbers in the mid-infrared were also applied to the input layer for Listeria species discrimination (Rebuffo et al. 2006), and the results showed 96% accuracy in differentiation.

ANN also presented potential for application to species distinction based on anatomical features of wood (Esteban et al. 2009) and leaf images (Pandolfi et al. 2009).

Using cell dimensions in the input layer of a network resulted in a 92% probability of differentiating two Juniperus species (Esteban et al. 2009). Based on leaf images of 17 accessions of Camellia sinensis, with different origins and varieties, Pandolfi et al. (2009) showed the potential of ANN and commented that its limitations are the same as those of a human expert and involve the number and accuracy of training data: training and learning are better when the data present rich variation.

On the other hand, misclassifications occurred with greater frequency in SIMCA. This may be the result of the point (cell type) where spectra were acquired or of irregularities on the wood surface, since the samples were only sawn. Some examples of adequate results from SIMCA classification are found in studies with red and white oak (Adedipe et al. 2008) and thermally modified wood of spruce, beech and ash (Bächle et al. 2012).

Although only a few species were studied, the objective of the study was achieved: ANN showed potential for species discrimination based on NIR spectra of solid wood specimens and can be a rapid and effective tool in forest commerce.

Conclusion

SIMCA classification is sensitive to the surface orientation of the samples, resulting in considerable non-unique classification and non-classification, and is not recommended for discrimination of Ocotea porosa, Ocotea odorifera, Nectandra sp. and Eucalyptus sp. MSC and second derivative pretreatment had a positive influence on SIMCA classification. The disadvantage of filters is that their choice is generally empirical, which makes the species classification process slower. In this respect, ANN is more advantageous owing to its high sensitivity to changes in absorbance between species without the need for prior data treatment. Considering the large amount of data and the low linearity between them, the ANN demonstrated good potential for species discrimination based on near-infrared spectral analysis. In this study, the best performance came from an artificial neural network using the tan–sigmoid transfer function and eight neurons in the hidden layer. For ANN analysis, the use of the spectral range from 4000 to 10,000 cm−1 is recommended. In this case, the neural network developed did not produce any identification error for a margin of ±2%.