1 Introduction

In the industrial production of wood, classifying the species beforehand is of great importance. Methods for classifying wood species fall into four broad categories. The first uses the anatomical characteristics of the wood (Da Silva et al. 2017). Although it provides high classification accuracy, it requires the wood to be sliced so that its anatomy can be observed, involves complex sample preparation, and is not amenable to nondestructive testing. The second uses the genetic information of the wood (Yu et al. 2017; Jiao et al. 2019) and is often applied to the identification of endangered and ancient wood species. However, it shares the drawbacks of anatomical testing and is not suitable for mass classification of wood species. The third uses stress waves (Rojas et al. 2011). This approach is novel, but it requires a quiet, noise-free environment during data collection. The fourth uses the macroscopic characteristics of wood, that is, its textural, spectral, macroscopic structural (pores, wood rays, etc.), odor-related, and other features. This approach is simple and convenient because such features are usually easy to obtain and describe the samples well.

When classifying wood through its macroscopic features, any section, whether transverse, radial, or tangential, can be used. However, some scholars have noted that transverse sections are better suited to wood classification and identification (Barmpoutis et al. 2018). Recently, several scholars have used the macroscopic characteristics of transverse sections to classify wood species. Rosli et al. (2019) trained a back propagation (BP) neural network on texture features extracted from tropical wood transverse sections with a gray level co-occurrence matrix (GLCM), and the method identified tropical tree species in less than one second. Zamri et al. (2016) extracted the textural features of transverse sections using the improved basic gray level aura matrix (I-BGLAM), compared them with those obtained with GLCM, and achieved a final classification accuracy of 97.01%. There are numerous ways to classify images using texture features. Armi and Fekri-Ershad (2019) pointed out that most texture feature extraction methods are rotation invariant; nonetheless, such features are also extremely sensitive to noise, so classifying wood using texture features alone has inevitable shortcomings. In addition to textural features, the statistical features of the pores on the wood transverse section can also be used to classify wood species. Zamri et al. (2016) used a fuzzy statistical analysis of the pores in wood transverse sections to classify 52 tropical wood species. Experimental results showed a significantly higher accuracy with this method than with the GLCM texture feature recognition method. Ibrahim et al. (2018), part of the same research team, further improved the identification accuracy by combining fuzzy classification of the statistical features of the pores in the wood transverse sections with texture classification based on BGLAM.
However, these two pore-based methods have certain problems: first, wood pores are indistinct, and clear pores cannot be captured at the macroscopic scale for all hardwood species; second, the transverse sections of several wood species have similar statistical features; third, this type of method cannot be used to classify mixed samples of hardwood and coniferous wood. Deep learning has also been widely applied to the identification of wood species recently. De Geus et al. (2021) used a deep learning model based on transfer learning to identify images of wood cross-sections. Hu et al. (2019) used DenseNet to identify the types and defects of wood. The use of spectral features to classify wood is also feasible. Pozhidaev et al. (2019) used near-infrared spectroscopy to identify archaeological wood samples. However, spectral classification requires a very stable data acquisition environment and incurs a high data acquisition cost. Moreover, spectral features change with impurities, colors, and other components on the wood surface.

Many studies have shown that combining multiple features can provide better performance than using any single feature alone (Chen et al. 2016; Zhang et al. 2019). Presently, the data type that most comprehensively describes the macroscopic characteristics of wood transverse sections is hyperspectral data, which captures both the image and the spectral information of the wood. However, owing to the large volume of hyperspectral data and the slow acquisition speed, it is still impractical to use such data for wood classification and recognition on a large scale. Therefore, a 1-dimensional spectrometer and a digital camera were used in this study to collect the spectral and image information, respectively, from the wood transverse section, which reduces the cost of data acquisition and storage and speeds up classification and identification.

The purpose of this paper is to use both the spectral information and the image information of wood transverse sections for wood species recognition, since the fused feature may carry more abundant and complementary classification information. Accordingly, we first consider how to extract effective feature vectors from the spectral and image information of wood transverse sections, then how to fuse the two kinds of features into a more representative feature vector, and finally verify the effect of feature fusion on wood classification through comparative experiments.

2 Materials and data acquisition

Samples of 50 wood species were used in the experiments (as described in Table 1). The samples were mainly obtained from the Beijing Panzhuang and the Shanghai Furen global timber markets. The research team procured the samples in batches so that they would not all come from the same tree. The wood sample set included not only hardwood and coniferous wood, but also wood samples of the same genus (e.g., wood species 23, 24, 25 in Table 1 are biologically similar species which belong to the same genus).

Table 1 Wood species information

More than 25 logs were prepared for each tree species. The logs must come from the tree trunk, irrespective of their exact size and location. The logs were then cut into small pieces measuring around \(2\times 2\times 3\,\text{cm}\), and the size of the wood transverse section was kept at \(2\times 2\,\text{cm}\). The wood samples were additionally polished and cleaned to highlight the anatomical features of the wood. Two small pieces were randomly selected from each log and 50 samples were selected for each tree species, so the data set contains 2500 samples.

Figure 1 shows the spectrum and image acquisition equipment. The spectrum capturing equipment comprised an Optics USB2000-VIS–NIR miniature optical fiber spectrometer, light source, and acquisition software. The image acquisition equipment comprised a charge-coupled device (CCD) lens, optical microscope, light emitting diode (LED) light source, and acquisition software. The magnification range of the optical microscope is 5–100 times, LED light source is white light, and CCD lens supports a maximum resolution of 1920 × 1080 pixels.

Fig. 1 Data acquisition system: a spectral acquisition system; b image acquisition system

Data were collected from each wood transverse section on the two acquisition platforms in sequence. To prevent confusion between samples, the wood samples must be labeled and care must be taken to distinguish the two sides of each sample. The RGB images and spectral reflectance curves of the 50 wood species in Table 1 are given in Appendix 1 and Appendix 2 in Supplementary Information, respectively. It is worth mentioning that the RGB images of some of the 50 tree species are visually very similar (for example, tree species 4 and 5, and tree species 18 and 33) and are difficult to distinguish with the naked eye.

3 Theory and method

3.1 Use of fractal theory to extract spectral features

Fractal geometry describes data through self-similarity, and the fractal dimension is used here to extract stable features from the spectrum. To describe the spectral characteristics comprehensively, the spectrum must be segmented (Mukherjee et al. 2013), so that multiple fractal dimensions are obtained. Let \(P\) be the step size, \(W\) the window size, \(\boldsymbol{S}\) the spectral data, and \({N}_{r}\) the length of the spectral curve; the number of spectral segments \({N}_{f}\) is then calculated by Eq. (1) (Liu et al. 2016). \({N}_{f}\) is also the feature dimension. The dimension of the spectrum can therefore be reduced by adjusting \(W\) and \(P\). Figure 2 illustrates the meaning of \(W\) and \(P\) on the spectral curve. It can be seen from the figure that when \(P\) is small, adjacent windows overlap substantially.

Fig. 2 Illustration of \(W\) and \(P\)

$${N}_{f}=\frac{{N}_{r}-W}{P}+1$$
(1)
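The windowing behind Eq. (1) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `segment_spectrum` is hypothetical, \(W\) and \(P\) are expressed in numbers of samples rather than nanometres, and any trailing partial window is assumed to be dropped.

```python
def segment_spectrum(spectrum, window, step):
    """Split a 1-D spectrum into overlapping windows.

    `window` plays the role of W and `step` the role of P, both counted
    in samples (an assumption; the paper quotes them in nm).
    """
    n_r = len(spectrum)
    if window > n_r or step < 1:
        raise ValueError("need 1 <= step and window <= len(spectrum)")
    # Integer form of Eq. (1); a trailing partial window is dropped.
    n_f = (n_r - window) // step + 1
    return [spectrum[i * step:i * step + window] for i in range(n_f)]


# Placeholder data, not a real reflectance curve.
spectrum = [float(i) for i in range(1000)]
segments = segment_spectrum(spectrum, window=200, step=20)
print(len(segments))  # (1000 - 200) / 20 + 1 = 41
```

With a step smaller than the window, consecutive segments share `window - step` samples, which is the overlap visible in Fig. 2.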

To estimate the fractal dimension of the spectrum, the variogram estimator is applied to each spectral segment. Let the spectral data after segmentation be \({\boldsymbol{S}}^{i}\) \(\left(i\in \left\{1,2,\dots ,{N}_{f}\right\}\right)\). The length of \({\boldsymbol{S}}^{i}\) is \(W\), and \({\gamma }_{\alpha }\left(t\right)\) is calculated by Eq. (2) (Gneiting et al. 2012). Here, \({\boldsymbol{S}}_{u}^{i}\) and \({\boldsymbol{S}}_{u+t}^{i}\) are two points separated by the lag \(t\). When \(\alpha =1\), Eq. (2) gives half the mean absolute difference, and when \(\alpha =2\), it gives half the mean squared difference.

$${\gamma }_{\alpha }\left(t\right)=\frac{1}{2}\,\text{E}\left[{\left|{\boldsymbol{S}}_{u}^{i}-{\boldsymbol{S}}_{u+t}^{i}\right|}^{\alpha }\right]$$
(2)

For each lag \(t\), Eq. (2) yields a corresponding value \({\gamma }_{\alpha }\left(t\right)\). A linear regression of \(\text{log}{\gamma }_{\alpha }\left(t\right)\) on \(\text{log}t\) gives a regression line with slope \({k}_{i}\), whose angle of inclination is \({\theta }_{i}\). The fractal dimension \({D}_{i}\) is then obtained from Eq. (3). Figure 3 shows the scatter diagram and regression line with \(\text{log}t\) as the abscissa and \(\text{log}{\gamma }_{\alpha }\left(t\right)\) as the ordinate, where W = 100 nm, P = 50 nm, and \(t\in \{2,4,\dots ,{2}^{n}\}\) \(({2}^{n}\le W)\). The feature \({F}_{i}\) is calculated by Eq. (4), where \({E}_{i}\) represents the energy of \({\boldsymbol{S}}^{i}\).

Fig. 3 Illustration of fractal dimension calculation

$${D}_{i}=2-\frac{{\theta }_{i}}{2}$$
(3)
$${F}_{i}={D}_{i}\times {E}_{i},\quad {E}_{i}=\sum_{j=1}^{W}{\left({\boldsymbol{S}}_{j}^{i}\right)}^{2}$$
(4)
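Eqs. (2) to (4) can be put together in a short sketch for a single segment. Several points here are assumptions rather than details from the text: the expectation in Eq. (2) is replaced by a sample mean, \({\theta }_{i}\) is taken as the arctangent of the regression slope in radians, lags that leave no sample pair inside the segment are skipped, and all function names are hypothetical.

```python
import math


def variogram(segment, lag, alpha=2.0):
    """Sample version of Eq. (2): half the mean of |S_u - S_{u+t}|^alpha."""
    diffs = [abs(segment[u] - segment[u + lag]) ** alpha
             for u in range(len(segment) - lag)]
    return 0.5 * sum(diffs) / len(diffs)


def regression_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def fractal_dimension(segment, alpha=2.0):
    """Eq. (3) as written: D = 2 - theta / 2, theta = atan(slope) (assumed)."""
    log_t, log_g = [], []
    lag = 2
    while lag < len(segment):  # lags 2, 4, 8, ... with at least one pair
        g = variogram(segment, lag, alpha)
        if g > 0:  # the logarithm is undefined for a zero variogram
            log_t.append(math.log(lag))
            log_g.append(math.log(g))
        lag *= 2
    theta = math.atan(regression_slope(log_t, log_g))
    return 2.0 - theta / 2.0


def fractal_feature(segment, alpha=2.0):
    """Eq. (4): fractal dimension weighted by the segment energy."""
    energy = sum(s * s for s in segment)
    return fractal_dimension(segment, alpha) * energy


# A linear ramp has gamma_2(t) = t^2 / 2, so the log-log slope is 2.
ramp = [float(j) for j in range(64)]
d_ramp = fractal_dimension(ramp)
f_ramp = fractal_feature(ramp)
```

Applying `fractal_feature` to every window produced by the segmentation step would give one feature per segment, that is, a vector of length \({N}_{f}\).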

3.2 Extraction of color image texture feature based on LBP

Local binary pattern (LBP) is an image-texture feature description operator and is widely used in face recognition (Ahonen et al. 2004). In this section, a color image feature extraction algorithm based on LBP is proposed, which can be used to effectively identify the wood species tested in this study by combining the wood texture and color information. The algorithm is described as follows.

Let the size of a color image \(\boldsymbol{I}\) be \(M{\times}N{\times}3\). The matrices corresponding to the three RGB channels of the color image are \({\boldsymbol{I}}_{\boldsymbol{R}}\), \({\boldsymbol{I}}_{\boldsymbol{G}}\), and \({\boldsymbol{I}}_{\boldsymbol{B}}\), respectively, each of dimension \(M{\times}N\). We define a 3 × 3 × 3 matrix \(\boldsymbol{w}\) whose central element is \(\boldsymbol{w}(2,2,2)\); it has 26 elements around it. According to Eqs. (5) and (6), the eigenvalue of the central element \(\boldsymbol{w}(2,2,2)\) is calculated as \(v\).

$$v=\sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3}S\left(\boldsymbol{w}\left(i,j,k\right)-\boldsymbol{w}\left(2,2,2\right)\right)$$
(5)
$$S\left(x_{i}-x_{c}\right)=\begin{cases}1, & x_{i}>x_{c}\\ 0, & x_{i}\le x_{c}\end{cases}$$
(6)

The eigenvalue \(v\) can be any integer between 0 and 26. Placing \(\boldsymbol{w}\) at the upper left corner of the color image \(\boldsymbol{I}\) and sliding it over all pixel positions yields a matrix \({\boldsymbol{I}}_{\boldsymbol{w}\boldsymbol{G}}\) of size \((M-2)\times (N-2)\). In this case the central element is \(\boldsymbol{I}(m,n,2)\) \((m\in \left\{2,3,\dots ,M-1\right\},n\in \left\{2,3,\dots ,N-1\right\})\), that is, a pixel of the G channel. To take the characteristics of the remaining channels into account, we changed the order of the three channels of the RGB image and recombined them to obtain new BRG and GBR images. Two further matrices \({\boldsymbol{I}}_{\boldsymbol{w}\boldsymbol{R}}\) and \({\boldsymbol{I}}_{\boldsymbol{w}\boldsymbol{B}}\) are obtained by applying the same procedure to these images. Table 2 presents several characteristic matrices under different wood color channels.

Table 2 Characteristic matrices of different wood species

Next, the feature vectors are extracted. For each eigenmatrix \({\boldsymbol{I}}_{\boldsymbol{w}\boldsymbol{G}}\), \({\boldsymbol{I}}_{\boldsymbol{w}\boldsymbol{R}}\), and \({\boldsymbol{I}}_{\boldsymbol{w}\boldsymbol{B}}\), the possible values \({x}_{i}\) \(({x}_{i}\in \left\{0,1,2,\dots ,26\right\})\) are taken as the abscissa, and \({y}_{i}\) denotes the number of elements whose value is \({x}_{i}\). The feature vector is formed with \({y}_{i}/(\left(M-2\right)\times \left(N-2\right))\) as the ordinate, that is, as a normalized histogram. There are three such feature vectors, which are concatenated to obtain the final feature vector. Figure 4 displays the feature vectors of the four wood species in Table 2. It can be seen from Fig. 4 that the characteristic curves of different tree species differ visibly.

Fig. 4 Eigenvectors of the four wood species in Table 2
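The procedure of this section can be sketched in plain Python for an image stored as nested lists `image[m][n][c]`. This is an illustrative reading, not the authors' code: following Eq. (6), a neighbour counts when it is strictly greater than the centre, the channel orders RGB, BRG and GBR are represented as index tuples whose middle entry selects the centre channel, and the function names are hypothetical.

```python
def lbp3d_histogram(image, order):
    """Normalized 27-bin histogram of v for one channel ordering.

    `order` lists the channel indices of the reordered image; its middle
    entry is the channel that supplies the central element w(2,2,2).
    """
    rows, cols = len(image), len(image[0])
    counts = [0] * 27
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            centre = image[m][n][order[1]]
            v = 0
            for dm in (-1, 0, 1):
                for dn in (-1, 0, 1):
                    for c in order:
                        # The centre compared with itself contributes 0.
                        if image[m + dm][n + dn][c] > centre:
                            v += 1
            counts[v] += 1
    total = (rows - 2) * (cols - 2)
    return [count / total for count in counts]


def lbp3d_feature(image):
    """Concatenate the histograms with G, R and B as the centre channel."""
    orders = [(0, 1, 2), (2, 0, 1), (1, 2, 0)]  # RGB, BRG, GBR
    feature = []
    for order in orders:
        feature.extend(lbp3d_histogram(image, order))
    return feature


# Toy 3 x 3 image: every value is 5 except the G value of the centre pixel.
toy = [[[5, 5, 5] for _ in range(3)] for _ in range(3)]
toy[1][1] = [5, 0, 5]
feature = lbp3d_feature(toy)
print(len(feature))  # 3 channels x 27 bins = 81
```

In the toy image all 26 neighbours exceed the G centre, so the first histogram puts all its mass in bin 26, while the R-centred and B-centred histograms put all their mass in bin 0.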

3.3 Feature fusion method based on CCA

Presently, fusion methods mainly include data-level, feature-level, and decision-level fusion. In this study, features are extracted separately from the spectrum and the image of each wood transverse section, so data-level fusion cannot be conducted. Here, a feature-level fusion method based on canonical correlation analysis (CCA) was used (Sun et al. 2005; Haghighat et al. 2016). In this method, two feature vectors are fused to produce a new feature vector, and the fused feature vector gives a better classification and recognition effect than either type of feature vector alone. Prior to fusion, principal component analysis (PCA) is performed to reduce the dimensions of the two feature vectors, and components whose eigenvalues are less than \({10}^{-6}\) are removed.

It is assumed that the spectral and texture feature vectors are \({\boldsymbol{F}}_{S}=[{\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2}\dots {\boldsymbol{x}}_{n}]\) and \({\boldsymbol{F}}_{T}=[{\boldsymbol{y}}_{1},{\boldsymbol{y}}_{2}\dots {\boldsymbol{y}}_{n}]\), respectively; \({\boldsymbol{x}}_{i}\in {R}^{p1}\), \({\boldsymbol{y}}_{i}\in {R}^{p2}\), where \(p1\) and \(p2\) represent the dimensions of the spectral and texture feature vectors, respectively. After defining the linear combinations \({\boldsymbol{X}}^{*}={\boldsymbol{W}}_{x}^{T}{\boldsymbol{F}}_{S}\), \({\boldsymbol{Y}}^{*}={\boldsymbol{W}}_{y}^{T}{\boldsymbol{F}}_{T}\), then \({\boldsymbol{W}}_{x}^{T}\) and \({\boldsymbol{W}}_{y}^{T}\) corresponding to the maximum value of the Pearson correlation coefficient can be calculated as given in Eq. (7).

$$\text{corr}\left({\boldsymbol{X}}^{*},{\boldsymbol{Y}}^{*}\right)=\frac{\text{cov}\left({\boldsymbol{X}}^{*},{\boldsymbol{Y}}^{*}\right)}{\sqrt{\text{var}\left({\boldsymbol{X}}^{*}\right)\cdot \text{var}\left({\boldsymbol{Y}}^{*}\right)}}$$
(7)

To calculate the Pearson correlation coefficient, the covariance matrix \(\boldsymbol{S}\) is introduced. \(\boldsymbol{S}\) is defined in Eq. (8) (Haghighat et al. 2016) and contains all the relevant information in \({\boldsymbol{F}}_{S}\) and \({\boldsymbol{F}}_{T}\). In Eq. (7), \(\text{var}\left({\boldsymbol{X}}^{*}\right)={\boldsymbol{W}}_{x}^{T}{\boldsymbol{S}}_{xx}{\boldsymbol{W}}_{x}\), \(\text{cov}\left({\boldsymbol{X}}^{*},{\boldsymbol{Y}}^{*}\right)={\boldsymbol{W}}_{x}^{T}{\boldsymbol{S}}_{xy}{\boldsymbol{W}}_{y}\), and \(\text{var}\left({\boldsymbol{Y}}^{*}\right)={\boldsymbol{W}}_{y}^{T}{\boldsymbol{S}}_{yy}{\boldsymbol{W}}_{y}\). Under the constraint \(\text{var}\left({\boldsymbol{X}}^{*}\right)=\text{var}\left({\boldsymbol{Y}}^{*}\right)=1\), the method of Lagrange multipliers is used to maximize Eq. (7).

$$\boldsymbol{S}=\left[\begin{array}{cc}cov\left({\boldsymbol{F}}_{S}\right)& cov\left({\boldsymbol{F}}_{S},{\boldsymbol{F}}_{T}\right)\\ cov\left({\boldsymbol{F}}_{T},{\boldsymbol{F}}_{S}\right)& cov\left({\boldsymbol{F}}_{T}\right)\end{array}\right]=\left[\begin{array}{cc}{\boldsymbol{S}}_{xx}& {\boldsymbol{S}}_{xy}\\ {\boldsymbol{S}}_{yx}& {\boldsymbol{S}}_{yy}\end{array}\right]$$
(8)
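The maximization step can be sketched with the standard Lagrange-multiplier argument for CCA, written here for a single pair of projection vectors. The multipliers \({\lambda }_{1}\) and \({\lambda }_{2}\) are notation introduced for this sketch, and \({\boldsymbol{S}}_{xx}\) and \({\boldsymbol{S}}_{yy}\) are assumed to be invertible. The Lagrangian is

$$L={\boldsymbol{W}}_{x}^{T}{\boldsymbol{S}}_{xy}{\boldsymbol{W}}_{y}-\frac{{\lambda }_{1}}{2}\left({\boldsymbol{W}}_{x}^{T}{\boldsymbol{S}}_{xx}{\boldsymbol{W}}_{x}-1\right)-\frac{{\lambda }_{2}}{2}\left({\boldsymbol{W}}_{y}^{T}{\boldsymbol{S}}_{yy}{\boldsymbol{W}}_{y}-1\right)$$

Setting the derivatives with respect to \({\boldsymbol{W}}_{x}\) and \({\boldsymbol{W}}_{y}\) to zero gives

$${\boldsymbol{S}}_{xy}{\boldsymbol{W}}_{y}={\lambda }_{1}{\boldsymbol{S}}_{xx}{\boldsymbol{W}}_{x},\quad {\boldsymbol{S}}_{yx}{\boldsymbol{W}}_{x}={\lambda }_{2}{\boldsymbol{S}}_{yy}{\boldsymbol{W}}_{y}$$

Left-multiplying the first equation by \({\boldsymbol{W}}_{x}^{T}\) and the second by \({\boldsymbol{W}}_{y}^{T}\) and applying the two variance constraints shows that \({\lambda }_{1}={\lambda }_{2}=\lambda\), which equals the correlation being maximized. Eliminating \({\boldsymbol{W}}_{y}\) then leads to the eigenvalue problem

$${\boldsymbol{S}}_{xx}^{-1}{\boldsymbol{S}}_{xy}{\boldsymbol{S}}_{yy}^{-1}{\boldsymbol{S}}_{yx}{\boldsymbol{W}}_{x}={\lambda }^{2}{\boldsymbol{W}}_{x},\quad {\boldsymbol{W}}_{y}=\frac{1}{\lambda }{\boldsymbol{S}}_{yy}^{-1}{\boldsymbol{S}}_{yx}{\boldsymbol{W}}_{x}$$

so the projection directions are the eigenvectors associated with the largest eigenvalues \({\lambda }^{2}\).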

According to Sun et al. (2005), there are two schemes for implementing feature-level fusion: “Concat” and “Sum”. The fused features are called canonical correlation discriminant features and are denoted as \(\boldsymbol{Z}\). The two feature-level fusion schemes are calculated according to Eqs. (9) and (10):

$${\boldsymbol{Z}}_{1}=\left(\begin{array}{c}{\boldsymbol{X}}^{*}\\ {\boldsymbol{Y}}^{*}\end{array}\right)=\left(\begin{array}{c}{\boldsymbol{W}}_{x}^{T}{\boldsymbol{F}}_{S}\\ {\boldsymbol{W}}_{y}^{T}{\boldsymbol{F}}_{T}\end{array}\right)=\left(\begin{array}{cc}{\boldsymbol{W}}_{x}^{T}& 0\\ 0& {\boldsymbol{W}}_{y}^{T}\end{array}\right)\left(\begin{array}{c}{\boldsymbol{F}}_{S}\\ {\boldsymbol{F}}_{T}\end{array}\right)$$
(9)
$${\boldsymbol{Z}}_{2}={\boldsymbol{X}}^{*}+{\boldsymbol{Y}}^{*}={\boldsymbol{W}}_{x}^{T}{\boldsymbol{F}}_{S}+{\boldsymbol{W}}_{y}^{T}{\boldsymbol{F}}_{T}={\left(\begin{array}{c}{\boldsymbol{W}}_{x}\\ {\boldsymbol{W}}_{y}\end{array}\right)}^{T}\left(\begin{array}{c}{\boldsymbol{F}}_{S}\\ {\boldsymbol{F}}_{T}\end{array}\right)$$
(10)
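The difference between the two schemes in Eqs. (9) and (10) can be seen in a small sketch. The projection matrices and feature values below are made-up numbers chosen for easy arithmetic, not learned CCA weights, and the function names are hypothetical.

```python
def project(weights_t, feature):
    """Compute W^T f; `weights_t` stores W^T row by row."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights_t]


def fuse_concat(x_star, y_star):
    """Eq. (9): stack the two projected vectors (dimension 2d)."""
    return x_star + y_star


def fuse_sum(x_star, y_star):
    """Eq. (10): add the two projected vectors element-wise (dimension d)."""
    return [a + b for a, b in zip(x_star, y_star)]


# Illustrative values only: a 3-D spectral and a 4-D texture feature,
# both projected onto d = 2 canonical directions.
wx_t = [[1, 0, 0], [0, 1, 1]]
wy_t = [[1, 1, 0, 0], [0, 0, 2, 2]]
f_s = [1, 2, 3]
f_t = [1, 0, 1, 0]

x_star = project(wx_t, f_s)
y_star = project(wy_t, f_t)
z1 = fuse_concat(x_star, y_star)
z2 = fuse_sum(x_star, y_star)
print(z1, z2)  # [1, 5, 1, 2] [2, 7]
```

"Concat" therefore yields a fused vector twice as long as "Sum", which matches the 88 and 44 dimensions quoted for Fig. 5.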

The feature-level fusion process of texture features and spectral features is shown in Fig. 5. First, the texture features of RGB images were extracted by LBP feature extraction, then the spectral features of wood cross section were extracted by fractal theory, and finally the two feature vectors were fused by CCA feature fusion to make them into a single feature vector.

Fig. 5 Texture and spectral feature-level fusion diagram with the CCA algorithm (the wood species is Amygdalus davidiana, and the fractal parameters are \(P=20\), \(W=200\). The texture feature dimension is 81, the spectral feature dimension is 96, and the fused feature dimension is 88 with the “Concat” operator and 44 with “Sum”)

4 Results and discussions

4.1 Classification of wood species using spectral features

In this section, the influence of different \(W\) and \(P\) in Sect. 3.1 on wood classification accuracy is examined to find the optimal values, and the classification accuracy under different \(P\) is discussed. The classifier used in this study is a support vector machine (SVM). To obtain a reliable and stable model, “leave-one-out” cross-validation (Browne 2000) is used to evaluate the classification results; that is, each sample in turn is held out as the test set while the remaining samples form the training set. Thus, the sample set can be fully utilized.
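The "leave-one-out" protocol can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM used in the study; it is a deliberate substitution, and the function names and toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (not the paper's SVM): nearest class mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((a / counts[label] - s) ** 2 for a, s in zip(acc, sample))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(features, labels, predict):
    """Hold out each sample once and train on all remaining samples."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Two well-separated toy clusters.
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(features, labels, nearest_centroid_predict)
print(accuracy)  # 1.0
```

With 2500 samples the loop trains 2500 models, which is why the protocol uses the data fully but is comparatively expensive.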

As can be seen from Fig. 6, the classification accuracy tends to decrease as \(P\) increases. Changes in \(W\) and \(\alpha\) have little influence on the accuracy. Reducing \(W\) and \(P\) increases the feature dimension, which increases both the feature extraction time and the amount of post-processing. Table 3 displays the detailed values of cross-validation classification accuracy \(Ac\), feature extraction time \(Ti\), and feature dimension \(Di\), corresponding to \(W\) and \(P\) when \(\alpha =2\) in Fig. 6.

Fig. 6 Influence of variables \(P\) and \(W\) on cross-validation (a \(\alpha =0.5\), b \(\alpha =1\), c \(\alpha =2\), d \(\alpha =4\))

Table 3 Effects of variables \(P\) and \(W\)

Each feature extraction time in Table 3 is the average of 50 measurements. Taking accuracy, feature dimension and extraction time into account, we suggest using larger \(W\) and \(P\) values, namely the optimal values \(W=350\), \(P=40\), \(\alpha =4\) in Fig. 6d. The corresponding classification accuracy is 91.40%, while the feature dimension is 41 and the feature extraction time is 0.0013 s.

4.2 Classification of wood species using texture features

In this section, we examine how the recognition accuracy of the wood texture classification method described in Sect. 3.2 depends on the neighborhood range and the color space. Four neighborhood configurations are illustrated in Fig. 7, where the black cubes represent the pixels that are taken into account. Because the value \(v\) of a neighborhood with \(n\) pixels ranges from 0 to \(n\), each channel contributes \(n+1\) histogram bins and the three channels together give a feature dimension of \(3(n+1)\). When all 26 pixels around the central pixel are considered, as illustrated in Fig. 7a, the feature dimension is 81. If only some of the pixels near the central pixel are considered, as shown in Fig. 7b, c, the feature dimensions are 57 and 45, respectively. Figure 7d considers the 74 pixels nearest the central pixel, which gives a feature dimension of 225.

Fig. 7 Four characteristic cases of textural feature classification: a all 26 adjacent pixels are considered; b a few pixels are considered; c other pixels are considered; d all 74 adjacent pixels are considered

Table 4 shows the classification accuracy and feature extraction time of the four eigenvectors under the SVM classifier. It can be seen that the higher the feature dimension, the higher the accuracy; however, the influence of the feature dimension on the classification accuracy is not strong, and the increase in the feature dimension will lead to a significant increase in the feature extraction time.

Table 4 Classification accuracy and feature extraction time

In addition, Table 4 also considers the influence of 81D feature vectors on the classification accuracy under different color spaces. It can be seen from the results that there are great differences in the classification accuracy in different color spaces. If the RGB space is converted into NTSC or HSV space, the classification accuracy is significantly reduced.

Table 5 shows images of wood transverse sections in different color spaces. In Table 5, Image_1, Image_2 and Image_3 represent the grayscale images in three different channels, respectively. Table 5 shows that all three channels of RGB images have relatively clear texture structure, while texture information of some channels in NTSC and HSV space is not obvious. To further explain the problem, Table 5 also gives the mean (Mean), contrast (Con) and entropy (Ent) of the gray difference histogram (the larger the value of Mean, Con and Ent variables, the greater the difference between adjacent pixels of the image) (Li and Liu 2009; Wu et al. 1992). It can be found from the results that the difference between adjacent pixels in RGB image is large and the texture is obvious. Because of this, the classification accuracy of NTSC and HSV is lower than that of RGB color space.

Table 5 Image of wood in different color spaces
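The three statistics reported in Table 5 can be illustrated with a short sketch. The text does not give their formulas, so the definitions below are assumptions based on common usage: differences are taken between horizontally adjacent pixels, Mean is \(\sum i\,p(i)\), Con is \(\sum {i}^{2}p(i)\), and Ent is \(-\sum p(i){\text{log}}_{2}p(i)\), where \(p\) is the normalized histogram of absolute gray differences. The function name is hypothetical.

```python
import math


def gray_difference_stats(gray):
    """Mean, contrast and entropy of the gray difference histogram.

    `gray` is a list of rows of integer gray levels. The horizontal
    neighbour and the three formulas are assumed conventions.
    """
    counts = {}
    total = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            diff = abs(right - left)
            counts[diff] = counts.get(diff, 0) + 1
            total += 1
    mean = contrast = entropy = 0.0
    for diff, count in counts.items():
        p = count / total
        mean += diff * p
        contrast += diff * diff * p
        entropy -= p * math.log2(p)
    return mean, contrast, entropy


# One row with differences 1 and 2, each with probability 0.5.
stats = gray_difference_stats([[0, 1, 3]])
print(stats)  # (1.5, 2.5, 1.0)
```

Under these definitions all three values grow as neighbouring pixels differ more, which is the interpretation given for Table 5.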

4.3 Classification of wood species by fusion features

In this section, the effect on classification after the fusion of spectral and texture features is discussed and the best parameters that ensure the most accurate classification of texture and spectral features are found.

Figure 8 shows the feature classification results of texture and spectral features using CCA in different fusion schemes under different parameters. The classification effect of using “Concat” fusion scheme is similar for both 57D and 45D texture and spectral features, with the highest classification accuracy being up to 99.16% in the “leave-one-out” cross-validation. The classification rate using “Concat” fusion scheme is greater than that using “Sum” fusion scheme.

Fig. 8 Results of feature fusion classification accuracy: a \(P=20\); b \(P=40\)

It is worth mentioning that the accuracy of wood classification using a single feature tends to increase with the feature dimension. However, after feature fusion with CCA, an increase in dimension does not yield higher classification accuracy. Table 6 shows the detailed data corresponding to the highest accuracy for each method depicted in Fig. 8. Di-T and Di-S represent the feature dimensions of texture and spectrum, respectively, and Ti represents the total time required for extracting the two features, feature fusion and classification for one wood sample.

Table 6 Classification results after feature fusion in the “leave-one-out” cross-validation

To reflect the complementary effect of feature fusion in this study, the dataset was divided, and 35 samples of each tree species were randomly selected as the training set and the remaining 15 samples as the test set. It should be emphasized that the classification accuracy of the test set in this division-validation was lower than that in the “leave-one-out” cross-validation. The classification accuracy on the test set was 96.20% in this division-validation. Figure 9 shows the classification results of 750 samples in the test set after using spectral and texture features alone and with feature fusion. The abscissa in Fig. 9 represents the serial number of the wood sample, which can be divided by 15 to obtain the actual label serial number, and the ordinate represents the label serial number of the wood species. The “*” and the “+” represent the error sample distributions of wood classification using spectral and texture features alone, respectively, and the “O” represents the error sample distributions after “Concat” feature fusion.

Fig. 9 Sample classification after fusion
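The division of the dataset and the relation between the abscissa of Fig. 9 and the species label can be sketched as follows. The sketch assumes that test samples are ordered by species and numbered from 1, so that samples 1 to 15 belong to species 1; the seed and function names are hypothetical.

```python
import random


def stratified_split(labels, n_train, seed=0):
    """Randomly pick `n_train` training indices per class; rest is test."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return train, test


def species_of_test_sample(serial, per_class=15):
    """Map a 1-based test serial number to a 1-based species label."""
    return (serial - 1) // per_class + 1


# 50 species with 50 samples each, as in the dataset description.
labels = [species for species in range(1, 51) for _ in range(50)]
train_idx, test_idx = stratified_split(labels, n_train=35)
print(len(train_idx), len(test_idx))  # 1750 750
```

Because the split is done per species, every class keeps exactly 35 training and 15 test samples, which gives the 750 test samples shown in Fig. 9.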

It can be seen from Fig. 9 that the number of misclassified samples after fusion is significantly smaller than when spectral or texture features are used alone. In other words, spectral and texture features complement each other and further improve the classification accuracy of wood. In summary, the highest observed classification accuracy was 99.16%, obtained with parameters \(W=250\) and \(P=20\), the “Concat” fusion scheme, and the 45D texture feature.

The separability of the texture, spectral and fused features is also discussed. The sample set \(\boldsymbol{X}\) is defined as having \(C\) classes (in this study \(C=50\)). Each class \(j\) contains \({n}_{j}\) samples (in this study \({n}_{j}=50\)), the mean feature of class \(j\) is \({\boldsymbol{m}}_{j}\), the mean of all wood samples is \(\boldsymbol{m}\), and \({P}_{j}\) is the prior probability of class \(j\). The intra-class divergence matrix \({\boldsymbol{S}}_{w}\) and the inter-class divergence matrix \({\boldsymbol{S}}_{b}\) are computed by Eqs. (11) and (12), respectively. The dispersion criterion function \(J\) is given in Eq. (13). The larger \(J\) is, the more separable the feature is.

$${\boldsymbol{S}}_{w}=\sum_{j=1}^{C}{P}_{j}{\boldsymbol{S}}_{j},\quad {\boldsymbol{S}}_{j}=\frac{1}{{n}_{j}}\sum_{k=1}^{{n}_{j}}\left({\boldsymbol{X}}_{k}^{j}-{\boldsymbol{m}}_{j}\right){\left({\boldsymbol{X}}_{k}^{j}-{\boldsymbol{m}}_{j}\right)}^{T}$$
(11)
$${\boldsymbol{S}}_{b}=\sum_{j=1}^{C}{P}_{j}\left({\boldsymbol{m}}_{j}-\boldsymbol{m}\right){\left({\boldsymbol{m}}_{j}-\boldsymbol{m}\right)}^{T}$$
(12)
$$J=\text{tr}\left[{\boldsymbol{S}}_{w}^{-1}{\boldsymbol{S}}_{b}\right]$$
(13)
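Eqs. (11) to (13) can be evaluated with a small pure-Python sketch. It assumes that the prior \({P}_{j}\) is estimated as \({n}_{j}\) divided by the total number of samples and that \({\boldsymbol{S}}_{w}\) is non-singular; the helper names and the toy data are hypothetical.

```python
def mean_vector(rows):
    return [sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0]))]


def add_outer(target, vec, scale):
    """target += scale * vec vec^T, in place."""
    for i, a in enumerate(vec):
        for j, b in enumerate(vec):
            target[i][j] += scale * a * b


def solve(a, b):
    """Return a^{-1} b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("S_w is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def separability(samples_by_class):
    """J = tr(S_w^{-1} S_b), with P_j = n_j / N (an assumption)."""
    all_rows = [r for rows in samples_by_class.values() for r in rows]
    m = mean_vector(all_rows)
    d = len(m)
    s_w = [[0.0] * d for _ in range(d)]
    s_b = [[0.0] * d for _ in range(d)]
    for rows in samples_by_class.values():
        prior = len(rows) / len(all_rows)
        m_j = mean_vector(rows)
        for r in rows:
            dev = [x - mu for x, mu in zip(r, m_j)]
            add_outer(s_w, dev, prior / len(rows))  # Eq. (11)
        gap = [a - b for a, b in zip(m_j, m)]
        add_outer(s_b, gap, prior)                  # Eq. (12)
    x = solve(s_w, s_b)
    return sum(x[i][i] for i in range(d))           # Eq. (13)


# Two unit-variance toy classes whose means are 4 apart along one axis.
near = {"A": [[0, 0], [2, 0], [0, 2], [2, 2]],
        "B": [[4, 0], [6, 0], [4, 2], [6, 2]]}
j_near = separability(near)
print(j_near)  # 4.0
```

Moving class B further from class A increases \({\boldsymbol{S}}_{b}\) while leaving \({\boldsymbol{S}}_{w}\) unchanged, so \(J\) grows, which is the sense in which a larger \(J\) indicates better separability.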

Table 7 presents the separability results of texture features, spectral features and fused features after CCA fusion. From Table 7, it can be seen that fused features after CCA fusion have stronger separability.

Table 7 Feature separability measurement

4.4 Comparison with other methods

In this section, the proposed method is compared with mainstream wood classification methods. The methods compared include the GLCM texture classification method (Rosli et al. 2019), the Improved-Basic Gray Level Aura Matrix (I-BGLAM) method (Zamri et al. 2016), the Fuzzy + SPPD (statistical property of pores distribution) + I-BGLAM method (Ibrahim et al. 2018, 2017), the kernel genetic algorithm (GA) (Yusof et al. 2013), the color moment feature method (Zhao 2013), the multidimensional texture method (Barmpoutis et al. 2018), and the spectral extraction method (Peng and Yue 2019). In addition, several pre-trained networks were used to classify wood images, including GoogLeNet (Szegedy et al. 2015), SqueezeNet (Iandola et al. 2016), ResNet18 (He et al. 2016) and Vgg16 (Simonyan and Zisserman 2014). These networks were pre-trained and then trained on the present dataset using transfer learning, which requires the Deep Learning Toolbox™ in Matlab (Beale et al. 2020). The highest classification accuracy obtained by each algorithm on the present dataset is shown in Table 8.

Table 8 Classification accuracy of mainstream algorithms in the cross-validation

As evident from Table 8, the classification accuracy of the GA + KDA (kernel discriminant analysis) and Fuzzy + SPPD + I-BGLAM methods is low because both rely on the statistical features of pores. Using pore statistical features on the present dataset raises three main problems. First, not all the wood species in the present dataset have pores, which makes it impossible to extract pore features for such species. Second, pore segmentation on macroscopic images of the wood transverse section is difficult. Third, the number of wood samples containing white pores in the dataset is relatively small. The methods that identify wood species using texture features did not achieve satisfactory results because a large number of the wood species sampled in this study have similar textures. Although color moments can provide a higher classification accuracy, the color of the wood surface is not durable and wood cross-sections change color frequently. Therefore, color is generally not considered a reliable basis for the classification of wood species. The methods based on convolutional neural networks achieve higher recognition accuracy, but their accuracy is still lower than that of the present feature fusion method.

4.5 Influence of noise on classification accuracy

When using the spectral reflectance curves and macroscopic images of wood transverse sections to classify wood, the influence of the external environment on data collection cannot be avoided. Several factors affect the spectral reflectance curves, including light conditions, wood surface impurities, and calibration frequency. The factors affecting the quality of macroscopic images mainly include light conditions, shooting equipment and others. The CCD camera used in this study was not high definition, and the person who captured the image was not a professional. The experimental results show a good classification accuracy, which indirectly indicates that the method proposed in this paper can deliver high accuracy even with ordinary shooting equipment.

In this section, the influence of noise on the classification accuracy is discussed. A certain amount of noise was added to the spectral reflectance curve and to the image to simulate the influence of external factors. The noise added to the spectral reflectance curve is described by the signal-to-noise ratio (SNR). The noise added to the image was Gaussian noise with a constant mean of 0, and its variance was used to describe the noise level. Figure 10 shows the changes in classification accuracy after adding noise, with parameter P = 20, W = 250, texture feature dimension 54. It can be seen from Fig. 10 that the classification accuracy is still maintained at a high level after the addition of noise.

Fig. 10 Influence of noise on classification accuracy in the “leave-one-out” cross-validation
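The two kinds of noise can be simulated along the following lines. The details are assumptions, since the text does not specify them: the SNR is taken in decibels as \(10{\text{log}}_{10}\) of the ratio of signal power to noise power, the spectral noise is zero-mean Gaussian, and image intensities are scaled to [0, 1] and clipped after the noise is added. All function names are hypothetical.

```python
import math
import random


def noise_sigma_for_snr(signal, snr_db):
    """Standard deviation of white noise that yields the requested SNR (dB)."""
    power = sum(s * s for s in signal) / len(signal)
    return math.sqrt(power / 10 ** (snr_db / 10))


def add_noise_snr(signal, snr_db, rng):
    """Add zero-mean Gaussian noise to a spectrum at the given SNR."""
    sigma = noise_sigma_for_snr(signal, snr_db)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def add_gaussian_noise_image(gray, variance, rng):
    """Add zero-mean Gaussian noise to a [0, 1] image and clip the result."""
    sigma = math.sqrt(variance)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in gray]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
spectrum = [1.0, 1.0, 1.0, 1.0]  # placeholder reflectance values
sigma_20db = noise_sigma_for_snr(spectrum, 20)  # unit power at 20 dB: about 0.1
noisy_spectrum = add_noise_snr(spectrum, 20, rng)
noisy_image = add_gaussian_noise_image([[0.2, 0.8], [0.5, 0.5]], 0.01, rng)
```

Lower SNR values and larger variances correspond to heavier corruption, so sweeping the two parameters would reproduce the kind of curves shown in Fig. 10.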

Table 9 shows the classification accuracy after noise addition. The columns Noise 1 and Noise 2 depict the classification accuracy after a large amount of noise is added to the spectrum and the image, respectively, and Noise 3 describes the classification accuracy after adding a large amount of noise to both the spectrum and image. It can be seen from Table 9 that adding a large amount of noise to the spectrum or image alone does not significantly reduce the classification accuracy, which is maintained at approximately 96%. Even after simultaneously adding a large amount of noise to the spectrum and image, more than 80% accuracy can still be ensured. In other words, even after a little distortion during the spectral or image acquisition process, the method proposed in this paper can still ensure a high classification accuracy.

Table 9 Classification accuracy after adding noise to image, spectrum or both

5 Conclusion

In this study, after obtaining spectral and image data of 50 wood species transverse sections, their spectral and texture features were fused after extracting them using the fractal and LBP operator, respectively. The SVM classifier was then used to classify these wood species. The experimental results showed that the classification accuracy of fused features increased significantly, and the classification accuracy of 99.16% is higher than that achieved with single features alone. Therefore, wood classification accuracy can be further improved by using CCA fusion to integrate spectral features and texture features.

The experimental results also show that the present 3D LBP feature extraction operator has certain advantages in texture classification of wood images, and its classification accuracy is higher than that of the GLCM and I-BGLAM texture feature extraction operators. Moreover, the method described in this paper has good anti-interference ability. Even when the spectral data or image data are disturbed by noise, it can still identify wood species accurately; that is, the method places low requirements on the external environmental conditions.

In conclusion, the proposed method can recognize a wide range of wood species, including some visually similar wood species and biologically similar wood species within the same genus. The experimental equipment for spectral and image data collection (illustrated in Fig. 1) is relatively cheap compared with a hyperspectral imaging device, and together the two devices collect both image and spectral information. Therefore, the method has application potential in the classification of wood species based on multiple features.