Introduction

Wood defect detection is an important process in wood board manufacturing, and its results directly influence the quality of wood products. Wood-defect detection includes image acquisition, image segmentation, feature extraction, and defect classification (Ruz et al. 2009). In the image acquisition stage, surface information about the wood board is collected by an industrial camera (Estévez et al. 2003). Pham and Alcock (1998) summarized 32 feature vectors of four types: windows, shapes, statistical values, and gray-scale. Ruz et al. (2009) proposed three methods for feature selection: a statistical method, the "leave one out" method, and genetic algorithms; their results showed that the genetic algorithm performed best. Our previous experiments showed that gray-scale, texture, invariant-moment, and geometric region features together give a complete representation of the defects (Zhang et al. 2013). However, because numerous features lead to complex computation and reduce detection speed, feature fusion becomes necessary.

Classification is a critical process in defect detection. Using the MLP neural network classifier, Pham and Alcock (1999) analyzed classification precision and found that the number of neurons in the hidden layers had no obvious influence on the results, whereas the learning rate greatly affected them. Castellani and Rowlands (2009) experimented with decorative board classification using a neural network combined with a genetic algorithm, but the method was only effective for boards with a single surface defect; when two or more types of defects appeared in the same image, it was even less effective.

Gu et al. (2008) proposed a support-vector machine to classify four kinds of defects, used B-splines to identify the boundary and area of the defects, and chose the internal, edge, and external colors as classification features. As the accuracy of the B-spline boundary is questionable, the speed and accuracy of recognition were affected. Zhang et al. (2013) proposed defect detection based on a SOM (self-organizing map) neural network, which requires fewer training samples.

To improve the accuracy of wood-board defect detection and overcome the disadvantages of high dimensionality and computational complexity, we focused on feature fusion and classifier design. Through linear transformation, the PCA method uncovers the variation structure of high-dimensional data and reduces dimensionality by preserving the components that contribute most to the variance. Compressed sensing is a signal processing method proposed by Donoho (2006) and Candes (2006): signals are compressed or made sparse through a proper transformation, and defect samples are detected by calculating the optimized feature matrix. Because the compressed sensing method requires no complicated training process, it takes less computation time and produces better classification results.

Materials and methods

Materials

We focused on three wood board defects: dead knots, live knots, and wood cracks. The boards in our experiment measured 40 × 20 × 2 cm, and the wood species was Xylosma. The experiment was conducted in Matlab R2012 on a 64-bit PC (Core i3, dual-core, 2.25 GHz), and an Oscar F810C IRF camera was used to acquire the images. To make the images clearer, two parallel LEDs were used for illumination. In addition, 50 8-bit gray-scale images of 128 × 128 pixels were used for training (20 live knots, 20 dead knots, and 10 cracks).

Feature extraction and fusion of wood defects

Feature extraction and fusion is the first step in wood defect identification. The features should capture as much defect information as possible while requiring little computation. First, we extracted 25 features of three types: geometric and regional features, texture features, and invariant moments. Then, we normalized the features, used principal component analysis (PCA) for feature fusion, and selected the features with the greatest contribution for defect identification. The extraction and fusion process is shown in Fig. 1 (a feature sketch follows the figure).

Fig. 1 Flow diagram of feature extraction and fusion
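To make the feature stage concrete, here is a minimal NumPy sketch computing a few representative gray-scale and regional features of a segmented defect. The paper does not enumerate its full 25-feature set, so the specific features and names below are illustrative assumptions, not the authors' exact definitions:

```python
import numpy as np

def defect_features(gray, mask):
    """Illustrative gray-scale and regional features of a segmented
    defect; stand-ins for a few of the paper's 25 features."""
    region = gray[mask].astype(float)      # defect pixels only
    mean, std = region.mean(), region.std()
    skew = ((region - mean) ** 3).mean() / (std ** 3 + 1e-12)
    area = int(mask.sum())
    ys, xs = np.nonzero(mask)
    # Fraction of the bounding box covered: a simple regional shape cue
    extent = area / ((np.ptp(ys) + 1) * (np.ptp(xs) + 1))
    return np.array([mean, std, skew, area, extent])
```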

We obtained a complete representation of the defects from the 25 features of three types extracted from the wood board images. By extracting and observing these features, we found that defects of the same type had similar feature values, while defects of different types had different feature values. However, the 25 features contained a large amount of redundant, duplicated information, and the computational workload grew with the number of features. PCA was therefore implemented to transform and fuse these features and reduce the feature dimensionality. Each new feature is obtained by a linear combination and transformation of the original features, which preserves the information of the image. The steps of PCA feature fusion are as follows:

Build the sample matrix X as in Eq. 1, where p is the feature dimension and n is the number of samples (n > p).

$$ X_{n \times p} = \left[ \begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{i} \\ \vdots \\ x_{n} \\ \end{array} \right] = \left[ \begin{array}{cccc} x_{1,1} & x_{1,2} & \ldots & x_{1,p} \\ x_{2,1} & x_{2,2} & \ldots & x_{2,p} \\ \vdots & \vdots & & \vdots \\ x_{i,1} & x_{i,2} & \ldots & x_{i,p} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \ldots & x_{n,p} \\ \end{array} \right], \quad i = 1,2, \ldots ,n $$
(1)

Standardize the sample matrix X using Eqs. 2 and 3 to obtain the standardized matrix Z.

$$ z_{i,j} = \frac{{x_{i,j} - \overline{{x_{j} }} }}{{s_{j} }},i = 1,2, \ldots ,n; j = 1,2, \ldots ,p $$
(2)
$$ \overline{{x_{j} }} = \frac{{\sum\limits_{i = 1}^{n} {x_{i,j} } }}{n},s_{j}^{2} = \frac{{\sum\limits_{i = 1}^{n} {\left( {x_{i,j} - \overline{{x_{j} }} } \right)^{2} } }}{n - 1} $$
(3)

Calculate the covariance matrix R of the standardized matrix Z.

$$ R = \frac{{Z^{T} Z}}{{n - 1}},\;{\text{where}}\;r_{i,j} = \frac{{\sum\limits_{k = 1}^{n} {z_{k,i} z_{k,j} } }}{{n - 1}},\quad i,j = 1,2, \ldots ,p $$
(4)

Calculate the eigenvalues \( \lambda \) and eigenvectors \( \alpha \) from the characteristic equation \( \left| {R - \lambda E} \right| = 0 \) of the covariance matrix R.

Sort the eigenvalues in descending order to obtain \( \overline{\lambda } \), then calculate the contribution and cumulative contribution of each principal component by Eqs. 5 and 6.

The contribution of each principal component is given by Eq. 5:

$$ n_{i} = \frac{{\overline{\lambda }_{i} }}{{\sum\limits_{j = 1}^{p} {\overline{\lambda }_{j} } }},i = 1,2, \ldots ,p $$
(5)

The cumulative contribution is given by Eq. 6:

$$ m_{i} = \frac{{\sum\limits_{j = 1}^{i} {\overline{\lambda }_{j} } }}{{\sum\limits_{j = 1}^{p} {\overline{\lambda }_{j} } }},\quad i = 1,2, \ldots ,p $$
(6)

Choose the first k principal components whose cumulative contribution meets the pattern-recognition requirement; the transformation matrix E is then obtained by Eq. 7:

$$ E = \left[ {\overline{{\alpha_{1} }} ,\overline{{\alpha_{2} }} , \ldots ,\overline{{\alpha_{k} }} } \right] $$
(7)

Calculate the final principal components Y, which serve as the input of the defect classifier.

$$ Y = X \times E $$
(8)
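The steps above can be condensed into a short NumPy sketch. This is a minimal illustration under two assumptions flagged here: the function and variable names are ours, and the projection of Eq. 8 is applied to the standardized matrix Z (a common PCA variant) rather than the raw X:

```python
import numpy as np

def pca_fuse(X, threshold=0.95):
    """PCA feature fusion per Eqs. 1-8: standardize the n x p sample
    matrix, eigendecompose its covariance, and keep the first k
    components whose cumulative contribution reaches `threshold`."""
    n, p = X.shape
    # Eqs. 2-3: z-score standardization (ddof=1 matches the n-1 divisor)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = Z.T @ Z / (n - 1)                  # Eq. 4: covariance matrix
    lam, alpha = np.linalg.eigh(R)         # eigh suits the symmetric R
    order = np.argsort(lam)[::-1]          # descending eigenvalues
    lam, alpha = lam[order], alpha[:, order]
    m = np.cumsum(lam) / lam.sum()         # Eq. 6: cumulative contribution
    k = int(np.searchsorted(m, threshold)) + 1
    E = alpha[:, :k]                       # Eq. 7: transformation matrix
    return Z @ E                           # Eq. 8: fused features Y

# e.g. 50 training samples with 25 raw defect features each
Y = pca_fuse(np.random.rand(50, 25))
```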

Defect detection based on compressed sensing

In applying compressed sensing to wood classification, we used the fused feature vectors as sample sequences, built a data dictionary from the training samples, and represented each test sample as a linear combination of the training samples. Once the sparse representation of a test sample in the data dictionary is calculated by solving an optimization problem under the \( l_{1} \) norm, the classification result is obtained.

In compressed sensing, when a signal is sparse in some transformation domain, an observation matrix that is incoherent with the transformation basis projects the high-dimensional information onto a low-dimensional space. Then, by solving a convex optimization problem, the original signal can be reconstructed from these few projections with high probability (Donoho 2006; Shi et al. 2009).

First, assume x is a real-valued, one-dimensional discrete-time signal of finite length, which can be written as an \( n \times 1 \) column vector. If a matrix ψ and a vector α exist such that Eq. 9 holds, then x is sparse in the domain ψ.

$$ x = \psi \alpha $$
(9)

where \( \psi \in R^{n \times n} \) is the orthogonal transformation basis, called the sparse matrix, and \( \alpha \in R^{n \times 1} \) is the coefficient vector of x in the domain ψ; the number of nonzero entries in α is far smaller than the signal dimension.

If the signal is projected onto a matrix ϕ that is unrelated to the transformation basis, the observed signal y is obtained through Eq. 10.

$$ y = \phi x = \phi \psi \alpha $$
(10)

where \( \phi \in R^{m \times n} \) is the observation matrix and \( y \in R^{m \times 1} \) is the observation vector.

Finally, by solving the \( l_{0} \)-norm minimization of Eq. 11, we obtain \( \alpha_{1} \), the exact or approximate solution of α.

$$ \hbox{min} \left\| \alpha \right\|_{0} \quad s.t.\quad y = \phi x = \phi \psi \alpha $$
(11)

Equation 11 is an underdetermined problem. According to compressed-sensing theory, if the signal is sparse enough, the minimization can be relaxed to the \( l_{1} \)-norm problem of Eq. 12, a convex optimization. By solving this linear program, the original signal is reconstructed from the few observed values.

$$ \hbox{min} \left\| \alpha \right\|_{1} \quad s.t.\quad y = \phi x = \phi \psi \alpha $$
(12)
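Equation 12 (basis pursuit) reduces to a standard linear program by splitting α into nonnegative parts. Below is a minimal SciPy sketch under our own naming, with ψ taken as the identity and purely illustrative problem sizes:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve Eq. 12: min ||alpha||_1  s.t.  A @ alpha = y, via the
    split alpha = u - v with u, v >= 0, so the objective is linear."""
    m, n = A.shape
    c = np.ones(2 * n)                  # sum(u) + sum(v) = ||alpha||_1
    A_eq = np.hstack([A, -A])           # A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    u, v = res.x[:n], res.x[n:]
    return u - v

# Recover a 3-sparse signal from 30 random projections (psi = identity)
rng = np.random.default_rng(0)
n, m = 100, 30
alpha_true = np.zeros(n)
alpha_true[[5, 40, 77]] = [1.5, -2.0, 0.8]
A = rng.standard_normal((m, n))         # observation matrix phi
alpha_hat = basis_pursuit(A, A @ alpha_true)
```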

If \( f_{i}^{j} \) is the feature vector of image j of wood type i, and each \( f_{i}^{j} \) forms one column of the training matrix, then \( A_{i} = [f_{i}^{1} ,f_{i}^{2} , \ldots ,f_{i}^{m} ] \), \( A_{i} \in R^{n \times m} \), represents the samples of wood type i. Here \( f_{i}^{j} \in R^{n \times 1} \) is the feature vector, and m is the number of training samples of wood type i.

The data dictionary matrix composed of three types of training samples is shown in Eq. 13:

$$ A = [A_{1} ,A_{2} ,A_{3} ] $$
(13)

If the number of training samples of wood type i is adequate, then the feature vector y of a test image can be represented as a linear combination of the training samples in A i that belong to wood type i, that is:

$$ y = \alpha_{i}^{1} f_{i}^{1} + \alpha_{i}^{2} f_{i}^{2} + \cdots + \alpha_{i}^{m} f_{i}^{m} = A_{i} \alpha_{i} $$
(14)

where y is the feature vector of the test image, \( y \in R^{n \times 1} \), and \( \alpha_{i} \in R^{m \times 1} \) is the vector of linear representation coefficients.

If we apply the above equations to the whole data dictionary matrix A, then:

$$ y = A\alpha $$
(15)

where α is a sparse vector and N is the total number of training samples.

$$ \alpha = (0 \ldots 0, \ldots ,\alpha_{i}^{1} ,\alpha_{i}^{2} , \ldots ,\alpha_{i}^{m} , \ldots ,0 \ldots 0)^{T} ,\quad \alpha \in R^{N \times 1} $$
(16)

If the test sample is of type i, then all entries of α are 0 except the m coefficients that represent wood type i. In other words, since the number of nonzero entries in α is much smaller than its dimension, α is a sparse vector, and the above process constitutes a sparse decomposition of the test sample.

The classification of an unknown test sample proceeds as follows: substitute the test-sample feature y into Eq. 15, in which \( y \in R^{{^{n \times 1} }} \) and \( A \in R^{n \times N} \), and acquire the sparse vector α by solving Eq. 15. Because Eq. 15 is an underdetermined system of equations and α is sparse, compressed-sensing theory guarantees that the exact or approximate solution of α can be obtained by solving the \( l_{1} \)-norm optimization problem of Eq. 17, where ε is the error threshold. In practice, the sample type is determined by the non-zero entries of \( \alpha_{1} \).

$$ \alpha_{1} = \arg \hbox{min} \;\left\| \alpha \right\|_{1} \;\;s.t.\;\left\| {A\alpha - y} \right\| \le \varepsilon $$
(17)
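Putting Eqs. 13-17 together, here is a compact sketch of the sparse-representation classifier, reusing the `basis_pursuit` routine above as the solver (an equality-constrained stand-in for Eq. 17's ε-tolerant version); the class sizes and synthetic data are illustrative, not the paper's:

```python
import numpy as np

def src_classify(A_list, y, solver):
    """Sparse-representation classification per Eqs. 13-17. A_list
    holds per-class training matrices A_i (columns are fused feature
    vectors); the class with the largest coefficient wins, matching
    the paper's decision rule."""
    A = np.hstack(A_list)                        # Eq. 13: data dictionary
    alpha = solver(A, y)                         # sparse coefficients
    splits = np.cumsum([Ai.shape[1] for Ai in A_list])[:-1]
    per_class = np.split(alpha, splits)
    return int(np.argmax([np.abs(a).max() for a in per_class]))

# Three defect classes with 8-dimensional fused features (20/20/10 samples)
rng = np.random.default_rng(1)
A_list = [rng.standard_normal((8, 20)),
          rng.standard_normal((8, 20)),
          rng.standard_normal((8, 10))]
y = A_list[2][:, :3] @ np.array([1.0, 0.5, 2.0])  # synthetic class-3 sample
label = src_classify(A_list, y, basis_pursuit)
```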

Results and discussion

Classification steps

The defect classification procedure is shown in Fig. 2: image collection, morphology segmentation, feature extraction, feature fusion, classifier design, and result assessment.

Fig. 2 Flow diagram of defect classification

The defect images are read by Matlab R2012; for example, Figs. 3, 4, 5 show a live knot, a dead knot, and a crack, respectively. To reduce the computational load and increase speed, all color images are converted to gray-scale and resized to standard 128 × 128 pixel images (a preprocessing sketch follows the figures).

Fig. 3 Live knot

Fig. 4 Dead knot

Fig. 5 Crack
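A minimal preprocessing sketch using Pillow with a hypothetical file name; the paper works in Matlab, so this Python version is only an equivalent illustration:

```python
from PIL import Image
import numpy as np

# "board.png" is a placeholder path; convert a color board image to
# 8-bit gray-scale and resize to the 128 x 128 standard used here.
img = Image.open("board.png").convert("L").resize((128, 128))
gray = np.asarray(img, dtype=np.uint8)
```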

Mathematical morphology is an image-processing method based on geometry, with the advantages of a continuous image skeleton, few breakpoints, and rapid, accurate image segmentation (Zhang et al. 2014a, b). Using this method, the defect targets are accurately separated from the background. The results of segmentation are shown in Figs. 6, 7, 8, and a segmentation sketch follows them.

Fig. 6 Segmentation result of live knot

Fig. 7 Segmentation result of dead knot

Fig. 8 Segmentation result of crack
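The paper does not spell out its exact morphological operator sequence, so the following SciPy sketch is one plausible reading: threshold the darker defect pixels, clean the mask with opening and closing, and keep the largest connected component:

```python
import numpy as np
from scipy import ndimage

def segment_defect(gray):
    """Morphology-based defect segmentation sketch (assumed pipeline;
    the paper's exact operators are unspecified)."""
    # Defects are darker than clear wood; a simple global threshold
    mask = gray < gray.mean() - gray.std()
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_closing(mask, structure=np.ones((5, 5)))
    labels, n = ndimage.label(mask)        # connected components
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)
```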

After segmentation, the 25 features are calculated and then normalized. According to Eqs. 1–4, PCA maps the high-dimensional features to a low-dimensional space, and the eigenvalues \( \lambda \) and eigenvectors α are found from the covariance matrix R. \( \overline{\lambda } \) is obtained by sorting the eigenvalues \( \lambda \) in descending order.

The variance of the data is reflected by the corresponding eigenvalues: among subspaces of the same dimensionality, the one spanned by the eigenvectors of the largest eigenvalues carries the most variance. From Eqs. 5 and 6, the contribution \( n_{i} \) and cumulative contribution \( m_{i} \) of each principal component can be obtained (Table 1).

Table 1 Contribution of each principal component

The top eight principal components can reach a cumulative contribution of more than 95 %; therefore, those components are selected as the input of the classifier.

After the principal components are determined, their means are calculated and the data dictionary A is built from the 50 training samples. The data dictionary A of the three defect types is as follows:

$$ A = \left[ \begin{array}{rrr} 0.0196 & -0.0209 & 0.0026 \\ -0.0554 & 0.0953 & -0.0798 \\ 0.1942 & 0.0494 & -0.4873 \\ -0.0310 & -0.1350 & 0.3320 \\ -0.0254 & 0.0730 & -0.0951 \\ 0.0469 & -0.2613 & 0.4287 \\ 2.1204 & -1.3704 & -1.5000 \\ 0.1216 & 2.4125 & -5.0683 \\ \end{array} \right] $$
(18)

Figures 3, 4 and 5 are employed as test samples. The 25 features are extracted from the segmented images of Figs. 6, 7, 8, and the principal components are then calculated from the defect features by the PCA transformation. The principal components of the three test samples are denoted \( h^{T} \), \( s^{T} \), and \( l^{T} \), respectively, as follows:

$$ b_{i}^{T} = \left[ \begin{array}{c} h^{T} \\ s^{T} \\ l^{T} \\ \end{array} \right] = \left[ \begin{array}{rrrrrrrr} -0.6562 & 1.1239 & 1.3018 & 0.4471 & -1.5050 & 0.4266 & 2.2168 & -1.3436 \\ 0.7192 & -0.2315 & 1.4465 & 0.3311 & 3.1868 & 0.8269 & -1.9511 & 2.0005 \\ 0.1405 & 0.5189 & 0.0203 & 1.0183 & 1.6230 & 2.1236 & -3.9129 & -4.8866 \\ \end{array} \right] $$
(19)

Implement the classification in accordance with Eq. 17, and obtain \( \alpha_{{A_{i} }}^{T} \) with the least-squares method:

$$ \alpha_{{A_{i} }}^{T} = \left[ \begin{array}{c} \alpha_{h}^{T} \\ \alpha_{s}^{T} \\ \alpha_{l}^{T} \\ \end{array} \right] = \left( {\begin{array}{ccc} {1.2905} & 0 & 0 \\ 0 & {1.1643} & {1.1361} \\ {0.2891} & {0.1522} & {1.5256} \\ \end{array} } \right) $$
(20)

The sample type is determined by the maximum value in each row of \( \alpha_{{A_{i} }}^{T} \). From \( \alpha_{{A_{i} }}^{T} \), the classification results of Figs. 3, 4, 5 are live knot, dead knot, and crack, respectively, as the short check below confirms.
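The decision rule can be verified directly on the coefficient matrix of Eq. 20:

```python
import numpy as np

# Coefficient matrix from Eq. 20: one row per test sample, one column
# per defect class; the column of the row maximum is the prediction.
alpha = np.array([[1.2905, 0.0,    0.0   ],
                  [0.0,    1.1643, 1.1361],
                  [0.2891, 0.1522, 1.5256]])
labels = ["live knot", "dead knot", "crack"]
print([labels[j] for j in alpha.argmax(axis=1)])
# -> ['live knot', 'dead knot', 'crack']
```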

Effectiveness of the PCA feature fusion

To verify the necessity of feature selection, we carried out defect-detection comparison tests between the PCA feature-fusion and variance selection methods (Peck and Devore 2005). We used 50 sample images of live knots, dead knots, and cracks for feature selection and classification. In the variance selection process, features were chosen according to their variances, which reflect between-sample dispersion and separability. The classification results of the variance selection and PCA methods are shown in Table 2.

Table 2 Result of feature comparison

In Table 2, the recognition rate without the feature selection step is 68 %, and the time required for recognition is 0.7125 ms. The PCA method has the best recognition rate, 92 %, with a recognition time of 0.2015 ms. Therefore, feature selection not only reduces identification time but also increases the recognition rate.

Classification test of the compressed sensing classifier

To test the performance of the proposed classification method, we compared it with a neural network classifier (Candes 2006). As the SOM neural network requires fewer training samples and achieves higher classification accuracy, SOM was chosen for comparison with compressed sensing in our experiment. Fifty test images of live knots, dead knots, and cracks were classified; the accuracy and time of classification are shown in Table 3.

Table 3 The comparison with SOM neural network method

As shown in Table 3, step-by-step iterative computation is necessary in the SOM classification process, and each step may influence the results, so the SOM classifier is limited in recognition accuracy and time consumption. In contrast, wood defect recognition based on compressed sensing does not require complex computation: the recognition time is significantly reduced, and the recognition accuracy is improved by 5 % over the SOM classifier.

Conclusion

Focusing on the complexity of wood-board surface defect information, we proposed a new defect feature fusion method that applies PCA to the high-dimensional features. We then built a compressed sensing classifier, constructing the data dictionary from typical samples and obtaining an optimized solution with the least-squares method. The simulation experiments show that the PCA fusion method gives a more complete representation of the defect information. Compared with the SOM neural network algorithm, the compressed sensing classifier has several advantages: fewer parameters, better flexibility, less computation time, and higher classification accuracy. Therefore, the defect classification algorithm based on PCA fusion and compressed sensing can effectively increase the speed and accuracy of wood-defect detection.