Introduction

Sugi (Cryptomeria japonica) is endemic to Japan, where its range covers much of the Japanese archipelago. Sugi has been planted intensively in Japan since the middle of the last century, and sugi afforestation accounts for about 50 % of the total area devoted to plantations in Japan (Forestry Agency 2013). Since sugi presents a fine grain and good processing performance, it has been used for a number of end uses. In particular, demands for its use in construction members, such as beams and columns, have increased recently. Thus, reliable values for the mechanical properties of sugi are urgently required.

It is well known that the properties of sugi wood show wide variations. Fujisawa et al. (1994) pointed out that genetic variation is a major influence on these variations in sugi wood properties. The dynamic modulus of elasticity shows high broad-sense heritability, ranging from 0.597 to 0.867 in plus-tree clones of sugi (Fujisawa et al. 1992). Variations in wood density and moisture contents are also under strong genetic control (Fujisawa et al. 1993, 1995).

In order to obtain reliable estimates for genetic parameters, it is necessary to measure a large number of samples. Rapid and cost-effective methods of evaluating the properties of wood are required for forest tree breeding programs. A number of studies have demonstrated that near-infrared (NIR) spectroscopy has high potential for the rapid assessment of various characteristics of wood, including wood stiffness (Tsuchikawa 2007; Tsuchikawa and Schwanninger 2011). A notable advantage of NIR spectroscopy is that it can be used to assess multiple traits simultaneously. When selection is applied to improve the economic value of tree growth and wood properties, it is generally applied to several traits simultaneously and not just to one, because economic value depends on more than one trait (Falconer and Mackay 1996). Near-infrared spectroscopy is a useful tool for collecting large amounts and multiple types of phenotypic information.

Forest geneticists and tree breeders have traditionally focused on polygenic (i.e., quantitative or complex) traits. In the application of genomics-based breeding, complex traits need to be dissected into their individual gene components (Neale 2007). Recent studies affirm that association genetics is a very powerful approach for dissecting complex traits in trees (Thumma et al. 2005; Gonzalez-Martinez et al. 2007). Similarly, most wood properties are associated with many other characters and can be considered complex traits. For instance, the stiffness of wood is affected by its density, the crystallinity of cellulose, the microfibril angle, and so on. However, traditional methods of measuring these properties are generally time-consuming and require high levels of skill. During the process of building a calibration model, we can detect the NIR spectral bands that are strongly related to the target trait. Thus, the NIR spectral bands can be considered the dissected components of the target trait. Moreover, if the NIR spectral bands—i.e., the dissected components—are under genetic control, NIR spectroscopy could provide useful information for association genetics.

The first objective of the study reported in the present paper was to develop a calibration model that could be used to predict the stiffness of wood using NIR spectroscopy coupled with multivariate statistical analysis. The second objective was to select the NIR bands that are highly related to wood stiffness and examine the genetic control of these bands.

Materials and methods

Plant materials and measurement of log stiffness

Wood samples were collected from three stands of sugi plus-tree clones located in Okayama Prefecture (35°03′N, 134°06′E), Tottori Prefecture (35°14′N, 134°14′E), and Kochi Prefecture (33°36′N, 133°41′E). The stands were established from 1967 to 1972 with the aim of conserving genetic resources. From 2000 to 2002, 2–3 sample trees were harvested from each clone, yielding 129 sample trees in total. A 2-m butt log was obtained from each tree.

The dynamic modulus of elasticity of each green log (E fr) was measured using the longitudinal vibration method (Sobue 1986; Arima et al. 1993). The frequency of the fundamental vibration was obtained with an FFT analyzer (CF-1200, Ono Sokki Co., Ltd., Yokohama, Japan). The weight, log length, and the diameters of both log ends were measured in order to facilitate the calculation of wood density. After measuring E fr, a disk was cut from the top end of each log and then used to measure its near-infrared spectrum. Each disk was cut by chainsaw and the surface used for measurements was processed with a sander.
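The calculation behind this measurement can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the standard relation for the fundamental longitudinal mode of a free-free bar, E = (2Lf)²ρ, and it assumes the log volume is that of a truncated cone. The function names and the input values are hypothetical.

```python
import math

def log_density(mass_kg, length_m, d_butt_m, d_top_m):
    """Green density in kg/m^3, treating the log as a truncated cone (assumption)."""
    r1, r2 = d_butt_m / 2.0, d_top_m / 2.0
    volume = math.pi * length_m * (r1 * r1 + r1 * r2 + r2 * r2) / 3.0
    return mass_kg / volume

def dynamic_moe_gpa(freq_hz, length_m, density_kg_m3):
    """Dynamic modulus E = (2 L f)^2 * rho, converted from Pa to GPa."""
    return (2.0 * length_m * freq_hz) ** 2 * density_kg_m3 / 1e9

# Hypothetical 2-m green log; these numbers are illustrative only.
rho = log_density(mass_kg=60.0, length_m=2.0, d_butt_m=0.22, d_top_m=0.20)
e_fr = dynamic_moe_gpa(freq_hz=700.0, length_m=2.0, density_kg_m3=rho)
print(f"density = {rho:.0f} kg/m^3, E_fr = {e_fr:.2f} GPa")
```

Because the frequency enters squared, a small error in the fundamental frequency has roughly twice the relative effect on the modulus as the same relative error in density.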

Measurements of NIR spectra and chemometric analysis

Diffuse reflectance spectra were acquired on a MATRIX-F spectrophotometer (Bruker Optics Co., Tokyo, Japan) equipped with a fiber optic probe (spot diameter ≈3.5 mm). The NIR spectra were obtained at intervals of 8 cm−1 over the wavenumber range 9,000–4,200 cm−1. Thirty-two scans were collected and averaged into a single average spectrum. Spectra were acquired for every fifth annual ring encountered upon moving from the pith to the bark in two arbitrary radial directions, and the mean value of these measurements was used for the regression analysis.

The spectral data were split randomly into calibration and test sets consisting of 65 and 64 samples, respectively. In order to consider the effect of spectral processing, raw (mean-centered), standard normal variate (SNV), and second-derivative spectra were used for the analysis. Second-derivative spectra were obtained using the Savitzky–Golay algorithm with a 21-point window and second-order polynomial. Partial least squares (PLS) regression was used to develop the model to predict E fr (Kramer 1998). The final number of factors selected for incorporation into the model was chosen to minimize the residual variance when using leave-one-out cross-validation. Data analysis was performed using the pls package (Mevik et al. 2013) in R version 3.0.0.
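The two spectral treatments can be illustrated with a short sketch. The analysis in the paper was run in R; the Python below is only an illustration, the function names are hypothetical, and the SNV step assumes the sample standard deviation (n − 1 divisor). The second-derivative weights use the closed-form Savitzky–Golay expression for a quadratic fit, and the default `spacing` of 8 reflects the 8 cm−1 sampling interval.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]

def sg_second_derivative_weights(window):
    """Savitzky-Golay second-derivative weights (quadratic fit, unit spacing)."""
    m = window // 2
    denom = m * (m + 1) * (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [30.0 * (3 * i * i - m * (m + 1)) / denom for i in range(-m, m + 1)]

def sg_second_derivative(spectrum, window=21, spacing=8.0):
    """Second derivative at interior points; the half-window at each edge is dropped."""
    m = window // 2
    w = sg_second_derivative_weights(window)
    out = []
    for c in range(m, len(spectrum) - m):
        acc = sum(w[k] * spectrum[c - m + k] for k in range(window))
        out.append(acc / spacing ** 2)
    return out

# A pure quadratic in wavenumber has a constant second derivative of 2.
quadratic = [(8.0 * i) ** 2 for i in range(30)]
print(sg_second_derivative(quadratic)[:3])
```

A 21-point window at 8 cm−1 spacing spans 160 cm−1, so features narrower than that are smoothed considerably by this treatment.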

Variable selection

There are numerous suggested methods for variable selection (Mehmood et al. 2012). We utilized the following three methods (Wehrens 2011) to select the spectral regions that are important when attempting to predict E fr.

First, we simply tested the significance of the regression coefficients. Variables with regression coefficients that are not significantly different from zero do not contribute to the predictive abilities of the model. We therefore calculated the 95 % bias-corrected and accelerated (BCa) bootstrap confidence interval (Efron and Tibshirani 1993) for each coefficient and checked whether it contained zero.
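The idea of testing a coefficient by resampling can be sketched in a few lines. Two simplifications are made here and should be read as assumptions: the example uses the plain percentile interval rather than the BCa interval used in the study, and it bootstraps a single ordinary least-squares slope rather than a vector of PLS coefficients. All names and data are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def bootstrap_slope_interval(x, y, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the slope (not BCa), resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # skip degenerate resamples with no spread in x
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    slopes.sort()
    tail = (1.0 - level) / 2.0
    lower = slopes[int(tail * n_boot)]
    upper = slopes[int((1.0 - tail) * n_boot) - 1]
    return lower, upper

# Synthetic data with a true slope of 2 and small deterministic noise.
x = [float(i) for i in range(20)]
y = [2.0 * xi + ((i * 7) % 5 - 2) * 0.1 for i, xi in enumerate(x)]
lower, upper = bootstrap_slope_interval(x, y)
print("coefficient differs from zero:", not (lower <= 0.0 <= upper))
```

The BCa interval adjusts these percentile cut-offs for bias and skewness in the bootstrap distribution, which matters most when the distribution of the coefficient is asymmetric.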

One technique that is often used to control the phenomenon of overfitting is regularization, which involves adding a penalty term to the loss function in order to discourage the coefficients from reaching large values (Bishop 2006). A common choice of loss function in regression problems is the squared loss, in which case the regularized coefficient estimates are obtained by solving the following penalized minimization problem:

$$ \mathop{\arg\min}\limits_{\beta} \left( \frac{1}{2}\sum\limits_{i=1}^{N} \left( y_{i} - x_{i}^{T}\beta \right)^{2} + \frac{\lambda}{2}\sum\limits_{j=1}^{M} \left| \beta_{j} \right|^{q} \right), $$
(1)

where \( y_{i} \) are the dependent variables, \( x_{i} \) are the independent variables, \( \beta_{j} \) are the regression coefficients, N is the number of observations, and M is the number of parameters. λ is the regularization coefficient, which controls the relative importance of the data-dependent error and regularization terms. The cases with q = 1 and q = 2 are known as the Lasso and ridge regressions, respectively (Tibshirani 1996). The Lasso has the property that if λ is sufficiently large, some of the β coefficients are driven exactly to zero, leading to a sparse model in which the corresponding variables play no role; ridge regression shrinks the coefficients but does not set them to zero. A mixture of Lasso and ridge regressions is known as the elastic net (Zou and Hastie 2005), and it involves the following penalty term:

$$ \sum\limits_{j\, = \,1}^{M} {\left( {\alpha \left| {\beta_{j} } \right| + \left( {1 - \alpha } \right)\beta_{j}^{2} } \right)} . $$
(2)

This means that large coefficients are penalized heavily (because of the quadratic term) and that many of the coefficients are exactly zero, leading to a sparse solution. We fitted the Lasso and elastic net models and assessed which of the coefficients are nonzero using the “lars” package (Hastie and Efron 2013) and the “elasticnet” package (Zou and Hastie 2013) in R version 3.0.0.
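How the q = 1 penalty in Eq. (1) produces exact zeros can be seen in a small coordinate-descent sketch. This is not the algorithm used by the “lars” or “elasticnet” packages; it is an illustrative substitute that minimizes the same objective, assuming centered data (no intercept) and no all-zero columns. Function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize 0.5 * sum (y_i - x_i^T beta)^2 + (lam / 2) * sum |beta_j|."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            # Correlation of column j with the residual that ignores column j.
            rho = 0.0
            for i in range(n):
                fit_others = sum(X[i][k] * beta[k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
    return beta

# Toy design with two orthogonal columns; the second carries only a weak signal.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
print(lasso_coordinate_descent(X, y, lam=1.0))
```

With `lam=1.0` the weak second variable falls inside the threshold band and is dropped, whereas with `lam=0.0` the routine returns the ordinary least-squares coefficients for both variables.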

Estimation of broad-sense heritability

In order to examine the genetic control of wood stiffness as well as NIR spectra, we constructed the following random-effects model:

$$ \begin{aligned} &y_{ij} = \mu + b_{i} + \varepsilon_{ij} , \\ &b_{i} \sim N(0,\,\,\sigma_{\text{b}}^{2} ),\,\,\,\,\,\varepsilon_{ij} \sim N(0,\,\,\sigma^{2} ), \\ \end{aligned} $$
(3)

where \( y_{ij} \) is the observed value of the jth tree of the ith clone, μ is the population mean of the target trait, \( b_{i} \) is the random effect of clone i with variance \( \sigma_{\text{b}}^{2} \), and \( \varepsilon_{ij} \) is the within-clone residual with variance \( \sigma^{2} \). The random effects \( b_{i} \) and \( \varepsilon_{ij} \) were assumed to be independent and normally distributed with a mean of zero. The model was fitted using the “nlme” package (Pinheiro et al. 2013) in R version 3.0.0. The broad-sense heritability (\( h^{2} \)) was calculated as \( h^{2} = \sigma_{\text{b}}^{2} / ( \sigma_{\text{b}}^{2} + \sigma^{2} ) \).
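The heritability calculation can be sketched with a method-of-moments (one-way ANOVA) estimator of the two variance components. This is a stated substitute for the likelihood-based fit performed with “nlme” in the study, so the estimates need not coincide for unbalanced data. The function name and the toy data are hypothetical; `n0` is the usual effective group size for unequal numbers of trees per clone.

```python
def broad_sense_heritability(groups):
    """groups: one list of tree-level values per clone. Returns (h2, var_clone, var_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum(sum((v - mu) ** 2 for v in g) for g, mu in zip(groups, means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective number of trees per clone when clone sizes differ.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    # Truncate at zero because a variance cannot be negative.
    var_clone = max((ms_between - ms_within) / n0, 0.0)
    return var_clone / (var_clone + ms_within), var_clone, ms_within

# Toy data: three clones with two trees each (illustrative values only).
h2, var_b, var_e = broad_sense_heritability([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(round(h2, 3), var_b, var_e)
```

Applying the same function to the absorbance at each wavenumber in turn would give a heritability profile across the spectrum.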

Results and discussion

Prediction of wood stiffness

The dynamic modulus of elasticity (E fr) measured using the longitudinal vibration method ranged from 3.44 to 8.64 GPa in the calibration set (mean = 5.60 GPa, SD = 1.19 GPa) and from 3.49 to 7.93 GPa in the test set (mean = 5.59 GPa, SD = 1.14 GPa). Partial least squares analysis was performed to predict E fr using mean-centered raw spectra. The calibration showed a moderate relationship between the measured and NIR-predicted values with a correlation coefficient of 0.66. The calibration model was successfully applied to the test set, leading to a correlation coefficient of 0.69 and a root mean square error of prediction (RMSEP) of 0.82 GPa. Figure 1 shows the relationship between the measured and NIR-predicted values for E fr. Cross-validation indicated that five PLS factors were optimal. The ratio of performance to deviation (RPD, calculated as the ratio of the standard deviation of the reference data to the RMSEP) was 1.45, which is adequate for use as an initial screening tool (Schimleck et al. 2003).
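The three summary statistics quoted above can be computed as in the following sketch; the function names are hypothetical. Dividing the calibration-set standard deviation (1.19 GPa) by the RMSEP (0.82 GPa) gives approximately the reported RPD.

```python
import math

def rmsep(measured, predicted):
    """Root mean square error of prediction."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def pearson_r(a, b):
    """Pearson product-moment correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def rpd(sd_reference, rmsep_value):
    """Ratio of performance to deviation: SD of the reference data over RMSEP."""
    return sd_reference / rmsep_value

print(round(rpd(1.19, 0.82), 2))  # 1.45
```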

Fig. 1 Plot of measured versus NIR-predicted values of the dynamic modulus of elasticity. Factor, R, and RMSEP indicate the optimum number of PLS factors, the correlation coefficient, and the root mean square error of prediction, respectively. The results were obtained using the test set

There were no significant differences among the spectral pre-processing methods. The correlation coefficients of the prediction model obtained using SNV and second-derivative spectra were 0.65 and 0.67, respectively. Both processing methods yielded RMSEP = 0.84 GPa. The prediction accuracy obtained from this study was slightly superior to those reported previously for Pinus taeda (RMSEP = 1.49 GPa; Kelley et al. 2004), for Pinus sylvestris (RMSEP = 1.46 GPa; Lestander et al. 2008), and for Larix spp. (RMSEP = 1.32 GPa; Fujimoto et al. 2008).

Important variables for wood stiffness

This study confirmed that the calibration model can meet the demand for a technique that can be used to rapidly inspect wood stiffness. As mentioned above, it would be useful for genomic discovery if the NIR spectral bands could be considered the dissected components of the target trait. In this case, we would like to be able to select the spectral regions which make sense physically or chemically. In other words, we need the spectral regions that are closely related to wood stiffness. We examined three methods of detecting the NIR spectral bands that are highly related to wood stiffness: the bootstrap confidence interval, the Lasso model, and the elastic net model. In Fig. 2a, coefficients with a 95 % confidence interval that does not contain zero are shown in black; others are shown in gray. Significant coefficients were not found for the smaller wavenumbers (except for around 4,000 cm−1), indicating that this region contains very little relevant information. Nonzero coefficients obtained from the Lasso and elastic net models are shown in Fig. 2b and c. The locations of the important variables, ~7,320 and ~6,281 cm−1, were similar in both models. The absorption band at 7,320 cm−1 can be assigned to the first overtone of the fundamental CH stretching and deformation vibrational modes due to cellulose, as shown in Table 1 (Schwanninger et al. 2011). The absorption band at 6,281 cm−1 can be assigned to the first stretching overtone of intramolecular hydrogen-bonded OH groups in crystalline regions in cellulose.

Fig. 2 Selection of important variables based on (a) the bootstrap confidence interval, (b) the Lasso model, and (c) the elastic net model. In panel a, coefficients for which the 95 % confidence interval (calculated with the BCa bootstrap) does not contain zero are shown in black; others are shown in gray. Panels b and c show the nonzero coefficients in the Lasso and elastic net models (color figure online)

Table 1 Assignment of selected NIR bands and the estimated broad-sense heritabilities for these bands and the wood stiffness

Tsuchikawa et al. (2005) attempted to predict the density and tensile strength of wood from Pinus densiflora and Zelkova serrata, and suggested that the absorption band at 6,281 cm−1 due to intramolecular hydrogen-bonded OH groups in the crystalline regions of cellulose may be strongly linked to the tensile stiffness and strength of the wood. Similar results were found for Larix kaempferi (Fujimoto et al. 2010). Since these results are consistent from physical and chemical perspectives, the NIR bands at 7,320 and 6,281 cm−1 (denoted W1 and W2, respectively, below) were used in the subsequent genetic analysis (Table 1).

Before performing the genetic analysis, we examined the relationships between the stiffness of the wood and the selected NIR bands. Both W1 and W2 were moderately correlated with the measured E fr, with Pearson product-moment correlation coefficients of 0.47 and 0.46, respectively. The relationships between the NIR-predicted E fr and the selected bands were slightly stronger, with correlation coefficients of 0.61 for W1 and 0.58 for W2.

Genetic control of NIR spectral bands

Based on the random-effects model (3), the broad-sense heritabilities (\( h^{2} \)) of wood stiffness and of the NIR spectral bands were calculated. The \( h^{2} \) of E fr measured by the longitudinal vibration method was 0.74, as opposed to 0.57 when this parameter was predicted from the NIR spectrum (Table 1); these values were consistent with those reported previously for plus-tree clones of sugi (Fujisawa et al. 1992). The heritability estimates for the NIR absorbance values at W1 and W2 were 0.48 and 0.57, respectively. The heritability estimates for all other spectral bands were also calculated and the results are shown in Fig. 3. The calculations were carried out for three spectral treatments (i.e., raw, SNV, and second derivative). The gray bars indicate the bands selected using the elastic net model. The heritability estimate varied with wavenumber in all spectral treatments. Although the heritability estimate and the bands related to wood stiffness varied with the spectral treatment applied, the highest heritability estimate was found in the vicinity of 6,281 cm−1 in all of the spectral treatments. This band was also selected using the elastic net model in each spectral treatment. These results imply that the NIR spectral bands as well as the stiffness of the wood are under moderate genetic control.

Fig. 3 Variations in the heritability estimates for all spectral bands, as calculated from (a) raw, (b) SNV, and (c) second-derivative spectra. The gray bars indicate the bands selected using the elastic net model. The NIR band at 6,281 cm−1 is indicated by an arrow (color figure online)

The relative efficiency of indirect selection (RE) obtained using the NIR spectral bands (i.e., W1 and W2) was calculated using the following formula:

$$ {\text{RE}} = r_{\text{g}} \frac{{h_{2} }}{{h_{1} }}, $$
(4)

where \( r_{\text{g}} \) is the genetic correlation between trait 1 (E fr) and trait 2 (W1 or W2), and \( h_{1} \) and \( h_{2} \) are the square roots of the heritabilities of traits 1 and 2, respectively. The genetic correlations between E fr and W1 and between E fr and W2 were 0.493 and 0.491, so the REs of these combinations were 0.397 and 0.431, respectively. These results indicate that indirect selection based on the NIR spectral bands is about 40 % as efficient as direct selection for wood stiffness.
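As a check on the arithmetic, Eq. (4) can be evaluated directly from the heritability estimates reported above (0.74 for measured E fr, 0.48 for W1, and 0.57 for W2). The function name is hypothetical; the inputs are the values given in the text.

```python
import math

def relative_efficiency(r_g, heritability_target, heritability_indirect):
    """RE = r_g * h_2 / h_1, where h_1 and h_2 are square roots of the heritabilities."""
    return r_g * math.sqrt(heritability_indirect) / math.sqrt(heritability_target)

re_w1 = relative_efficiency(0.493, 0.74, 0.48)
re_w2 = relative_efficiency(0.491, 0.74, 0.57)
print(round(re_w1, 3), round(re_w2, 3))  # 0.397 0.431
```

Because RE scales with the square root of the heritability of the indirect trait, W2 gives the higher efficiency even though its genetic correlation with E fr is marginally lower.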

In summary, the current study demonstrated that NIR spectroscopy can be used to evaluate the stiffness of wood with reasonable accuracy. Since NIR spectroscopy can evaluate multiple traits simultaneously, it could prove a powerful tool in a diversified breeding program. We selected the spectral regions that were most closely related to wood stiffness using several computational statistical techniques. In a novel approach, we examined the genetic control of each selected spectral band on the assumption that the spectral bands can be considered the dissected components of the target trait (E fr). The spectral bands showed moderate heritability estimates, indicating that these bands are under genetic control, which could prove useful when attempting to perform genomics-based breeding. For instance, the association genetics approach to complex trait dissection requires large population sizes (Neale 2007). NIR spectroscopy can readily supply not only large numbers of samples but also abundant variables (i.e., spectral data). In other words, under our assumption, NIR spectroscopy can rapidly yield a number of traits. In this case, the spectral data can be used directly as traits; alternatively, the data can be decomposed into several latent variables in a statistical analysis such as principal component analysis (PCA). Gonzalez-Martinez et al. (2007) tested the genetic associations among single nucleotide polymorphisms from wood- and drought-related candidate genes and an array of wood property traits (earlywood and latewood specific gravity, percentage of latewood, earlywood microfibril angle, and the lignin and cellulose content of the wood) using both the original data and synthetic principal components. The selected spectral regions that make sense physically or chemically varied according to the spectral treatment applied. The second-derivative treatment selected more spectral bands than the raw and SNV spectral treatments. Further studies are therefore required to determine the most suitable procedure for selecting the most important spectral bands.