Introduction

The pulp and paper industry plays an important role in Thailand. Eucalyptus camaldulensis is utilized as the main raw material due to its high growth rate. To improve the productivity of pulp and paper mills, a supply of wood chips of stable quality in terms of lignin and cellulose content, wood density, fiber length, etc. is required. The chemical and inherent physical properties of Eucalyptus sp. vary especially widely, depending on the species, plantation site, tree age, and individual specimen (Hillis and Brown 1984).

Within the last two decades, near-infrared (NIR) spectroscopy has been recognized as one of the most powerful non-destructive techniques for wood analysis. Usually, multivariate statistical methods such as principal component analysis (PCA), principal component regression (PCR), and partial least squares regression (PLS-R) are used for the evaluation of NIR spectra and for the calculation of prediction models. Many applications have been published and have been summarized in several reviews (Workman 1999, 2001; Tsuchikawa 2007).

Basic density is a common measure of wood density and a very important pulpwood parameter. It influences many aspects of the pulp and paper industry, including freight costs, chipping properties, pulp yield per unit mass of wood, and paper quality, and is therefore an integral part of tree breeding programs. Density and fiber length are used to evaluate the suitability of a wood for a particular application. Properties such as basic density, fiber length, cell wall thickness, whiteness, and quality of extractives are used by the pulping industry as indicators of wood quality for different industrial processes and final paper products. Basic density is related to the yield, paper resistance, optical properties, and surface quality. In hardwoods, density is largely controlled by the ratio of fiber diameter to wall thickness (Higgins et al. 1973) but is influenced by a number of factors including fiber wall thickness, quantity of parenchyma tissue, non-structural material (e.g. extractives), and the frequency and size of vessels (Hillis 1972). In tree breeding programs, where many samples have to be assessed, reliable, fast, and preferably non-destructive methods such as NIR spectroscopy are required. Several studies have demonstrated this in principle (Gindl et al. 2001; Schimleck et al. 2001a, b, 2006a, b; Hein et al. 2009; Tyson et al. 2009; Yao and Pu 2009), and from these NIR spectroscopy can be identified as a useful technique for the pulp and paper industries.

Differences in basic density and fiber length are observed between species, sites, trees, and within the tree. For the genus Eucalyptus, the most common trend found for basic density is an increase radially from pith to bark and axially from the base to the apex, although an initial decrease in basic density at the base of the tree has been described in more detailed axial sampling. Within-tree variation in fiber length and coarseness of Eucalyptus globulus and E. nitens was investigated to develop a non-destructive sampling strategy (Muneri and Raymond 2001). Other studies investigated basic density and fiber length at breast height of E. globulus for parameter prediction of the whole tree (Igartúa et al. 2003), the variation of fiber biometry in different provenances of E. globulus (Miranda et al. 2001), the effect of harvesting age on the quality of E. camaldulensis bleached kraft pulp (Pisuttipiched et al. 2003), and the clonal and site variations of vessels in E. globulus (Leal et al. 2003). Fiber length increases radially from pith to bark, but axial variation is less consistent.

In a previous work, pulp yield, alpha-cellulose content, pentosans content, and lignin content of E. camaldulensis were investigated (Terdwongworakul et al. 2005).

The purpose of this study was to further investigate the feasibility of predicting air-dry density and fiber length using NIR spectroscopy combined with partial least squares regression (PLS-R) as a selection criterion in the tree breeding and high-quality pulpwood selection programs of E. camaldulensis tree plantations in Thailand. The aim was to obtain better models than those found in the literature, which, according to AACC Method 39-00 (AACC 1999), are not precise enough for screening.

Materials and methods

Materials

Five disks (from the following heights: 3.4, 6.9, 9.4, 13.1, and 15.4 m) were cut from a 5-year-old E. camaldulensis tree. The disks were divided into several sticks, which were further cut into small blocks (25(R) mm × 25(T) mm × 50(L) mm) and labeled, e.g., A to E from bottom to top. In total, 50 clear wood samples were obtained, namely 24, 16, 4, 4, and 2 samples from the disks at 3.4, 6.9, 9.4, 13.1, and 15.4 m height, respectively. These samples were further cut into small strips (5 mm × 10(T) mm × 5 mm) and sanded with 60-mesh sandpaper. Wood samples were kept in a chamber at 25°C and RH = 70% to simulate air-dry conditions. Mostly identical samples were used for the measurement of NIR spectra, density, and fiber length.

NIR measurements

A NIR diffuse reflectance spectrum was collected for each sample using an FT-NIR spectrophotometer (Bruker MATRIX-F; TE-InGaAs detector) with a fiber optic probe with a measurement area of 49 mm². A white plate (barium sulfate) served as background. To improve the signal-to-noise ratio, 128 scans were co-added at a spectral resolution of 8 cm⁻¹ over the wavenumber range 10,000–4,000 cm⁻¹. A zero-filling factor of two was applied.

Determination of air-dry density

Air-dry density was determined for 45 samples (sample set A) by the mercury displacement method (Kollmann and Coté 1968) using a UX420H (Shimadzu).

Fiber length measurement

Fiber lengths were measured after the air-dry density measurement. Each of the 24 specimens (sample set B) was treated with Schulze solution and aqueous NaOH (Cheng et al. 2000) and then fiberized. The fiber length was measured using a microscope (Olympus BX60F-3). The lengths of 60 fibers were averaged for each specimen.

Both fiber length and air-dry density were available for only 19 samples, which are hereafter called sample set C.

Data processing: multivariate data analysis

Partial least squares regression modeling

OPUS Quant 2 software (OPUS version 6.5, http://www.brukeroptics.de) was used for data pre-processing and for the calculation and validation of the PLS-R models. Spectra were smoothed and differentiated using a 17-point smoothing filter and a second-order polynomial to obtain first derivatives (1st der), and vector normalization (VN) and multiplicative scatter correction (MSC) were applied in the wavenumber regions used for the calculation of the PLS-R models. For calibration/cross-validation, the infrared data sets were regressed against the property of interest. The optimum number of PLS components (Rank) was determined by the full inner cross-validation (CV all) method (leave-one-out). Calibration and cross-validation were evaluated by the coefficient of determination (R² for calibration and r² for cross-validation) between predicted and measured values and by the root mean square error of cross-validation (RMSECV). In addition, the sample sets were divided into CV sets and test sets, with 10% of the samples, chosen to cover the property ranges, used as test sets, in order to perform cross-validations and test set validations. For each test set, the r² and the root mean square error of prediction (RMSEP) were calculated. Moreover, the number of samples left out during cross-validation of the air-dry density model was increased up to 42% of the samples to obtain more realistic results than those from full cross-validation. The two outliers found during this procedure were removed, and the cross-validation was repeated without them.
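The pre-processing steps named above can be illustrated with a minimal sketch in plain Python. It is not the OPUS implementation: the function names are hypothetical, the derivative assumes evenly spaced data points and is expressed per point rather than per wavenumber, and vector normalization is assumed here to mean mean-centering followed by scaling to unit Euclidean length.

```python
import math


def sg_first_derivative(spectrum, window=17):
    """First derivative from a Savitzky-Golay fit (polynomial order 1 or 2).

    For a symmetric window and a polynomial of order <= 2, the fitted slope
    at the window centre is sum(i * y_i) / sum(i^2). Points at the edges
    that lack a full window are dropped.
    """
    half = window // 2
    norm = sum(i * i for i in range(-half, half + 1))
    return [
        sum(i * spectrum[c + i] for i in range(-half, half + 1)) / norm
        for c in range(half, len(spectrum) - half)
    ]


def vector_normalize(spectrum):
    """Mean-centre the spectrum, then scale it to unit Euclidean length."""
    mean = sum(spectrum) / len(spectrum)
    centred = [v - mean for v in spectrum]
    length = math.sqrt(sum(v * v for v in centred))
    return [v / length for v in centred]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is regressed on the mean spectrum m (s ~ a + b * m)
    and returned as (s - a) / b.
    """
    n = len(spectra)
    ref = [sum(col) / n for col in zip(*spectra)]
    ref_mean = sum(ref) / len(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / len(s)
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected
```

In this sketch each step is a separate function so that the combinations tried during model optimization (for example first derivative plus vector normalization, or MSC alone) can be composed freely.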

The residual prediction deviation, or ratio of performance to deviation (RPD), was introduced by Williams and Norris (2004) and is calculated as the ratio of two standard deviations: the standard deviation of the reference data for the validation set and the standard error of prediction (from cross-validation or test set validation).
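As a minimal sketch of this definition, the RPD can be computed from paired measured and predicted values as follows. The function names are hypothetical, the sample standard deviation (n − 1 in the denominator) is assumed for the reference data, and the root mean square error is used as the error term without a bias correction; the numbers in the usage lines are made up for illustration only.

```python
import math
import statistics


def rmse(measured, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def rpd(measured, predicted):
    """Ratio of the reference standard deviation to the prediction error."""
    return statistics.stdev(measured) / rmse(measured, predicted)


# Illustrative values only (not data from this study).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
print(round(rpd(reference, predicted), 2))
```

Because the RPD scales the error by the spread of the reference data, it allows models calibrated on sample sets with different standard deviations to be compared.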

Results and discussion

Eucalyptus camaldulensis properties

Density and fiber length are important parameters used by the pulping industry as indicators of wood quality for different industrial processes and final paper products. Because Hamilton et al. (2008) showed that a very good correlation exists between basic density and air-dry density, the latter was determined here, as the NIR spectra were collected from air-dry wood. A summary of these parameters for the investigated samples is given in Table 1.

Table 1 Statistical results of Eucalyptus camaldulensis properties

The correlations between (A) fiber length and air-dry density, (B) air-dry density and sampling height, and (C) fiber length and sampling height are shown in Fig. 1. The correlation between fiber length and air-dry density was investigated in detail for the wood disk taken at 3.4 m height (Fig. 1a), based on sample set C. Although the variation of the fiber length is large, clear negative correlations between fiber length and air-dry density were observed, with correlation coefficients of −0.59 for all samples and −0.81 for the average values at average heights (Fig. 1a). Air-dry density also shows a negative correlation with sampling height (Fig. 1b), both for 19 average values (of sample set A) of the same heights and for the averaged values of each disk. Fiber length increased with increasing sampling height (Fig. 1c).

Fig. 1

Correlation between a fiber length and air-dry density, and between the average fiber length at a sampling height and the corresponding average density. The average values for fiber length and air-dry density are not in line with the single values because they were taken at slightly different heights within a wood block, e.g. from positions A to E (see “Materials and methods: Materials”). Correlation between b air-dry density and sampling height, and c fiber length and sampling height. Filled rectangles and the solid line show the result for all samples; open diamonds and the dashed line show the result for the average value for each sampling height. The average values are shown at the average heights of the samples

The within-tree variation of fiber length in Eucalyptus globulus and E. nitens studied by Muneri and Raymond (2001) showed that fiber length increased from the base of the tree and then plateaued before decreasing toward the apex, reaching a minimum at 70% of tree height, although the axial variation was less consistent than the radial variation. The finding of Muneri and Raymond (2001) that fiber length increases with tree height was confirmed in the present study for the narrow range of heights investigated.

The most common trend found for basic density for the genus Eucalyptus is an increase radially from pith to bark and axially from the base to the apex (Wilkes 1988; Valente et al. 1992; Igartúa et al. 2003). This is contrary to the findings here, where the air-dry density decreases with increasing height. An initial decrease in basic density at the base of the tree has been described in more detailed axial samplings (Raymond and MacDonald 1998; Muneri and Raymond 2001; Raymond and Muneri 2001).

Knowledge of the density and fiber length distribution within a tree is important for deciding at which height samples should be taken in order to obtain a sample that represents the whole tree. Several investigations were performed to find a representative sampling height. Some authors concluded that sampling at tree heights other than breast height was better for predicting basic density and fiber length of the whole tree (Raymond et al. 1998; Raymond and MacDonald 1998; Hudson et al. 2001; Muneri and Raymond 2001; Raymond and Muneri 2001).

As an optimal sampling height for E. camaldulensis is still unknown, many samples have to be investigated. To be able to handle the high number of samples during tree breeding, alternative methods such as FT-NIR are required.

NIR measurements

Figure 2 shows NIR diffuse reflectance spectra of the air-dry E. camaldulensis solid wood samples with the highest and lowest air-dry density and fiber length, respectively. As is usual for NIR spectra, their spectral characteristics are very similar at first glance, except for the baseline differences among the samples. Baseline differences between samples are usually observed in diffuse reflectance spectra, not only because of differences in sample composition but also because of sample inhomogeneities.

Fig. 2

NIR diffuse reflectance spectra of the air-dry E. camaldulensis solid wood samples with the highest and lowest air-dry density and fiber length, respectively

Partial least squares regression models

PLS regression models were developed for the determination of air-dry density and fiber length using the NIR spectra as independent (predictor) variables. Relationships between predicted and measured values for calibration and cross-validation (CV all) are shown in Fig. 3. The wavenumber ranges and the pre-processing methods used for the NIR spectra, as well as the rank obtained from PLS regression analysis and the statistical results, are summarized in Table 2. The wavenumber range and the pre-processing of the NIR spectra were selected from several combinations to give high coefficients of determination. The rank was determined using full inner cross-validation. Calibration results are not shown in Table 2 because, owing to the principles of PLS-R, they are always better than the cross-validation results (cp. Fig. 3) and are not as informative.

Fig. 3

Relationships between NIR-predicted and measured values of air-dry density, a calibration and b cross-validation of sample set A, and of fiber length, c calibration and d cross-validation of sample set B. The solid line shows the regression line constructed by PLS regression analysis and the dashed line shows the target line

Table 2 Cross-validation and test set validation results of PLS regression analyses

For cross-validation, high coefficients of determination between predicted and measured values were obtained for air-dry density (r² = 0.91) and fiber length (r² = 0.92). Although the fiber length and the range of fiber lengths of Eucalyptus are much smaller than those of softwood (Fengel and Wegener 1989), and fiber length is therefore expected to be more difficult to predict, the coefficients of determination obtained are better than those already published for softwood (Hauksson et al. 2001; Schimleck et al. 2004; Sykes et al. 2005). The root mean square error of cross-validation (RMSECV) of both PLS regression models was adequately small.

In several studies, the feasibility of NIR spectroscopy for the non-destructive evaluation of Eucalyptus density was investigated. The coefficients of determination obtained in these studies (Schimleck et al. 2001a; Schimleck and French 2002) were lower, or the root mean square error of validation was higher (31 kg/m³) (Schimleck et al. 2001b), than the values obtained here. The results not only demonstrate again that NIR spectroscopy has the potential to predict air-dry density and fiber length of Eucalyptus solid wood with high accuracy and precision, but also show that an improvement of PLS-R models over already published results was still possible. As the standard deviations of the wood parameters differ between studies, the residual prediction deviations (RPD) were calculated to take this into account and to allow a direct comparison of the PLS regression results with those from previous studies.

From the analytical point of view, in accordance with AACC Method 39-00 (AACC 1999), the RPD should be in the following ranges: ≥2.5 for screening in breeding programs; ≥5 acceptable for quality control; ≥8 good for process control, development, and applied research. It has to be kept in mind that the RPD is only correct and comparable when the data are normally distributed.

The RPDs (Table 2) obtained for the NIR-PLS-R models, 3.3 for air-dry density and 3.5 for fiber length, allow drawing the conclusion that the models are at least applicable for screening in breeding programs as they lie in-between screening (RPD ≥ 2.5) and quality control (RPD ≥ 5).
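The RPD thresholds quoted above from AACC Method 39-00 can be written as a small lookup. This is only an illustrative sketch: the function name and the label for values below 2.5 are hypothetical, and the RPD values in the usage lines are those reported in the text for this study and for previous studies.

```python
def rpd_application(rpd_value):
    """Map an RPD to the application classes quoted from AACC Method 39-00."""
    if rpd_value >= 8:
        return "process control, development, and applied research"
    if rpd_value >= 5:
        return "quality control"
    if rpd_value >= 2.5:
        return "screening in breeding programs"
    return "below the screening threshold"  # label chosen for this sketch


# RPDs reported for this study (3.3, 3.5) and for previous density models.
for value in (3.3, 3.5, 1.7, 2.0, 2.5):
    print(value, "->", rpd_application(value))
```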

The RPDs for the prediction of density obtained in previous studies ranged from 1.7 (Tyson et al. 2009) through 2 (Schimleck et al. 2006a) up to 2.5 (Hein et al. 2009); only the last of these fulfills the requirements for screening according to AACC Method 39-00 (AACC 1999).

The higher the number of samples, the more “optimistic” the result obtained by full cross-validation; when the number of samples is very high, the results are too “optimistic”. To confirm the results, two procedures are normally applied: (a) using a test set and (b) increasing the number of samples left out during cross-validation. The latter additionally tests the stability of a model.

Although the number of samples available in this study was not very high, the sample sets were divided into cross-validation sets and test sets, using 10% of the samples for the test sets, to obtain more realistic estimates of model performance. The cross-validation results deteriorated slightly, whereas the results obtained by test set validation for both air-dry density and fiber length were better than the first results obtained by full cross-validation using all samples (CV all) (Table 2). This confirms the good results obtained by cross-validation using all samples. Compared to CV all, the RPD for air-dry density (fiber length) increased from 3.3 to 3.9 (3.5 to 3.9) and the error decreased from 0.019 g/cm³ (0.015 mm) to 0.017 g/cm³ (0.012 mm).

Cross-validation with one sample omitted (leave-one-out cross-validation) was performed for the validation of all models. A large number of cancellation groups corresponds to validation with a small perturbation of the statistical sample, whereas a small number of cancellation groups corresponds to a heavy perturbation. When a large number of samples is combined with the leave-one-out method for cross-validation, the perturbation of the model at each step is small. This procedure tends to “overfit” the model and often leads to too “optimistic” modeling results. For this reason, the leave-more-out method for cross-validation may be preferable. Therefore, the number of samples left out during cross-validation of the air-dry density model was increased up to 42% to obtain more realistic results. Even when 40% of the samples were removed in CV, the r² is still 0.9, the RMSECV is only 0.02 g/cm³ (Fig. 4a), and the RPD is 3.2 (Fig. 4b). The robustness of the model thus appears to be confirmed, and it can be concluded that the model is stable and well qualified for prediction.
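The leave-more-out procedure described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the OPUS procedure: the function names are hypothetical, the samples are assigned to cancellation groups at random, r² is computed as 1 − SSE/SST, and a one-predictor least-squares line (`fit_line`) stands in for the PLS-R model so that the example stays self-contained. Any other fitting routine that returns a prediction function could be passed in instead.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Stand-in model: least squares on a single predictor (NOT PLS-R)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def leave_group_out_cv(x, y, fraction, fit=fit_line, seed=0):
    """Cross-validate while leaving out about `fraction` of the samples at a time.

    Every sample is predicted exactly once by a model that was fitted
    without it. Returns (r2, rmsecv, rpd) based on those predictions.
    A fraction of 1/len(y) reproduces leave-one-out cross-validation.
    """
    n = len(y)
    group_size = max(1, round(fraction * n))
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random assignment to groups
    predicted = [0.0] * n
    for start in range(0, n, group_size):
        held_out = set(order[start:start + group_size])
        train = [i for i in range(n) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predicted[i] = model(x[i])
    sse = sum((m - p) ** 2 for m, p in zip(y, predicted))
    mean_y = statistics.fmean(y)
    r2 = 1.0 - sse / sum((m - mean_y) ** 2 for m in y)
    rmsecv = math.sqrt(sse / n)
    return r2, rmsecv, statistics.stdev(y) / rmsecv
```

Calling `leave_group_out_cv` repeatedly with increasing values of `fraction` yields curves of r², RMSECV, and RPD against the percentage of samples left out; a model whose statistics change little as the fraction grows is less dependent on any individual sample.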

Fig. 4

a Relationships between r² (filled circles) and RMSECV (open circles), respectively, and the percentage of samples left out in cross-validation (CV) for air-dry density (sample set A) using 8 PLS components. b RPD versus the percentage of samples left out in cross-validation. Two outliers were removed in the CV leaving out more samples

The good model statistics obtained in this study might be due to the fact that the sites at which NIR spectra, density, and fiber length were measured coincided exactly.

Conclusion

The results not only demonstrate again that NIR spectroscopy has the potential to predict the air-dry density and fiber length of E. camaldulensis solid wood with high accuracy and precision, but also show that an improvement of PLS-R models over already published results was still possible. To the best knowledge of the authors, the results are the first that fully meet the requirements of AACC Method 39-00 for screening in breeding programs (RPD ≥ 2.5), with RPDs for the NIR-PLS-R models of 3.3 for air-dry density and 3.5 for fiber length. This allows the conclusion that the models are at least applicable for screening in breeding programs, as they lie between screening (RPD ≥ 2.5) and quality control (RPD ≥ 5).

Further investigations including more trees and/or trees from other Eucalyptus sp. will reveal whether similarly good results can be obtained. The test set validation results, together with the results obtained by increasing the percentage of samples left out during cross-validation of the air-dry density model up to 42% (which indicated the stability of the model), suggest that this goal could be reached, confirming the applicability of NIR spectroscopy for quality control.