1 Introduction

The accuracy of forest resource information has a decisive impact on decision making concerning forest management and wood procurement. Forest inventory information is used in decision support systems, which in turn are used to compute the current state of a forest holding as well as future projections. Errors in the input data propagate and grow through long model chains and cause significant output errors, e.g. when forest management plans are updated (e.g. Ojansuu et al. 2002; Haara and Korhonen 2004; Haara 2005; Vastaranta et al. 2010; Holopainen et al. 2010a, b). The longer the reference period, the larger the output errors; thus, inaccurate input data are especially problematic in forestry yield value determination. In addition, inaccurate input data cause significant losses in forest planning and silviculture, as the timing of treatments starts to deviate from the optimum (e.g. Eid 2000; Eid et al. 2004; Holopainen and Talvitie 2006; Holopainen et al. 2009).

Airborne laser scanning (ALS) has become a widely used technique in forest inventories that aim to collect information for forest planning. The applied method is known as the area-based approach (ABA). In ABA, ALS data are used to generalize field-measured forest inventory attributes over an entire inventory area. ABA has provided accuracies ranging between 10 and 27 % for the mean stem volume at stand- or plot-level (e.g. Næsset et al. 2004; White et al. 2013). In ABA, forest inventory attributes, such as species-specific stem volume (V), basal area (G), Lorey’s height (Hg), basal area-weighted mean diameter (Dg), and stem number (N), are predicted for grid cells covering the entire inventory area. However, models used in forest-planning simulation (including attribute update) require measured or estimated stem diameter distributions, which are not directly inventoried. Thus, stem diameter distributions are typically formed by predicting or recovering the parameters of a theoretical distribution function, such as the probability density function of the beta, Weibull or Johnson SB distribution, based on the forest inventory attributes (Kilkki et al. 1989; Maltamo and Kangas 1998; Siipilehto 1999; Kangas and Maltamo 2000; Holopainen et al. 2010c). Forest growth is then modelled at tree-level using growth models for basal area and height (Hynynen et al. 2002). Consequently, forest inventory attribute updating systems that are based on tree-level models are subject to (1) inventory errors, (2) errors in the predicted stem diameter distribution, and (3) errors in the growth modelling.
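As an illustration of how a theoretical distribution function yields a stem diameter distribution, the following minimal Python sketch splits a stem number N into 1 cm diameter classes using a two-parameter Weibull distribution. It assumes that the shape and scale parameters have already been predicted or recovered from the inventory attributes; the function names and the numeric values are hypothetical and are not taken from this study.

```python
import math

def weibull_cdf(d, shape, scale):
    """Two-parameter Weibull cumulative distribution function for diameter d (cm)."""
    if d <= 0:
        return 0.0
    return 1.0 - math.exp(-((d / scale) ** shape))

def stems_per_class(n_stems, shape, scale, d_max=60.0, width=1.0):
    """Split n_stems (stems/ha) into diameter classes of the given width (cm).

    Returns a list of (class midpoint in cm, stems/ha in that class).
    """
    classes = []
    lower = 0.0
    while lower < d_max:
        upper = lower + width
        share = weibull_cdf(upper, shape, scale) - weibull_cdf(lower, shape, scale)
        classes.append((lower + width / 2.0, n_stems * share))
        lower = upper
    return classes

# Hypothetical parameter values chosen only for illustration
classes = stems_per_class(n_stems=800, shape=2.5, scale=22.0)
total = sum(n for _, n in classes)
```

The class frequencies sum to slightly less than N because the distribution is truncated at `d_max`; each class can then be represented by a descriptive tree in a tree-level growth simulation.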

Our objective was to analyse the effects of these error sources on the short-term forest inventory attribute update in boreal managed forest conditions. The analyses of the error sources were partitioned into two parts. The first part dealt with the errors related to the forest inventory using ABA. The second part dealt with the effect of stem distribution prediction error and growth modelling error. The overall aim of the study is to develop methods for updating grid-level forest inventory attributes for forest management planning purposes.

2 Materials and Methods

2.1 Study Area

The 5 by 5 km study area is located in Evo, southern Finland, which belongs to the southern boreal forest zone. It consists of approximately 2000 ha of mainly managed boreal forest with an average stand size of slightly less than 1 ha. The elevation of the area varies from 125 to 185 m above sea level. Scots pine (Pinus sylvestris L.) and Norway spruce [Picea abies (L.) H. Karst.] were the dominant tree species in the study area, contributing 49 and 28 % of the total stem volume, respectively. The share of deciduous trees was 23 % of the total stem volume (Fig. 1).

Fig. 1 Map of the study area containing the modelling (n = 246) and validation (n = 34) plots used in the study

2.2 Field Data from 2007 and 2014

Field measurements were undertaken in summer 2007 on 246 circular plots (modelling plots) with a radius of 9.77 m. The modelling plots were selected based on pre-stratification of existing stand inventory data (Kankare et al. 2013). All trees with a diameter-at-breast-height (DBH) of over 5 cm were tallied, and tree height, DBH, and species were recorded. Tree heights were measured using Vertex clinometers, while DBH was measured with steel callipers. The stem volumes were calculated with standard Finnish species-specific stem volume models (Laasasenaho 1982). The plot-level data were obtained by summing the tree data. From the 246 modelling plots, a further sample of 34 plots was selected in 2014 to be used as validation plots in this study. The validation plots were distributed over the study area among the modelling plots to cover all the various site types, stand development classes, and tree species. Human-induced changes on the modelling plots, such as clear-cuts or thinnings, limited the number of validation plots available. The descriptive statistics of the modelling plots (n = 246) and validation plots (n = 34) are summarized in Table 1. The plot centres were measured with a Trimble GEOXM 2005 Global Positioning System (GPS) device (Trimble Navigation Ltd., Sunnyvale, CA, USA), and the locations were post-processed with local base station data, resulting in an average error of approximately 0.6 m.
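The summation of tree data into plot-level attributes can be sketched as follows. This is a minimal Python illustration that assumes the standard definitions of basal area, basal area-weighted mean diameter and Lorey's height. The function name and the input layout are hypothetical, and stem volume is omitted because it requires the species-specific volume models of Laasasenaho (1982).

```python
import math

PLOT_RADIUS_M = 9.77  # plot radius given in the text

def plot_attributes(trees, radius_m=PLOT_RADIUS_M):
    """Aggregate tallied trees into per-hectare plot attributes.

    trees: list of (dbh_cm, height_m) tuples for trees with DBH over 5 cm.
    Returns N (stems/ha), G (m2/ha), Dg (cm) and Hg (m).
    """
    if not trees:
        return {"N": 0.0, "G": 0.0, "Dg": None, "Hg": None}
    plot_area_ha = math.pi * radius_m ** 2 / 10_000.0
    expansion = 1.0 / plot_area_ha  # stems per hectare represented by one tree
    # Basal area of each tree in m2: pi * (d / 2)^2 with d converted from cm to m
    basal_areas = [math.pi * (dbh / 200.0) ** 2 for dbh, _ in trees]
    g_plot = sum(basal_areas)
    dg = sum(ba * dbh for ba, (dbh, _) in zip(basal_areas, trees)) / g_plot
    hg = sum(ba * h for ba, (_, h) in zip(basal_areas, trees)) / g_plot
    return {
        "N": len(trees) * expansion,
        "G": g_plot * expansion,
        "Dg": dg,  # basal area-weighted mean diameter
        "Hg": hg,  # Lorey's height (basal area-weighted mean height)
    }

# Hypothetical tally of three trees
attrs = plot_attributes([(20.0, 18.0), (25.0, 21.0), (12.0, 11.0)])
```

With a radius of 9.77 m the plot area is close to 300 m2, so each tallied tree represents roughly 33 stems per hectare.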

Table 1 Field inventoried mean attributes of modelling plots (n = 246) and validation plots (n = 34) from 2007

The 34 validation plots were re-measured in 2014 with exactly the same plot set-up as in 2007. Again, all trees on the plot with a DBH of over 5 cm were measured, and DBH, tree height and species were recorded. The sample plots were located based on the plot centre coordinates recorded in the 2007 measurements. The plot centres had also been marked with signposts during the 2007 measurements so that the exact plot centre could be found for the re-measurements. The descriptive statistics for the sample plots measured in 2014 are shown in Table 2.

Table 2 Field inventoried mean attributes of validation plots (n = 34) from 2014

2.3 Remote Sensing-Based Forest Inventory from 2007

The remote sensing data were collected in midsummer 2006. ALS was performed using an Optech ALTM3100C-EA system operating with a pulse rate of 100 kHz. Data were acquired at a flight altitude of 1900 m, resulting in an average pulse density of 1.3 pulses per square meter in non-overlapping areas and a footprint of 70 cm in diameter. The system was configured to record up to four returns per pulse, i.e. first, last, only, and intermediate. The reported positioning accuracy was 40 cm and 15 cm in the horizontal and vertical directions, respectively. Same-date aerial photographs were obtained with a digital camera, and the photographs were orthorectified, resampled to a pixel size of 0.5 m and mosaicked into a single image covering the entire study area. Near-infrared (NIR), red (R) and green (G) bands were available.

ALS data were first classified into ground and non-ground points using TerraScan (TerraSolid, Helsinki, Finland) based on the method explained in Axelsson (2000). A digital terrain model (DTM) was then calculated using the classified ground points. Laser heights above ground (normalized height or canopy height) were calculated by subtracting the ground elevation from the corresponding laser measurements. The expected accuracy of an ALS-derived DTM in boreal forest conditions varies between approximately 10 and 50 cm (Hyyppä et al. 2009). Returns with canopy heights close to zero are ground returns, and those with heights greater than 2 m are considered vegetation returns. The returns between them are considered to come from ground vegetation or bushes. Only the vegetation returns were used for feature extraction. Statistical metrics describing canopy structure were extracted for the sample plots (radius 9.77 m) following the suggestions of White et al. (2013). In addition, several statistical and textural features were extracted from the aerial photographs, such as the means and standard deviations of spectral values (Holopainen et al. 2008). The Haralick textural features (Haralick et al. 1973; Haralick 1979) were derived from the spectral values.
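The height normalization and the 2 m vegetation threshold described above can be sketched in Python as follows. This is a minimal illustration and not the TerraScan workflow: the function names are hypothetical, the ground elevation under each return is assumed to be already interpolated from the DTM, and the nearest-rank percentile is only one of several common percentile definitions.

```python
import math

def normalize_heights(return_z, ground_z):
    """Canopy height = laser return elevation minus ground (DTM) elevation."""
    return [z - g for z, g in zip(return_z, ground_z)]

def height_percentile(sorted_heights, p):
    """Nearest-rank percentile of an ascending list (assumed definition)."""
    rank = max(1, math.ceil(p / 100.0 * len(sorted_heights)))
    return sorted_heights[rank - 1]

def canopy_metrics(return_z, ground_z, veg_threshold=2.0):
    """Plot-level metrics computed from vegetation returns (height > 2 m)."""
    heights = normalize_heights(return_z, ground_z)
    veg = sorted(h for h in heights if h > veg_threshold)
    if not veg:
        return {"veg_ratio": 0.0, "h_mean": None, "h30": None, "h90": None}
    return {
        "veg_ratio": len(veg) / len(heights),  # share of returns from vegetation
        "h_mean": sum(veg) / len(veg),
        "h30": height_percentile(veg, 30),
        "h90": height_percentile(veg, 90),
    }

# Hypothetical returns over flat ground at 100 m elevation
metrics = canopy_metrics(
    [100.1, 101.0, 105.0, 110.0, 115.0, 120.0],
    [100.0] * 6,
)
```

In practice the metrics would be computed separately for first and last returns, since the predictors used in this study distinguish between them.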

Species-specific basal area (G), basal area-weighted mean diameter (Dg), Lorey’s height (Hg), stem volume (V), and number of stems per hectare (N) were predicted from remote sensing metrics using a random forest (RF, Breiman 2001) based k nearest-neighbor (NN) approach. Forest inventory attributes measured in the field were used as target observations, and plot-specific metrics derived from the remote sensing data sets were used as predictors. The RF approach was applied in the search for nearest neighbors. In the RF method, several regression trees are generated, each from a bootstrap sample drawn with replacement, so that roughly two-thirds of the data are used for training and one-third for testing for each tree. The samples that are not included in training are called out-of-bag samples, and they can act as a testing set in the approach. Nearness in RF is measured as the observed probability that two observations end up in the same terminal node. The R statistical computing environment (R Core Team) and the yaImpute library (Crookston and Finley 2008) were applied in the predictions. In the present study, 1200 regression trees were generated, and at each node of a regression tree a random subset of predictors, equal in size to the square root of the number of predictor variables, was considered. The number of neighbors was set to one to keep the original variance in the data (see, e.g. Hudak et al. 2008; Franco-Lopez et al. 2001). Prior to the final modeling, RF was used to reduce the number of predictor variables. The aim of the variable reduction was to build parsimonious models that are capable of accurate prediction. In the variable selection, RF was iterated 100 times per model, and the best variables for each model were selected based on their importance. Only the most important variables were then used for the final imputations. The predictors used were the vegetation ratio from first and last pulses, the heights where 30 and 90 % of first laser returns and 30 % of last returns had been received, mean height in the pixel window, local homogeneity 90° of height, the average NIR and the standard deviation of NIR.
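The RF-based measure of nearness and the single-neighbor imputation can be illustrated with a short Python sketch. It is not the yaImpute implementation used in the study: it assumes that the terminal node (leaf) identifier of every observation in every tree is already available from a fitted forest, and the function names and example values are hypothetical.

```python
def rf_proximity(leaves_a, leaves_b):
    """Share of trees in which two observations fall in the same terminal node."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)

def impute_nearest(target_leaves, reference_leaves, reference_values):
    """Copy the attributes of the most similar reference plot (k = 1).

    target_leaves: leaf ids of the target cell, one per tree.
    reference_leaves: list of leaf-id lists, one per field plot.
    reference_values: field-measured attributes of the same plots.
    """
    best_index = max(
        range(len(reference_leaves)),
        key=lambda i: rf_proximity(target_leaves, reference_leaves[i]),
    )
    return reference_values[best_index]

# Hypothetical leaf ids from a forest of four trees and three field plots
reference_leaves = [[1, 4, 2, 7], [3, 4, 5, 7], [1, 1, 3, 6]]
reference_values = [{"V": 150.0}, {"V": 240.0}, {"V": 95.0}]
predicted = impute_nearest([1, 4, 2, 6], reference_leaves, reference_values)
```

Because the attributes of a single field plot are copied as such, the predictions keep the variance and the mutual consistency of the field-measured attributes, which is the reason given above for setting the number of neighbors to one.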

To improve the accuracy of the species-specific estimates, the sample plots were divided into four strata according to existing stand register information. The first stratum included Scots pine dominated stands, the second Norway spruce dominated stands, the third stands dominated by deciduous trees, and the fourth stands with an approximately equal share of pine and spruce trees and a mixture of deciduous trees. The first stratum comprised 92 sample plots, the second 56, the third 41 and the fourth 57. The final imputations were carried out for each stratum separately.

2.4 Simulation of Forest Growth

The forest attribute update calculations from 2007 to 2014 were carried out using the SIMO software (SIMO simulation framework, Rasinmäki et al. 2009). SIMO is a common platform for various stand simulators, including Finnish tree- and stand-level simulators. The simulation logic is described in XML (eXtensible Markup Language) documents and can easily be adapted for various types of calculations. The non-spatial tree-level growth models found in SIMO are, for the most part, the same as those found in the MELA2002 and MOTTI simulators (Hynynen et al. 2002; Salminen et al. 2005). They include growth models for all sites and tree species in Finland, including separate models for peatlands. The tree-level simulator can be used to simulate the growth of either sample trees measured in the field or descriptive trees generated on the basis of a theoretical diameter/height distribution. The simulation is performed at the single-tree level. The statistics for the strata and stands are derived as the sums and means of the simulated tree properties.

2.5 Evaluation of the Errors

The accuracy of the ABA and updated stem volume estimates was evaluated by calculating bias and root-mean-square error (RMSE) for three different alternatives (Table 3):

$$BIAS = \frac{{\sum\nolimits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )} }}{n}\quad BIAS\% = 100*\frac{BIAS}{{\bar{y}}}$$
$$RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )^{2} } }}{n}} \quad RMSE\% = 100*\frac{RMSE}{{\bar{y}}}$$

where n is the number of plots, \(y_{i}\) is the observed value (from tree-wise measurements in 2014) for plot i, \(\hat{y}_{i}\) is the updated attribute for plot i and \(\bar{y}\) is the observed mean of the species-specific or total stem volume.
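The bias and RMSE definitions above translate directly into code. The following minimal Python sketch uses a hypothetical function name and made-up plot values for illustration; it follows the sign convention of the equations, so a positive bias means that the observed values exceed the estimates on average.

```python
import math

def bias_and_rmse(observed, predicted):
    """Absolute and relative bias and RMSE as defined in the equations above.

    observed: field-measured values y_i; predicted: estimated values y_hat_i.
    Relative values are percentages of the observed mean.
    """
    n = len(observed)
    diffs = [y - y_hat for y, y_hat in zip(observed, predicted)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_obs = sum(observed) / n
    return {
        "bias": bias,
        "bias_pct": 100.0 * bias / mean_obs,
        "rmse": rmse,
        "rmse_pct": 100.0 * rmse / mean_obs,
    }

# Hypothetical stem volumes (m3/ha) for three plots
stats = bias_and_rmse([100.0, 200.0, 300.0], [90.0, 210.0, 270.0])
```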

Table 3 Principles used in the error analyses

3 Results and Discussion

The results of the remote sensing-based prediction of forest inventory attributes for 2007 are presented in Table 4. For the validation plots (n = 34), the empirical 95 % interval of total stem volume was between 42.4 and 431.2 m3/ha. The species-specific empirical 95 % intervals for stem volume were from 0 to 266.4 m3/ha for pine, from 0 to 239.0 m3/ha for spruce and from 0.5 to 225.7 m3/ha for deciduous trees.

Table 4 Mean values for predicted forest inventory attributes using ABA

The RMSE of the forest inventory for total stem volume was 25.2 % and the bias was 8.5 % (Table 5). Species-specific RMSEs and biases varied from 80.0 to 134.3 % and from −0.5 to 21.3 %, respectively. At the sample plot-level, the inventory error (difference) ranged from −83.6 to 167.4 m3/ha (Fig. 2). According to Hudak et al. (2008) and Franco-Lopez et al. (2001), increasing the number of neighbors would improve the prediction accuracy. However, the inventory RMSEs are in line with previous studies in the same study area (Holopainen et al. 2010a; Yu et al. 2010; Vastaranta et al. 2011, 2012, 2013; Kankare et al. 2015). In contrast, the ABA inventory in this study included bias. The bias may result from the rather limited number of validation plots (n = 34) as well as from slight differences between the forest inventory attributes measured on the modelling plots used in ABA and those measured on the validation plots (see Table 1). For example, the stem volume in the validation plots had a mean of 230.4 m3/ha and ranged from 54.7 to 575.4 m3/ha, whereas the respective figures for the modelling plots were 186.6 m3/ha (mean), 0 m3/ha (min) and 575.4 m3/ha (max). To avoid additional bias, the number of nearest neighbors was set to one.

Table 5 Effect of inventory error on predicted species-specific stem volumes as well as on the total stem volume (V) on the validation plots

Fig. 2 Field measured stem volume (m3/ha) (2007) compared to stem volume estimate based on ABA (2007)

The prediction of the stem diameter distribution and the growth modelling caused a bias of 6.7 % and an RMSE of 18.8 % in the updated stem volume. Species-specific RMSEs and biases varied from 23.1 to 65.9 % and from 5.5 to 9.0 %, respectively. The RMSEs are lower than those of the ABA forest inventory of 2007. Based on previous studies (Vastaranta et al. 2010; Holopainen et al. 2010c), it can be assumed that the majority of this error is caused by the growth modelling and only a minor component by the generated stem distribution. Although the error of predicting the stem diameter distribution cannot be separated from the growth modelling error in this study, its effect has been shown to be marginal in this kind of study design (e.g. Holopainen et al. 2010c). At the sample plot-level, the combined error of stem distribution prediction and growth modelling (difference) ranged from −134.7 to 93.7 m3/ha (Fig. 3).

Fig. 3 Field measured stem volume (m3/ha) (2014) compared to field measured stem volume from 2007 updated to year 2014. The update was done by utilizing growth models

The combined error of forest inventory, prediction of the theoretical stem distribution and forest growth modelling caused a bias of 13.1 % and an RMSE of 24.6 % in the updated stem volume. Species-specific RMSEs and biases varied from 65.8 to 109.2 % and from 3.7 to 26.7 %, respectively. At the sample plot-level, the combined error ranged from −95.3 to 156.8 m3/ha (Fig. 4).

Fig. 4 Field measured stem volume (m3/ha) (2014) compared to stem volume estimate based on ABA from 2007 updated to year 2014. The update was done by utilizing growth models

Compared to the attribute update from error-free data (which contains only the errors of stem distribution prediction and growth modelling), the biases are 5–15 percentage points larger for total stem volume as well as for species-specific stem volumes when all the error sources are combined. Similarly, the RMSE for total stem volume is roughly 10 percentage points larger. Species-specific errors increase more. With ABA, the RMSE of the species-specific stem volumes ranges from 80.0 to 134.3 % (inventory error), and thus it can be expected that these errors carry over to the outputs of the update process.

4 Conclusion

The objective here was to analyse the effects of error sources on the short-term forest inventory attribute update in boreal managed forest conditions. The analyses of the error sources were partitioned into two parts. The first part dealt with the errors related to the forest inventory using ABA. The second part dealt with the effect of stem distribution prediction error and growth modelling error. The results showed that the prediction of the theoretical stem distribution and forest growth modelling affected the quality of the predicted stem volume only slightly in the short-term information update. The results of our study confirm that the quality of the input data is the most influential error source in short-term forest information update. Thus, further studies are required, especially on obtaining species-specific forest inventory information more accurately.