1 Introduction

Sweet chestnut (Castanea sativa Mill.) covers more than 2.5 million hectares in Europe, with a distribution reaching from the Southern Mediterranean to central, Atlantic, and Eastern Europe (Conedera et al. 2004). Chestnut forests have been recognized as habitats of interest in the European Natura 2000 network and are considered characteristic cultural landscapes of the Mediterranean and Atlantic regions (Díaz Varela et al. 2009). In northwest Spain, chestnut is the most important forest species, covering over 100,000 ha, mainly as coppice stands (DGCONA 2013). This area accounts for over 95 % of the potential area for chestnut coppice stands in Spain.

Although chestnut fruit production has traditionally driven management in the region, changes in markets and local economies have made timber production the main objective of most management today (Álvarez et al. 2000). The vitality of the chestnut root system, with stools capable of sustainably producing an abundance of shoots, and high productivity (8–16 m³ ha⁻¹ year⁻¹ depending on site conditions) facilitate management under a coppice system (Giudici et al. 2000). Chestnut coppice produces valuable timber in relatively short rotations (20–40 years) compared to other hardwoods (Gallardo et al. 2000; Kerr and Evans 1993). The total volume (with bark) of sweet chestnut stands (high forest and coppice stands together) harvested in Spain during 2011 was 58,090 m³ (MARM 2011), with more than 42.46 % of this total volume coming from coppice stands in northwest Spain.

Estimating timber volume stocks as accurately as possible is essential in forest management. It is therefore necessary to develop tools that allow the reliable estimation of tree volume using variables which are easy to measure in the field, such as diameter at breast height (D) and total height (H). Individual tree volume equations are one such tool. However, these equations cannot predict tree volume for wood products that are classified by merchantable size according to log dimensions.

There are a number of ways to address this issue, the two most important of which are developing volume-ratio equations that predict merchantable volume as a percentage of total volume (Burkhart 1977; Clutter 1980; Reed and Green 1984) or using taper functions.

Taper functions describe stem taper (Brink and Gadow 1986; Kozak 1988; Riemer et al. 1995) and provide forest managers with estimates of (a) diameter at any point along the stem, (b) total stem volume, (c) merchantable volume and merchantable height to any top diameter and from any stump height, and (d) individual volumes for logs of any length at any height above the ground (Kozak 2004). Such functions can be implemented in different computer software specially developed for this type of calculation, such as GesMO (Diéguez-Aranda et al. 2009) or CubiFOR (Rodríguez et al. 2008). To develop this type of function, it is necessary to have a longitudinal data structure, that is, multiple measurements for each individual (Lindstrom and Bates 1990).
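As an illustration of outputs (a), (b), and (d), the following minimal sketch uses a deliberately simple conical stem profile, which is a hypothetical stand-in and not one of the models fitted in this study. The function names, the tree dimensions, and the log limits are all assumptions made for the example. Volume is obtained by numerically integrating the cross-sectional area along the stem.

```python
import math

def cone_taper(h, d_base, total_height):
    """Hypothetical taper: diameter (cm) at height h (m) for a conical stem
    with base diameter d_base (cm) and total height total_height (m)."""
    return d_base * (1.0 - h / total_height)

def volume_between(taper, h_low, h_high, n=1000):
    """Stem volume (m3) between two heights (m), using the trapezoidal rule
    on the cross-sectional area pi/40000 * d^2 (d in cm gives area in m2)."""
    step = (h_high - h_low) / n
    areas = [math.pi / 40000.0 * taper(h_low + i * step) ** 2
             for i in range(n + 1)]
    return step * (sum(areas) - 0.5 * (areas[0] + areas[-1]))

# Illustrative tree (assumed values, not from the data set)
D_BASE, H_TOTAL = 30.0, 20.0
taper = lambda h: cone_taper(h, D_BASE, H_TOTAL)

d_at_5m = taper(5.0)                              # (a) diameter at a given height
total_volume = volume_between(taper, 0.0, H_TOTAL)  # (b) total stem volume
log_volume = volume_between(taper, 0.3, 2.3)      # (d) a 2-m log above a 0.3-m stump
```

Any fitted taper equation could replace `cone_taper` here, provided it returns diameter in centimeters for a height in meters.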

Ideally, a taper equation should be compatible, meaning that the volume computed by integration of the taper function should be equal to that calculated by a total volume equation (Clutter 1980; Demaerschalk 1972; Fang et al. 2000). Examples of compatible volume-estimating systems are the works carried out by Demaerschalk (1972), Goulding and Murray (1976), and Fang et al. (2000).
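Written out, and assuming d in centimeters, h and H in meters, and volume in cubic meters (the constant follows from these units and is not stated in the source), the compatibility condition is:

$$ V\left(D,H\right)=\frac{\pi }{40000}{\displaystyle {\int}_0^H}d{(h)}^2\,\mathrm{d}h $$

where \( d(h) \) is the diameter predicted by the taper function at height h and \( V(D,H) \) is the total volume equation.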

Prediction tools are essential to understand the development of forest stands and subsequently decide on the best management strategy. In Spain, many taper functions have been developed for different forest species (Barrio-Anta et al. 2007; Crecente-Campo et al. 2009; Diéguez-Aranda et al. 2006); however, there is currently no taper function available for chestnut coppice, either in Spain or elsewhere in the world. This work aims to fill this gap. Its main objective is to develop a taper function that correctly describes the stem profile and provides appropriate estimates of stem volume, using chestnut coppice stands in northwest Spain as a baseline. Specifically, we address two questions: (a) Is it possible to correctly describe the huge variability of stem profiles in chestnut coppice stands given the high number of stems which may grow from a single stool, and (b) which model best describes this type of profile and its variability?

2 Material and methods

2.1 Data

The data used in this study were collected in 70 coppice stands covering the existing range of ages, stand densities, and sites of this species in the region. Figure 1 shows the map with the locations of the stands used for the fitting data.

Fig. 1 Map showing cover rates for chestnut coppice stands in the study area. Fitting plots are indicated by red dots

A total of 203 trees were felled and destructively sampled. Trees had to be healthy and of a standard shape (i.e., neither forked nor excessively branched) and were selected to ensure a representative distribution of diameter and height classes (Table 1).

Table 1 Descriptive statistics of tree and stand data sets used in the analysis for fitting and validation

Before felling, diameter at breast height D (diameter at 1.3 m above the top of the stool, in cm) was measured to the nearest 0.1 cm for each tree. The trees were then felled and total bole length, that is, total height H, (in m) measured to the nearest 0.1 m. The trees were cut into 1-m logs, up to a top diameter of 7 cm, and measured to the nearest centimeter. Two perpendicular over bark diameters (d, cm) and two perpendicular bark thicknesses were measured to the nearest 0.1 cm in each cross section (at height h, in m, above the top of the stool). Over bark log volumes were calculated in cubic meters using Smalian’s formula, and the top section was treated as a cone. Over bark total stem volume was obtained by summing the over bark log volumes and the volume of the top section. Finally, 3,282 pairs of diameter (d) at a certain height (h) measurements were used for the original fitting data set.
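The volume calculation described above can be sketched as follows. This is a minimal illustration, assuming diameters in centimeters and lengths in meters; the function names and the example measurements are hypothetical and do not come from the study data.

```python
import math

def cross_section_area(d_cm):
    """Cross-sectional area (m2) of a circular section of diameter d_cm (cm)."""
    return math.pi / 40000.0 * d_cm ** 2

def smalian_log_volume(d_low_cm, d_up_cm, length_m):
    """Smalian's formula: mean of the two end areas times log length (m3)."""
    return 0.5 * (cross_section_area(d_low_cm)
                  + cross_section_area(d_up_cm)) * length_m

def cone_volume(d_base_cm, length_m):
    """Volume (m3) of the top section treated as a cone."""
    return cross_section_area(d_base_cm) * length_m / 3.0

def stem_volume(diameters_cm, heights_m, total_height_m):
    """Total stem volume: Smalian logs between successive measurement
    heights plus a cone from the last measurement to the tip."""
    logs = sum(
        smalian_log_volume(diameters_cm[i], diameters_cm[i + 1],
                           heights_m[i + 1] - heights_m[i])
        for i in range(len(diameters_cm) - 1)
    )
    top = cone_volume(diameters_cm[-1], total_height_m - heights_m[-1])
    return logs + top

# Hypothetical tree: 1-m logs up to a 7-cm top diameter, total height 9 m
heights = [0.0, 1.0, 2.0, 3.0, 4.0]
diameters = [14.0, 12.0, 10.5, 9.0, 7.0]
volume = stem_volume(diameters, heights, 9.0)
```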

Data from an independent network of plots (established by the Atlantic Forest Systems Research Group (GIS-Forest), Department of Organisms and Systems Biology, University of Oviedo) were used for validation purposes. The height-diameter distributions for the fitting and validation samples are very similar (Fig. 2), indicating that robust conclusions can be reached from the validation analysis.

Fig. 2 Plot of diameter at breast height against total height of sampled trees (black dot) and validation sample (multiplication sign)

The scatterplot of relative diameter (d/D) against relative height (h/H) was examined visually to detect possible anomalies in the data. This first analysis detected a number of outliers (many of them corresponding to trees with abnormalities), which were removed. A second analysis was carried out with the systematic procedure proposed by Bi (2000) to detect and remove other possible outliers, whereby a local fit was performed with the LOESS procedure of SAS/STAT® (SAS Institute Inc. 2004a) using a smoothing factor of 0.25. Using this approach, extreme values accounted for 2.83 % of total taper measurements. A small percentage of the extreme data points were the result of errors in measuring bole sections or in the transcription of field notes, but most were the result of measurements in sections where the tree was deformed due to abnormal growth or damage caused by chestnut canker (Cryphonectria parasitica (Murr.) Barr.). Since taper functions are not intended for deformed stems, these data points were excluded from further analysis, resulting in a final total of 3,188 observations from 190 trees.

Figure 3 shows relative diameter plotted against relative height together with the LOESS regression curve, for all the collected data (Fig. 3a) and for the data after excluding outliers (Fig. 3b). Summary statistics of the final data used in this study for tree and stand variables, together with model validation data, are shown in Table 1.

Fig. 3 Data points of relative diameter and relative height plotted with a local regression LOESS smoothing curve (smoothing factor = 0.25) for all data (a) and after the elimination of outliers (b)

2.2 Equations tested

We analyzed a total of five models, which are described below and whose expressions are shown in Table 2:

  • Fang et al. (2000). Compatible system formed by a taper function, a total volume equation, and a merchantable volume equation. The taper equation is segmented with two attachment points and three form factors, one for each segment.

  • Bi (2000). Non-compatible variable-exponent taper function.

  • Kozak (2004). Non-compatible variable-exponent taper function.

  • Demaerschalk (1972). Power function whose main advantage is that the volume equations obtained by integrating are algebraically compatible with classic taper functions.

  • Thomas and Parresol (1991). Trigonometric compatible model.

Table 2 Fitted taper equations and their corresponding mathematical expression

2.3 Model fitting and selection

The models tested were fitted by non-linear regression with the MODEL procedure of SAS/ETS® (SAS Institute Inc. 2004b) using generalized least squares for non-linear models.

Of the different options to estimate the parameters in the systems where the taper equation includes a total volume equation (Fang and Bailey 1999; Fang et al. 2000; Goulding and Murray 1976), in this study we prioritized the taper function, setting this first and subsequently performing the predicted volume calculation from the estimation parameters obtained.

To avoid problems in the estimation of the parameters, a value of 0.001 was assigned to the final diameter of the top section. Similarly, a value of 0.001 was subtracted from heights equal to the total height, that is, h = H − 0.001. Both values are smaller than the measurement resolution used in data collection. This approach allows the use of the entire data set for fitting and does not significantly change parameter estimates (Diéguez-Aranda et al. 2006).

There are several problems associated with stem taper and volume equation analyses that violate the fundamental least squares assumption of independence and equal distribution of errors with zero mean and constant variance. One of the most common is the presence of autocorrelation in the data as a result of working with multiple observations on each tree. To resolve this problem, the error term was modeled using a continuous autoregressive error structure (CAR(x)), which allows the model to be applied to irregularly spaced, unbalanced data (Zimmerman and Nuñez-Antón 2001).
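The source does not print the expression for this error structure. A commonly used form of the CAR(x) model in taper studies, given here as an assumption about the formulation rather than as a quotation, is:

$$ {e}_{ij}={\displaystyle {\sum}_{k=1}^x}{I}_k{\rho}_k^{h_{ij}-{h}_{ij-k}}{e}_{ij-k}+{\varepsilon}_{ij} $$

where \( {e}_{ij} \) is the jth residual of tree i, \( {I}_k=1 \) for j > k and 0 otherwise, \( {\rho}_k \) is the continuous autoregressive parameter of order k, \( {h}_{ij}-{h}_{ij-k} \) is the distance along the stem between the jth and the (j − k)th observations of tree i, and \( {\varepsilon}_{ij} \) is an independent error term. Because the exponent is the actual distance between measurements, the correlation weakens as sections lie further apart, which is what makes the structure suitable for irregularly spaced data.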

Another problem in taper functions is multicollinearity, which refers to the existence of high intercorrelations among the independent variables in multiple linear or non-linear regression analyses. To evaluate the presence of multicollinearity, we used the condition number (CN). According to Belsley (1991), if the condition number is between 5 and 10, collinearity is not a major problem; if it is in the range of 30–100, then there are problems associated with collinearity; and if it is in the range of 1,000–3,000, the problems are severe.

The criteria used for the comparison of the models were based on the residual plot analysis and statistical analysis of the goodness-of-fit statistics: adjusted coefficient of determination (\( {R}_{\mathrm{adj}}^2 \)), root mean square error (RMSE), and Akaike’s information criterion in differences (AICd).
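The source does not give the expressions for these statistics. The sketch below uses definitions that are common in the taper literature and should be read as assumptions: RMSE and the adjusted coefficient of determination are corrected for the number of parameters p, AIC is computed as n ln(SSE/n) + 2(p + 1), and AICd is the difference between each model's AIC and the smallest AIC. The function names and the example values are hypothetical.

```python
import math

def fit_statistics(observed, predicted, n_params):
    """Return (adjusted R2, RMSE, AIC) under the assumed definitions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    sst = sum((y - mean_obs) ** 2 for y in observed)
    r2_adj = 1.0 - (n - 1) * sse / ((n - n_params) * sst)
    rmse = math.sqrt(sse / (n - n_params))
    aic = n * math.log(sse / n) + 2 * (n_params + 1)
    return r2_adj, rmse, aic

def aic_differences(aic_values):
    """AICd: each model's AIC minus the minimum AIC (best model gets 0)."""
    best = min(aic_values)
    return [aic - best for aic in aic_values]

# Hypothetical observed and predicted diameters (cm) for a 2-parameter model
obs = [10.0, 12.0, 14.0, 16.0, 18.0]
pred = [10.5, 11.5, 14.5, 15.5, 18.5]
r2_adj, rmse, aic = fit_statistics(obs, pred, n_params=2)
```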

Although the goodness-of-fit statistics reflect the behavior of the data for the different models evaluated, they may not indicate which model is the best for practical purposes (Diéguez-Aranda et al. 2006); hence, this decision should be made after analyzing each model’s behavior according to the different stem sections. To evaluate this, the bias and the root mean square error were calculated and plotted for diameter estimation by relative height classes (intervals of 15 %) and for height estimation by diameter classes (intervals of 5 cm). To estimate the height at which the different diameters are achieved, the iterative bisection method was used.
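The height estimation step can be sketched with a generic bisection search. The example assumes a stem profile that decreases monotonically with height and, for illustration only, uses a hypothetical conical taper rather than any of the fitted models; the function name, tolerance, and tree dimensions are likewise assumptions.

```python
def height_at_diameter(taper, target_d, total_height, h_low=0.0,
                       tol=1e-6, max_iter=200):
    """Height (m) at which taper(h) equals target_d (cm), found by bisection.
    Assumes taper(h) decreases monotonically between h_low and total_height."""
    lo, hi = h_low, total_height
    if not (taper(hi) <= target_d <= taper(lo)):
        raise ValueError("target diameter is outside the stem profile range")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if taper(mid) > target_d:
            lo = mid  # stem is still thicker than the target: search higher
        else:
            hi = mid  # stem is at or below the target: search lower
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical conical stem: base diameter 30 cm, total height 20 m
D_BASE, H_TOTAL = 30.0, 20.0
taper = lambda h: D_BASE * (1.0 - h / H_TOTAL)

# Height at which the stem reaches a 7-cm top diameter
h_7cm = height_at_diameter(taper, 7.0, H_TOTAL)
```

Bisection is a natural choice here because most taper equations cannot be inverted algebraically for h, while monotonicity of the profile guarantees a single solution within the bracket.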

2.4 Model validation

Quality of fit does not necessarily reflect the quality of future prediction (Myers 1990). Only validation with an independent data set reveals the accuracy of the selected model (Huang et al. 2003; Kozak and Kozak 2003). In this study, the validation process was carried out with an independent data set consisting of 70 trees (from a network of plots established by the Atlantic Forest Systems Research Group (GIS-Forest), Department of Organisms and Systems Biology, University of Oviedo), which produced a total of 719 height/diameter data pairs. Trees were felled and destructively sampled following the same methodology as used for the fitting data set. Two validation statistics were calculated to assess the overall prediction performance of the fitted equations on this validation data set: (a) an estimate of the average prediction error (APE) (Eq. 1) (Weisberg 1985) and (b) mean bias (Eq. 2) estimated as an overall average and summarized by diameter class, similar to that used by Zhang (1997). Both statistics are expressed in the same units as the variable evaluated, in this case centimeters for diameters and cubic meters for volumes. The APE statistic in the validation process is analogous to the RMSE in the fitting.

$$ \mathrm{APE}=\sqrt{\frac{{\displaystyle {\sum}_{i=1}^n}{\left({Y}_i-{\widehat{Y}}_i\right)}^2}{n}} \tag{1} $$

$$ \overline{\mathrm{Bias}}=\frac{{\displaystyle {\sum}_{i=1}^n}\left({Y}_i-{\widehat{Y}}_i\right)}{n} \tag{2} $$

where \( {Y}_i \) is the observed or real value, \( {\widehat{Y}}_i \) is the value estimated with the model, and n is the sample size of the validation data.
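Equations 1 and 2 translate directly into code. The following minimal sketch also groups the mean bias by diameter class; the class width of 10 cm, the labeling of classes by their lower limit, the function names, and the example values are assumptions made for illustration.

```python
import math

def ape(observed, predicted):
    """Average prediction error (Eq. 1): root of the mean squared error."""
    n = len(observed)
    return math.sqrt(
        sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)

def mean_bias(observed, predicted):
    """Mean bias (Eq. 2): negative values indicate overestimation."""
    n = len(observed)
    return sum(y - y_hat for y, y_hat in zip(observed, predicted)) / n

def mean_bias_by_class(observed, predicted, dbh_cm, class_width=10.0):
    """Mean bias per diameter class, keyed by the class lower limit (cm)."""
    groups = {}
    for y, y_hat, d in zip(observed, predicted, dbh_cm):
        key = int(d // class_width) * class_width
        groups.setdefault(key, []).append(y - y_hat)
    return {key: sum(errs) / len(errs) for key, errs in groups.items()}

# Hypothetical observed and predicted diameters (cm) with tree DBH (cm)
obs = [10.0, 12.0, 14.0]
pred = [11.0, 12.0, 16.0]
dbh = [12.0, 18.0, 27.0]

overall_ape = ape(obs, pred)
overall_bias = mean_bias(obs, pred)
bias_by_class = mean_bias_by_class(obs, pred, dbh)
```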

To examine the performance of the models in greater detail, the values of \( \overline{\mathrm{Bias}} \) were plotted against diameter and total volume. These graphs are of interest since they illustrate areas in which the adjusted models provide poor or good predictions according to the diameter class of the evaluated trees.

3 Results

Table 3 shows the parameter estimates for the fitted taper functions, all of which were significant at the 5 % level. The exception is the Bi (2000) model, for which convergence was not achieved. The model of Kozak (2004) was modified by removing the \( {b}_4 \) parameter in order to adapt it to local and species conditions (Kozak 2004).

Table 3 Parameter estimates (approximated standard error in parentheses) for the models analyzed

All models performed well, each explaining more than 95 % of the total variability, with mean error below 2.05 cm (Table 4). Comparison of goodness-of-fit statistics indicates that the best-fitting models are those of Kozak (2004) and Fang et al. (2000), each explaining more than 98 % of the total variability. In both cases, multicollinearity was observed (CN around 62), but it was considered to be within acceptable limits.

Table 4 Goodness-of-fit statistics and condition number of the taper functions evaluated

A trend in the residuals depending on the distance and the relative position of the measurement along the stem was found in the model fitting. Autocorrelation was therefore corrected by applying a second-order autoregressive structure (a first-order structure proved insufficient), with the aim of obtaining unbiased and efficient estimates that did not invalidate statistical tests. Following this correction, the trends in residuals virtually disappeared. Figure 4 provides an example using the model of Fang et al. (2000).

Fig. 4 Residuals against: Lag1-residuals (left column), Lag2-residuals (middle column), and Lag3-residuals (right column) for the model of Fang et al. (2000) fitted without considering the autocorrelation parameters (first row) and using continuous time autoregressive error structures of first and second order (second and third rows, respectively)

Goodness-of-fit statistics are good indicators of the global performance of a taper function, but alone, they do not allow the best model to be selected. To do this, the evolution of bias and root mean square error in diameter estimation by relative height classes at intervals of 20 % (Fig. 5a, b) and in height estimation by diameter class (Fig. 6a, b) was analyzed for the two best-fitting models, Fang et al. (2000) and Kozak (2004).

Fig. 5 Bias (a) and root mean square error (RMSE) evolution (b) for diameter (cm) by relative height class (percentages) for the compatible system of Fang et al. (2000) and the non-compatible taper function of Kozak (2004)

Fig. 6 Bias (a) and root mean square error (RMSE) evolution (b) for height (m) by relative diameter class (DBH class) for the compatible system of Fang et al. (2000) and the non-compatible taper function of Kozak (2004)

Graphical analysis of the bias in predicting diameters (Fig. 5a) confirmed the good performance of both models (with bias under ±0.1), with a certain advantage seen for the model of Fang et al. (2000), which showed lower bias at different heights, especially in the lower part of the stem (that with the highest merchantable value). In relation to the evolution of RMSE in predicting diameters (Fig. 5b), both models were very similar, although the model of Fang et al. (2000) was slightly better.

With regard to bias in predicting heights (Fig. 6a), the model of Fang et al. (2000) showed lower bias up to diameter class 35, where the model of Kozak (2004) performed better, although the model of Fang et al. (2000) again showed the lower bias at class 45. Both models showed some bias, but it was always below 0.3 m up to and including diameter class 25 and always below 0.4 m for the classes above this. The behavior of both models in terms of RMSE was very similar (Fig. 6b).

Taking into account the results and in particular the practical utility of the compatibility between the classic two inputs volume equation and the taper function, the model of Fang et al. (2000) was selected as the most appropriate for chestnut coppice stands in northwest Spain.

Residuals are plotted against the diameters predicted by the selected taper function in Fig. 7, where no systematic trend in the distribution of residuals was observed. Figure 8 shows, as an example, the profiles of three trees generated from the observed values (solid lines) and predicted values (dashed lines) for the model of Fang et al. (2000): one small (D = 11.15 cm and H = 12.19 m), one medium sized (D = 24.5 cm and H = 18 m), and one large (D = 36 cm and H = 24.12 m). Figure 9 shows predicted values of total volume for the selected taper function against the observed volume values, verifying the accuracy of the estimates (accounting for 98.38 % of the total variability).

Fig. 7 Plot of residuals against predicted diameter from the taper function proposed by Fang et al. (2000)

Fig. 8 Observed (solid line) and predicted (dashed line) profiles of three trees (as examples) using the taper function of Fang et al. (2000)

Fig. 9 Plot of predicted values against observed values for total tree volume from the taper function proposed by Fang et al. (2000)

3.1 Model validation

Table 5 shows the statistics used in model validation, calculated for different diameter classes. APE generally increased with diameter class for both diameter and volume, and the overall results were good (average prediction error of 2.14 cm for diameter and 0.059 m³ for volume).

Table 5 APE and \( \overline{\mathrm{Bias}} \) statistics generated from the assessment of prediction error for the taper function fitted in the validation process

The graphs of mean prediction bias are shown in Fig. 10. Overall, the values obtained for both diameter and volume were close to zero, indicating that the selected equation fits the real profile of the tree well. Up to diameter class 25, the statistics were very close to zero; from diameter class 35 onwards, however, the \( \overline{\mathrm{Bias}} \) values in predicting diameter were clearly further from zero. The negative \( \overline{\mathrm{Bias}} \) values indicate that both diameter and total tree volume are overestimated.

Fig. 10 Plot of DBH class against mean prediction bias for diameter (left) and total volume (right)

4 Discussion

Detailed information is currently available on the functions and methodologies for estimating diameters at different heights and total or merchantable stem volume for different species (e.g., Barrio-Anta et al. 2007; Diéguez-Aranda et al. 2006). However, no such tools are yet available for chestnut, either for high forest or for coppice stands; hence the relevance of this work, which facilitates a better understanding and management of the species.

The final selected model explained more than 98.4 % of total variability and had mean errors below 1.20 cm. The estimates obtained in the models analyzed were similar to those obtained for other species. The model of Fang et al. (2000) has shown good performance, as much for broadleaf species as for conifers (e.g., Barrio-Anta et al. 2007; Diéguez-Aranda et al. 2006; Pompa-García et al. 2009).

Chestnut stem profiles varied considerably in this study because of the high number of stems (up to eight) growing from each stool. Previous studies (e.g., Muhairwe 1994) have already demonstrated that factors such as site index, size and position of the crown, and stand density affect the profile of the tree. Modeling the profile of chestnut, particularly in coppice stands, presents an additional difficulty. Because many stems often come from the same stool, it seems logical that stool density (number of stems per stool), as well as stand density, might also be a key factor, since internal competition affects the profile of the tree. Despite this, the selected model explained over 98 % of total variability, above the values obtained in previous broadleaf studies (Barrio-Anta et al. 2007; Pompa-García et al. 2009). Moreover, as the bias values in predicting diameters show, the model performs well in the basal part of the tree, thereby solving one of the main problems associated with the use of taper functions in trees with prominent basal zones.

Validation with an independent data set confirmed the applicability of the selected taper function and the compatible volume equation for chestnut coppice stands in northwest Spain. Both statistics, APE and \( \overline{\mathrm{Bias}} \), increased with diameter class in the trees evaluated. \( \overline{\mathrm{Bias}} \) values did not vary greatly until diameter class 35, after which their range increased slightly. This can be attributed in part to the relatively low number of sampled trees in this diameter class, that is, 4 trees from a total of 70 in the whole validation data set.

5 Conclusions

A taper function for chestnut coppice stands in northwest Spain was developed to estimate diameter at any point along the stem, along with a total volume equation compatible with the fitted taper function. A total of five models were evaluated: the segmented model of Fang et al. (2000), the variable-exponent functions proposed by Bi (2000) and Kozak (2004), the power function proposed by Demaerschalk (1972), and the trigonometric compatible model proposed by Thomas and Parresol (1991). In the end, the Bi (2000) model was not compared to the other models because convergence was not achieved. All the other functions performed well in estimating diameter along the stem and appropriately described the stem profile for chestnut coppice stands.

The compatible system to estimate volume proposed by Fang et al. (2000) was finally selected as the best taper function to explain the profile of chestnut coppice, as much for its goodness-of-fit statistics (\( {R}_{\mathrm{adj}}^2 \) of 0.98 and mean error of 1.19 cm) as for its prediction ability for diameter and height along the stem. This system has the advantage of being formed by a taper function, a total volume equation, and a merchantable volume equation, all of which are mutually compatible.

Validation using an independent data set reflected the quality of predictions and confirmed the ability of the selected taper function to describe the stem profile in chestnut coppice stands in northwest Spain.

The taper function finally selected could be used for coppice stands in the rest of the country or elsewhere in the first instance, until new adjusted taper functions are developed to ensure the most accurate estimations possible for specific areas.