Introduction

Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is a characteristic species of the subtropical zone of southern China and an important reforestation and commercial tree species. According to the eighth Chinese National Forest Inventory, Chinese fir plantations occupy approximately 8.95 million ha, with a standing timber volume of 625 million m³ (SFA 2013).

Basal area (BA) is an important stand variable in forest surveys and is directly related to other important economic variables such as stand volume and quadratic mean diameter. Many management and silvicultural decisions, for example thinning intensities, are based on measurements of basal area. In addition, curves of mean basal area are useful tools for effective stand management because they help to estimate the timing of intermediate and final cuts (Assmann 1970; Sun et al. 2007). Basal area growth models have therefore traditionally been among the primary components of forest growth and yield prediction systems. Over the past several decades, a number of individual-tree and stand-level basal area growth models have been developed for a variety of tree species in pure or mixed forests (Li et al. 1988; Monserud and Sterba 1996; Schroder et al. 2002; Mohammadi et al. 2017; Lamb et al. 2018). These models often used ordinary least squares (OLS) regression to establish an empirical relationship between basal area and stand age under different conditions. However, time series data from repeated measurements violate several assumptions of OLS regression: they characteristically exhibit non-normality and tend to suffer from multicollinearity and autocorrelation (Clutter et al. 1983; Li et al. 1988; Liu and Zhang 2005).

Artificial neural networks (ANNs) are loosely modeled on brain function: a series of nodes representing inputs, outputs and internal variables are connected by synapses of varying strength and connectivity (Jensen et al. 1999). Unlike OLS regression, ANNs require no assumptions about the normality and independence of the data; instead, the network is trained to find the underlying relationships between inputs and outputs. During the last two decades, ANNs have received considerable attention as a valid alternative to traditional statistical methods for predicting the behavior of nonlinear systems (Gianfranco et al. 2007; Dande and Samant 2018), and have been used to predict forest biomass (Foody et al. 2003; Henry et al. 2013), basal area and stem density (Corne et al. 2004), and bark volume (Diamantopoulou 2005; Diamantopoulou and Milios 2010; Ashraf et al. 2013; Wu and Ji 2015).

The back-propagation artificial neural network (BPANN) is a popular model in applications of artificial intelligence (Wu and Ji 2015). Vapnik et al. (1997) proposed the support vector machine (SVM), which can also be viewed as a type of feed-forward network. In recent years, a number of nonlinear classification and regression SVMs have been developed and benchmarked against ANNs. SVMs have been successful in areas such as time series prediction (Sapankevych and Sankar 2009), intrusion detection (Hong 2012) and surface ozone (Alkasassbeh 2013).

The focus of this study was to obtain accurate estimates of stand basal area for Chinese fir plantations. ANN models were used as an alternative to the traditional generalized nonlinear regression approach. Data for the planting density trial came from permanent plots nearly 30 years old. The objective was to analyze the ability of a BPANN and an SVM to describe the stand basal area dynamics of Chinese fir plantations under different growing conditions.

Materials and methods

Study site and experimental design

Permanent plots of Chinese fir plantations, located on Dagangshan Mountain (27°34′N, 114°33′E), Jiangxi Province, southern China (Fig. 1), were established with bare-root seedlings in the spring of 1981. The soil is a red and yellow loam. Elevations range from 250 to 300 m, with slopes < 45%. The average frost-free growing season is 265 days, and average annual precipitation is estimated at 1591 mm.

Fig. 1 Location of the study area

Plots were installed in a randomized block design with spacings of 2 m × 3 m (A), 2 m × 1.5 m (B), 2 m × 1 m (C), 1 m × 1.5 m (D) and 1 m × 1 m (E). Each spacing was replicated three times, for a total of fifteen plots (Table 1). Each plot was 20 m × 30 m (0.06 ha), surrounded by a buffer zone of two rows planted at the same density and delimited by a fixed boundary of concrete piles. Seedling mortality surveys were carried out annually during the first 2 years, and dead seedlings were replaced to maintain the spacing.
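The five spacings translate into nominal planting densities by dividing one hectare (10,000 m²) by the area allotted to each tree. The short Python sketch below does this arithmetic; it assumes a regular rectangular grid and ignores buffer rows and mortality, so the results are nominal initial densities rather than the measured densities in Table 1.

```python
# Planting spacings (m x m) of the five treatments, as listed in the text.
SPACINGS_M = {
    "A": (2.0, 3.0),
    "B": (2.0, 1.5),
    "C": (2.0, 1.0),
    "D": (1.0, 1.5),
    "E": (1.0, 1.0),
}
PLOT_AREA_M2 = 20.0 * 30.0  # 0.06 ha measurement plot, excluding buffer rows


def stems_per_ha(spacing):
    """Nominal initial density: one tree per a x b rectangle, 10,000 m2 per ha."""
    a, b = spacing
    return 10_000.0 / (a * b)


def trees_per_plot(spacing, plot_area_m2=PLOT_AREA_M2):
    """Nominal number of planted trees inside one measurement plot."""
    a, b = spacing
    return plot_area_m2 / (a * b)


if __name__ == "__main__":
    for name, spacing in SPACINGS_M.items():
        print(f"{name}: {stems_per_ha(spacing):8.0f} stems/ha, "
              f"{trees_per_plot(spacing):4.0f} trees/plot")
```

Run as a script, it reports roughly 1667, 3333, 5000, 6667 and 10,000 stems ha⁻¹ (100, 200, 300, 400 and 600 trees per 0.06 ha plot) for spacings A to E.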

Table 1 Descriptive statistics of plots

Tree measurements

All trees in each plot were numbered. Plots were measured every year for the first 10 years and every other year thereafter. Height is often regarded as independent of stand density and can therefore be used as an indicator of site productivity. To calculate site index, two dominant trees were selected in each of the upper, middle and lower sections of each plot when the trees were 6 years old. The arithmetic mean height of these six dominant trees at the reference age of 20 years was taken as the site index of the plot. Diameter at breast height outside bark (DBH) was measured for all numbered trees that had reached 1.3 m in height. Total height was measured on a systematic sample of 50 trees per plot, and the height to the lowest branch (crown base height) was also measured for these trees.

All data from repeated censuses in undisturbed stands were used for the simulations. By 2008, the plots had been measured 19 times (Table 2). Plot data included age (A), dominant height (H), number of living trees ha⁻¹ (N), basal area ha⁻¹ (G) and quadratic mean diameter (Dg). Dg was derived from the basal area of the average tree in the corresponding plot.
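The plot-level variables can be derived from the individual-tree measurements as sketched below, assuming DBH is recorded in centimetres and G is expressed in m² ha⁻¹. The helper names (`stand_summary`, `site_index`) are illustrative and not from the study; Dg is computed as the diameter of the tree of mean basal area, which matches the description of Dg above.

```python
import math


def stand_summary(dbh_cm, plot_area_ha=0.06):
    """Return (N, G, Dg) for one plot census.

    dbh_cm: DBH (cm) of every living tree in the plot.
    N  = living trees per ha
    G  = basal area, m2 per ha
    Dg = quadratic mean diameter, cm (diameter of the tree of mean basal area)
    """
    if not dbh_cm:
        raise ValueError("plot has no living trees")
    n_ha = len(dbh_cm) / plot_area_ha
    # Tree basal area g = (pi / 4) * d^2, with d converted from cm to m.
    g_ha = sum(math.pi / 4.0 * (d / 100.0) ** 2 for d in dbh_cm) / plot_area_ha
    mean_tree_ba = g_ha / n_ha  # basal area of the average tree, m2
    dg_cm = 100.0 * math.sqrt(4.0 * mean_tree_ba / math.pi)
    return n_ha, g_ha, dg_cm


def site_index(dominant_heights_m):
    """Arithmetic mean height (m) of the six dominant trees at reference age 20."""
    if len(dominant_heights_m) != 6:
        raise ValueError("expected heights of six dominant trees")
    return sum(dominant_heights_m) / 6.0
```

Because Dg is a quadratic mean, it is never smaller than the arithmetic mean DBH; for trees of 8 and 12 cm it gives √104 ≈ 10.2 cm rather than 10 cm.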

Table 2 Overview of the surveyed stands

Neural networks modelling

An ANN consists of connected nodes, usually arranged in a multilayer structure. Each connection between nodes in different layers has a weight, and each node is a processing unit that operates on the weighted sum of its inputs. The number of nodes in the input layer corresponds to the number of input variables; in our case, there were four input nodes for the growth factors. The output layer contains one node representing basal area (BA). The number of hidden nodes is usually determined by trial and error. Some studies have shown that there is rarely an advantage to using more than one hidden layer (Rumelhart et al. 1986; Lippmann 1987). For this reason, the supervised feed-forward neural networks in this study were trained with a single hidden layer containing several hidden nodes (Fig. 2).
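To make the weighted-sum description concrete, the following Python sketch computes one forward pass through a 4-5-1 network. It assumes a tanh hidden layer and a linear output node (MATLAB's `tansig` is mathematically equivalent to tanh, but the transfer functions used in the study are not stated); the weights are hypothetical random values, whereas training would adjust them by back-propagation.

```python
import math
import random


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network (e.g. 4-5-1).

    Each hidden node applies tanh to the weighted sum of its inputs plus a
    bias; the single output node is a linear weighted sum of hidden outputs.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_weights(n_in=4, n_hidden=5, seed=0, scale=0.5):
    """Hypothetical small random initial weights (training would adjust them)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
    w_out = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
    b_out = rng.uniform(-scale, scale)
    return w_hidden, b_hidden, w_out, b_out


if __name__ == "__main__":
    params = random_weights()
    # Four scaled inputs in [-1, 1]; the output is a scaled basal area.
    print(mlp_forward([0.2, -0.5, 0.1, 0.7], *params))
```

Since each hidden output lies in [−1, 1], the network output can differ from the output bias by at most the sum of the absolute output weights, which is one reason inputs and outputs are scaled to a common range.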

Fig. 2 The ANN dependency graph for modelling basal area growth of Chinese fir

The principles of ANNs have been described in previous studies (Rumelhart et al. 1995; Basheer and Hajmeer 2000). Several software packages are available for building artificial neural networks; we chose MATLAB 7.11.0.584 (R2010b) (Demuth and Beale 2009). The software provides 14 neural network algorithms for multilayer perceptron (MLP) networks. Three-layer feed-forward networks trained with back-propagation (BP) were chosen as the nonlinear regression model (Demuth and Beale 2009). The SVM experiments were conducted using the LibSVM package (Chang and Lin 2011). In K-fold cross-validation (K-CV), the original data are divided into K subsets of approximately equal size. Each subset serves once as the validation set while the remaining K − 1 subsets are used as the training set, and the average performance of the K models is taken as the model performance. K-CV helps to avoid both over-fitting and under-fitting, so the results obtained are more convincing. The performance of the BP and SVM models was further improved by sixfold cross-validation. A total of 200 samples were randomly selected as the training set, and 40 samples were used as the test set to evaluate model performance. All other training parameters were left at their default values.
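The data split and K-fold procedure described above can be sketched in plain Python as follows. The function names and the trivial mean-predictor model are hypothetical stand-ins; in practice `fit` would train the BP network or the SVM on the training folds. Folds are formed here by taking every K-th shuffled index, which is one of several reasonable ways to build near-equal subsets.

```python
import random


def split_train_test(n_samples, n_train, seed=0):
    """Randomly assign sample indices to a training set and a test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


def k_fold(indices, k=6):
    """Yield (training, validation) index lists; each fold validates once."""
    folds = [indices[i::k] for i in range(k)]  # K near-equal subsets
    for i, validation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation


def cross_validated_mse(fit, xs, ys, indices, k=6):
    """Average validation MSE over the K models (the K-CV estimate)."""
    scores = []
    for training, validation in k_fold(indices, k):
        predict = fit([xs[j] for j in training], [ys[j] for j in training])
        sq_err = [(ys[j] - predict(xs[j])) ** 2 for j in validation]
        scores.append(sum(sq_err) / len(sq_err))
    return sum(scores) / len(scores)


def fit_mean(train_x, train_y):
    """Trivial baseline model: always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y
```

With 240 samples, `split_train_test(240, 200)` reserves 40 test indices, and `cross_validated_mse(fit, xs, ys, train, k=6)` scores a model using only the 200 training indices, so the test set stays untouched until the final evaluation.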

Following the model selection strategy for artificial neural networks of Egrioglu et al. (2008), the BP model LM451, with five hidden neurons, was identified as the most reliable according to the weighted information criterion (WIC). The name LM451 denotes a network trained with the Levenberg–Marquardt (LM) algorithm and composed of one input layer with four input variables, one hidden layer with five nodes and one output layer with one output variable.

Data preparation

As with ANNs, scaling the data before applying SVM is important because the attributes differ by orders of magnitude. Normalizing the data to a uniform range prevents attributes with large values from dominating those with small values and avoids numerical difficulties during calculation. The training and test sets were scaled separately to the range [−1, 1] using a linear scaling formula (Basheer and Hajmeer 2000), as follows:

$$ X_{i} = \frac{{2 \times (Z_{i} - Z_{i}^{\hbox{min} } )}}{{Z_{i}^{\hbox{max} } - Z_{i}^{\hbox{min} } }} - 1 $$
(1)

where \( X_{i} \) is the normalized value of \( Z_{i} \), and \( Z_{i}^{\hbox{min} } \) and \( Z_{i}^{\hbox{max} } \) are the minimum and maximum values of \( Z_{i} \) in the data set. Both input and output variables were scaled to the interval [−1, 1] to match the range of the transfer function (Swingler 1996).
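Eq. (1) and its inverse are sketched below; the inverse is needed to convert the network's scaled output back into basal area units. Passing `z_min` and `z_max` explicitly lets a given range be reused; the text states that the training and test sets were scaled separately, which corresponds to calling the function once per set without those arguments.

```python
def scale_to_unit_range(values, z_min=None, z_max=None):
    """Eq. (1): X = 2 (Z - Zmin) / (Zmax - Zmin) - 1, mapping onto [-1, 1].

    Returns the scaled values and the (z_min, z_max) used, so that model
    outputs can later be mapped back to original units.
    """
    z_min = min(values) if z_min is None else z_min
    z_max = max(values) if z_max is None else z_max
    if z_max == z_min:
        raise ValueError("attribute is constant; Eq. (1) is undefined")
    span = z_max - z_min
    return [2.0 * (z - z_min) / span - 1.0 for z in values], (z_min, z_max)


def unscale(scaled, z_min, z_max):
    """Inverse of Eq. (1): Z = (X + 1) (Zmax - Zmin) / 2 + Zmin."""
    return [(x + 1.0) * (z_max - z_min) / 2.0 + z_min for x in scaled]
```

For example, basal areas of 0, 5 and 10 m² ha⁻¹ map to −1, 0 and 1, and `unscale` recovers the original values.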

To further understand the influence of the input variables on the BP model, a sensitivity analysis was carried out in which three input variables were held constant while the fourth was varied (Table 3). The sensitivity analysis follows Olden and Jackson (2002).
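The one-at-a-time procedure described above (vary one input, hold the other three fixed) can be sketched as follows. The toy model, the input ordering and the baseline values are hypothetical; in the study the trained LM451 network would take the place of `toy_model`, and the fixed values would be those listed in Table 3. This is a generic profile sketch, not a detailed reproduction of the Olden and Jackson (2002) procedure.

```python
def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def sensitivity_profile(model, baseline, var_index, grid):
    """One-at-a-time profile: vary input `var_index` over `grid` while the
    other inputs stay fixed at their `baseline` values."""
    profile = []
    for value in grid:
        x = list(baseline)  # copy, so the baseline itself is never modified
        x[var_index] = value
        profile.append((value, model(x)))
    return profile


if __name__ == "__main__":
    # Hypothetical stand-in for the trained network; inputs scaled to [-1, 1].
    toy_model = lambda x: 0.6 * x[0] + 0.2 * x[1] - 0.1 * x[2] + 0.05 * x[3]
    ages = linspace(-1.0, 1.0, 5)
    for density in (-1.0, 0.0, 1.0):  # e.g. several (scaled) density levels
        baseline = [0.0, density, 0.0, 0.0]
        print(density, sensitivity_profile(toy_model, baseline, 0, ages))
```

Repeating the call with several baseline density or site index values yields a family of curves of the kind shown in Figs. 3–6.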

Table 3 The inputs of the sensitivity analysis under the LM451 model

Multiple nonlinear regressions with mixed effects

To compare the BP model with a conventional method for estimating basal area (BA), we built a multiple nonlinear mixed-effects model with plot-level random effects. The widely used Chapman–Richards equation was adopted; its mathematical expression is (Li et al. 1988):

$$ BA = b_{1} (SI)^{{b_{2} }} (1 - \exp ( - b_{4} A))^{{b_{3} }} $$
(2)

where \( b_{1} \)–\( b_{4} \) are parameters, SI is site index and A is stand age, as defined above. S-PLUS software was used for the regression analysis (Harrell 2001).
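Eq. (2) can be evaluated as below to see how its parameters shape the curve: BA is zero at A = 0 and approaches the asymptote \( b_{1} SI^{b_{2}} \) as age increases, with \( b_{4} \) governing the rate of approach and \( b_{3} \) the shape. The parameter values are hypothetical, not the fitted estimates, and `plot_effect` is an illustrative random effect on \( b_{1} \) only; the text does not state which parameters carried random effects.

```python
import math


def chapman_richards_ba(age, site_index, b1, b2, b3, b4, plot_effect=0.0):
    """Eq. (2): BA = b1 * SI**b2 * (1 - exp(-b4 * A))**b3.

    plot_effect is a hypothetical plot-level random effect added to b1.
    """
    return ((b1 + plot_effect) * site_index ** b2
            * (1.0 - math.exp(-b4 * age)) ** b3)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the curve's shape.
    params = dict(b1=2.0, b2=1.0, b3=1.5, b4=0.15)
    for age in (2, 5, 10, 16, 20, 28):
        print(age, round(chapman_richards_ba(age, 16.0, **params), 2))
```

With these values and SI = 16, the curve rises monotonically towards 2.0 × 16 = 32 m² ha⁻¹; a positive plot effect raises the asymptote for that plot only.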

Model evaluation

Quantitative model evaluation is an important part of growth modelling; here it was based on the adjusted coefficient of determination (\( R_{adj}^{2} \)), mean square error (MSE), residual sum of squares (RSS), standard deviation (SD) and model efficiency (ME). The corresponding formulas are as follows:

$$ R_{adj}^{2} = 1 - \frac{{\frac{1}{n - p - 1}\sum\nolimits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )^{2} } }}{{\frac{1}{n - 1}\sum\nolimits_{i = 1}^{n} {(y_{i} - \bar{y} )^{2} } }} $$
(3)
$$ MSE = \frac{{\sum\nolimits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )^{2} } }}{n} $$
(4)
$$ RSS = \sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } $$
(5)
$$ SD = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \bar{y}} \right)^{2} } }}{n - 1}} $$
(6)
$$ ME = 1 - \left[ {\sum\limits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )^{2} } \Big/ \sum\limits_{i = 1}^{n} {(y_{i} - \bar{y} )^{2} } } \right] $$
(7)

where \( y_{i} \), \( \bar{y} \) and \( \hat{y}_{i} \) are the observed value of sample i, the mean of the observed values, and the value predicted for sample i by the ANN, SVM or nonlinear mixed-effects model, respectively; n is the number of samples and p is the number of model parameters.
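Eqs. (3)–(7) translate directly into code. The sketch below computes all five statistics from one model's observed and predicted values; `n_params` plays the role of p in Eq. (3), and how p should be counted for a neural network or an SVM is an assumption left to the user.

```python
import math


def evaluation_stats(observed, predicted, n_params):
    """Eqs. (3)-(7) for one model; n_params is p in Eq. (3)."""
    n = len(observed)
    if n != len(predicted) or n <= n_params + 1:
        raise ValueError("need len(observed) == len(predicted) > n_params + 1")
    mean_obs = sum(observed) / n
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    sst = sum((y - mean_obs) ** 2 for y in observed)  # total sum of squares
    return {
        "R2_adj": 1.0 - (rss / (n - n_params - 1)) / (sst / (n - 1)),  # Eq. (3)
        "MSE": rss / n,                                                # Eq. (4)
        "RSS": rss,                                                    # Eq. (5)
        "SD": math.sqrt(sst / (n - 1)),                                # Eq. (6)
        "ME": 1.0 - rss / sst,                                         # Eq. (7)
    }
```

For observed values 1, 2, 3, 4 and predictions 1.1, 1.9, 3.2, 3.8 with p = 1, it returns (up to rounding) RSS = 0.10, MSE = 0.025, ME = 0.98 and \( R_{adj}^{2} \) = 0.97.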

Normality of the residuals was checked using the Shapiro–Wilk test (Shapiro and Wilk 1965), and heteroscedasticity was checked using graphs of the residuals against the measured values. To detect dependencies or patterns, observed values can be plotted against predicted values, or residuals against measured values (Gadow and Hui 1999). The selected model should have the largest coefficient of determination (\( R^{2} \)) after adjustment for degrees of freedom.

Results

Sensitivity analysis of the BP model

LM451, a three-layer network of four input neurons, five hidden neurons and one output neuron, was used for the sensitivity analysis. At all density levels, predicted basal area increased slowly with stand age and finally stabilized (Fig. 3). The values stabilized after age 16 years, close to the stand harvest age.

Fig. 3 Simulated variation curves of basal area with stand age under five density levels

When density was < 6000 stems ha⁻¹, basal area increased rapidly with density at all site indices (Fig. 4). Above 6000 stems ha⁻¹, three trends were found: basal area decreased at the two highest site indices, and it stabilized or increased on relatively poor sites.

Fig. 4 Simulated variation curves of basal area with stem density under five site indices

The trajectory of basal area against stand Dg (quadratic mean diameter) resembled an S-shaped curve (Fig. 5). Basal area increased with Dg across its entire range, but at different rates: the increase was most pronounced at intermediate Dg values and slower at low or high Dg.

Fig. 5 Simulated variation curves of basal area with quadratic mean diameter under five density levels

The response curves of basal area to site index were almost flat, indicating that stand basal area remained nearly constant and was hardly affected by site index (Fig. 6).

Fig. 6 Simulated variation curves of basal area with site index under five density levels

Precision comparison of SVM, BP model and the Chapman–Richards model

Table 4 lists the simulation statistics of the SVM, LM451 (BP) and Chapman–Richards nonlinear mixed-effects models. The SVM model had the highest \( R_{adj}^{2} \) and ME, and the lowest MSE and RSS. The LM451 model, in turn, had higher \( R_{adj}^{2} \) and ME, and lower MSE and RSS, than the Chapman–Richards model. The SVM therefore had the best simulation performance of the three models, and its simulated values were close to the measured values (Table 5). Although the Chapman–Richards equation included plot-level mixed effects, its simulation performance was inferior to that of the SVM and LM451 models.

Table 4 Performance comparison among SVM, LM451 model (BP) and Chapman–Richards model
Table 5 Descriptive statistics for SVM

The residuals of the BP (LM451) model increased with basal area: they tended to be negative at smaller basal areas and positive at larger basal areas (Fig. 7). There was no obvious trend in the residuals of the Chapman–Richards model against basal area, although the residuals increased at large basal areas (Fig. 8). The residuals of the SVM were randomly distributed, with no systematic trend (Fig. 9). The SVM model therefore showed less systematic bias with increasing stand age than the LM451 and Chapman–Richards models.
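The visual residual checks above can be complemented by a simple numeric summary: the least-squares slope of the residuals regressed on basal area. The sketch below is an illustrative addition, not a statistic reported in the study; it detects only linear trends, so curvature such as the increase at large basal areas noted for the Chapman–Richards model still calls for the residual plots.

```python
def residuals(observed, predicted):
    """Residuals e_i = y_i - y_hat_i."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def residual_slope(x, resid):
    """Least-squares slope of residuals regressed on x (e.g. basal area).

    A slope near zero is consistent with no linear systematic trend; a
    clearly positive slope means residuals drift from negative to positive
    as x increases.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_r = sum(resid) / n
    sxy = sum((xi - mean_x) * (ri - mean_r) for xi, ri in zip(x, resid))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx
```

A model whose predictions are a constant offset from the observations gives a slope of zero, whereas one that under-predicts proportionally (for example, predictions equal to 90% of the observations) gives a slope of 0.1.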

Fig. 7 Residual distribution of the BP model (LM451) with basal area

Fig. 8 Residual distribution of the Chapman–Richards nonlinear mixed-effects model with basal area

Fig. 9 Residual distribution of the SVM model with basal area


Discussion

Nonlinear regression requires the quantitative relationship between independent and dependent variables to be specified before the model parameters can be estimated. The selection of mathematical functions and parameter-estimation methods becomes more difficult when modelling nonlinear biological growth (Johnston et al. 2010). ANNs can identify hidden patterns in data (Scrinzi et al. 2007) and may provide a more flexible and relatively simpler approach to modelling complex biological systems than deterministic models (Gianfranco et al. 2007). The BPANN model, with its error back-propagation procedure, is only slightly affected by data quality problems and bias, and is efficient for quantifying nonlinear relationships (Rumelhart et al. 1986). Studies on total volume and stem profiles have found that the BPANN model performs as well as or better than traditional regression approaches (Diamantopoulou and Milios 2010; Bráulio et al. 2017). Our results show that the BPANN model achieved higher precision than the Chapman–Richards model, in agreement with Diamantopoulou and Milios (2010). However, the residual distribution of the BPANN model in this study was unsatisfactory, and its trend differed from that reported in a related study by Ashraf et al. (2013). This might be related to the data structure or to the internal architecture of the BPANN. The residuals of the Chapman–Richards model were uniformly distributed on both sides of the x-axis, suggesting that the mixed-effects model with a plot-level random effect helped to eliminate systematic trends.

The foundations of support vector machines (SVMs) were developed by Vapnik (1995), and SVMs are gaining popularity due to their attractive features and empirical performance. SVM was originally developed to solve classification problems but has been extended to regression problems. Its formulation embodies the structural risk minimization principle, which is superior to the traditional empirical risk minimization principle used by conventional neural networks (Gunn et al. 1997). In this study, SVM achieved greater precision than BPANN and the Chapman–Richards model. Similar results were reported by Godarzi et al. (2012), in which the SVM method had the highest precision compared with the artificial neural network and maximum likelihood methods. SVM is similar to BPANN in that both receive input data and produce output data; for regression, the input and output data of SVM are the same as those of BPANN (Alkasassbeh 2013). However, the main advantage of SVM is that it is less susceptible to over-fitting than BPANN. Site index (SI), stand density and age have been found sufficient to predict basal area (Merriam et al. 1995; Malinen et al. 2003). Multiple OLS (ordinary least squares) regression, with several independent variables regressed against the dependent variable BA, is generally preferable to simple linear OLS regression, but it still cannot eliminate multicollinearity and autocorrelation. In our study, the BPANN, SVM and nonlinear mixed-effects models all included site index (SI), stand density and age as independent variables. These models not only represented the complex conditions of basal area growth but also addressed problems arising from the study data, which may explain why all three modelling methods showed good simulation performance.
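To illustrate the structural risk minimization idea, the sketch below writes out the primal objective of a linear ε-insensitive support vector regression (a complexity term ½‖w‖² plus C times the training loss) and the prediction rule of a kernel SVR with a radial basis function (RBF) kernel. The support vectors and dual coefficients passed to `svr_predict` would be hypothetical here; LibSVM solves the dual problem internally, and that solver is not reproduced. If the LibSVM defaults were kept, an RBF kernel with C = 1, ε = 0.1 and γ = 1/(number of features) would apply, but the SVM settings are not listed explicitly in the text.

```python
import math


def epsilon_insensitive(residual, epsilon=0.1):
    """Vapnik's epsilon-insensitive loss: errors inside the tube cost nothing."""
    return max(0.0, abs(residual) - epsilon)


def linear_svr_objective(w, b, xs, ys, c=1.0, epsilon=0.1):
    """Primal linear SVR objective: 0.5*||w||^2 (model complexity) plus
    C times the summed epsilon-insensitive training error."""
    complexity = 0.5 * sum(wj * wj for wj in w)
    empirical = sum(
        epsilon_insensitive(y - (sum(wj * xj for wj, xj in zip(w, x)) + b), epsilon)
        for x, y in zip(xs, ys)
    )
    return complexity + c * empirical


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel SVR prediction: sum_i coef_i * K(sv_i, x) + b."""
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + bias
```

The complexity term is what distinguishes this objective from pure empirical risk minimization: among models that fit the training data equally well within the ε-tube, the one with the smaller weight norm is preferred.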

Ecological modelling problems frequently combine small datasets, noisy data and weak domain theories, which present severe challenges to machine learning techniques. Because of their comparative advantages over conventional regression-based approaches, SVM and BPANN have considerable potential in forest growth and yield modelling. However, traditional regression models can be fitted to small data sets and may be more advantageous for estimating stand basal area from only a few plots, whereas ANN modelling often requires large data sets to obtain reliable estimates of basal area growth. When selecting a model, it is important to choose one that best represents the observations and is reasonably easy to solve.

Conclusions

The results showed that SVM had higher precision and less systematic bias than BPANN and the Chapman–Richards nonlinear mixed-effects model. SVM was able to successfully simulate stand basal area of Chinese fir on Dagangshan Mountain using simple input data. The method introduced in this article is sufficient for many forest ecological modelling applications because of its efficiency and accuracy.

Sensitivity analysis of the BP model demonstrated that stand basal area increased with stand age, and that stands with different site indices eventually reached different stable values. Site index had only a weak effect on simulated basal area growth; the response curve was almost parallel to the x-axis. Setting aside the small differences in modelling precision, the BPANN, SVM and nonlinear mixed-effects models all have potential for application in stand basal area growth modelling.