Introduction

Dominant height, one of the most important forest stand parameters, is widely used in forest management. In Turkey, site index maps for the period 1963–1972, prepared from dominant height data obtained by ground measurements, are mainly used in forest management planning and in other forestry activities such as silvicultural treatments and soil and water management. However, the boundaries of forest stands change continuously, because site index boundaries are altered by silvicultural activities and by climate- and vegetation-related disturbances. Thus, after several plan periods, the site index map of a forest area becomes increasingly inconsistent with the first map (Gunlu 2009).

Forest management decisions, including silvicultural prescriptions and afforestation activities, depend mainly on ground information to formulate appropriate management of forest ecosystems (Altun et al. 2008). Predicting forest productivity is important for these decisions (Carmean 1975), and inadequate productivity assessment is one of the major problems of forestry in Turkey. Two methods, direct and indirect, are used to determine the productivity of forest sites in Turkey. The direct method is based on ground observations of soil properties, topographic factors (slope, altitude, and landform), and climate, whereas the indirect method relies on the relationship between stand age and dominant height. In Turkey, the indirect method is mainly used to determine site productivity, which is one of the key parameters in the preparation of forest management plans. In practice, it is generally applied to even-aged forests (Altun et al. 2008; Diéguez-Aranda et al. 2006). However, it is unsuitable for degraded and untreated forest areas, which cover almost half of the forest area in Turkey. Because the target trees (dominant and co-dominant) in degraded forests have been removed either under forest management plans or by irregular disturbances, it is difficult to find appropriate trees for determining site productivity with the indirect method (Altun et al. 2008). Direct methods are therefore needed to determine productivity correctly in Turkey’s forests, particularly in degraded and open (treeless) areas. However, applying these methods over large areas is tedious, time consuming, and expensive, so only a limited number of studies on this subject exist in Turkey (Altun et al. 2008; Günlü et al. 2009).

Statistical modeling and spatial interpolation methods are widely used to predict forest productivity (Palmer et al. 2009; Nothdurft et al. 2012; Mohamed et al. 2014; Raimundo et al. 2017; Parresol et al. 2017; Socha et al. 2017; Scolforo et al. 2017; Vieira et al. 2018). Spatial surface data play an important role in environmental management, and planners and researchers often need them. Spatial interpolation methods are effective for estimating spatially continuous data and can be used to predict environmental variables at unsampled locations (Li and Heap 2008; Bostan 2017). These methods can be classified into several groups: non-geostatistical methods such as inverse distance squared (Xia et al. 2017; Loghmari et al. 2018); geostatistical methods such as ordinary kriging and co-kriging (Li and Heap 2008; Li et al. 2011; Meng et al. 2013; Bostan 2017; Göl et al. 2017); statistical methods such as multiple linear regression (MLR) and the generalized regression model (Thistlethwaite et al. 2017; Zhang et al. 2018; Bergier et al. 2018); machine learning methods such as regression tree, random forest, and support vector machines (Li and Heap 2008; Li et al. 2011); and combined methods such as multiple regression kriging (MLRK), multilayer perceptron kriging (MLPK), and radial basis function kriging (RBFK) (Cellura et al. 2008; Dai et al. 2014; Barni et al. 2016; Scolforo et al. 2016; Emamgholizadeh et al. 2017).

The aims of this study were to (1) compare the performance of different prediction techniques for interpolating the site index of beech forests and (2) create a site index map, which is essential for sustainable forest management and planning, using the best-performing model.

Materials and methods

Study area

The study area (600 ha) is located on steep terrain in the north of the Middle Black Sea Region of Turkey (647,000–650,000 E, 4,629,000–4,632,000 N; UTM ED 50 datum, Zone 36) (Fig. 1). Elevation ranges from 500 to 970 m above sea level, and slope ranges from 10 to 60%. The inter-annual maximum mean temperature (27.6 °C) occurs in summer and the minimum (13.8 °C) in winter. The inter-annual mean precipitation is 677.3 mm. The study area is covered by unmanaged, even-aged, pure Oriental beech (Fagus orientalis Lipsky.) stands.

Fig. 1 Location of the study area

Field data

In August 2005, seventy sample plots were established in the study area at 300 × 300 m intervals. The coordinates of each sample plot were recorded with a handheld GPS receiver. Ground measurements of stand age, diameter at breast height, and dominant height were made in each sample plot. A soil profile was opened to the bedrock or to a minimum depth of 1 m at each sample plot. All soil profiles were identified and classified. Approximately 1 kg of rock-free soil was taken from each horizon in each soil profile.

The soil samples were taken to a nearby laboratory, air dried, sieved through a 2-mm mesh screen, and stored in vapor-proof plastic bags until analysis. Soil texture was determined by mechanical analysis (Arp 1999). Thickness of the horizons, physiological soil depth, and stone content were recorded during the field survey. Stand age and height were measured on free-growing dominant and co-dominant trees (the 100 tallest dominant and co-dominant trees per hectare, for example, the 6 tallest trees in a 0.06 ha plot) at each sample plot.

Site index values at the reference age of 100 years for Oriental beech stands were predicted by using the site index curves developed by Carus (1998) (Table 1). The elevation, slope, and aspect of each sample plot were derived from a digital elevation model (DEM), which was built by 3D modeling in a Geographic Information System (GIS) from contour lines with 10-m intervals digitized from digital topographic maps. The slope and aspect maps were generated from the same contour lines (Günlü et al. 2008).

Table 1 Site index classes of Carus (1998) for Oriental beech used in the study

Multiple linear regression analysis

Multiple linear regression (MLR) models were used to identify the significant parameters for modeling dominant height. The variables tested were the stand variables crown closure and age; the soil variables sand, silt, and clay content, plant-available water content, pH, and organic matter content; and the topographic variables slope, aspect, and altitude. The relationship between the dependent variable (Y) and the independent variables in MLR is given by Eq. 1:

$$ Y={\beta}_0+{\beta}_1{X}_1+{\beta}_2{X}_2+\dots +{\beta}_n{X}_n+\varepsilon $$
(1)

where βi are the model coefficients, Xi are the independent variables, and ε is the additive error term. MLR was performed in SPSS 20.0.
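As an illustration of Eq. 1 only, the following minimal Python sketch fits the coefficients by ordinary least squares with NumPy. It is not the SPSS procedure used in the study; the function names and the synthetic data are assumptions made for the example, and no stepwise variable selection is performed.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of Eq. 1; returns [beta_0, beta_1, ..., beta_n]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Evaluate beta_0 + beta_1*X_1 + ... + beta_n*X_n for each row of X."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]


# Synthetic, made-up data (NOT the study's measurements): 70 plots, 3 predictors.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(70, 3))
true_beta = np.array([10.0, 5.0, -2.0, 3.0])
y_demo = true_beta[0] + X_demo @ true_beta[1:]  # noise-free, so the fit is exact

beta_hat = fit_mlr(X_demo, y_demo)
residuals = y_demo - predict_mlr(beta_hat, X_demo)  # input to the kriging step
```

With real data the residuals are not zero, and it is these residuals that the combined methods interpolate.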

Regression kriging

MLRK is a combination of MLR and ordinary kriging (OK) (Hengl et al. 2007). MLR models the relationship between the primary (target) variable and the secondary (explanatory) variables and thus captures the variation explained by those variables. In OK, the weights are derived from the spatial autocorrelation structure so as to minimize the error variance; OK is therefore applied to the MLR residuals to minimize their variance. MLRK is a robust spatial interpolation technique and is commonly used for interpolating environmental variables (Hengl et al. 2007; Meng et al. 2013; Barni et al. 2016; Scolforo et al. 2016; Bostan 2017).

MLRK was applied in three stages: (i) modeling the relationship between the dependent and independent variables with MLR and obtaining the residuals; (ii) producing a residual surface map (raster data) for the study area by OK; and (iii) adding the OK-interpolated residual surface to the MLR prediction surface by spatial overlay. Geostatistical analyses, mapping, and spatial overlaying were performed with ArcGIS 10.3.1 software (Hengl et al. 2007; Barni et al. 2016; Scolforo et al. 2016).
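The three stages can be sketched in Python as follows. This is a simplified illustration with NumPy, not the ArcGIS workflow used in the study. The exponential model is written with the practical-range convention (factor 3), and the semivariogram parameters are read from the Fig. 3 caption as nugget, partial sill, and range; both choices are assumptions. The coordinates and covariates are synthetic.

```python
from functools import partial

import numpy as np


def exponential_semivariogram(h, nugget, psill, a):
    """gamma(h) = nugget + psill * (1 - exp(-3h/a)) for h > 0, and gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-3.0 * h / a))
    return np.where(h > 0, g, 0.0)


def ordinary_kriging(xy, values, xy_new, gamma):
    """Ordinary kriging predictions at xy_new from observations at xy."""
    xy = np.asarray(xy, dtype=float)
    xy_new = np.asarray(xy_new, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))  # last row/column enforce sum(weights) = 1
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    d0 = np.linalg.norm(xy[:, None, :] - xy_new[None, :, :], axis=2)
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = gamma(d0)
    w = np.linalg.solve(A, b)  # rows 0..n-1: weights; row n: Lagrange multiplier
    return values @ w[:n, :]


def regression_kriging(xy, X, y, xy_new, X_new, gamma):
    """Stage (i) MLR trend, stage (ii) OK of residuals, stage (iii) their sum."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    trend_new = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
    return trend_new + ordinary_kriging(xy, resid, xy_new, gamma)


# Synthetic demonstration data (NOT the study's plots).
rng = np.random.default_rng(1)
xy_demo = rng.uniform(0.0, 3000.0, size=(30, 2))  # coordinates in metres
X_demo = rng.uniform(0.0, 1.0, size=(30, 3))      # three hypothetical covariates
y_demo = 20.0 + 5.0 * X_demo[:, 0] + rng.normal(0.0, 3.0, size=30)

gamma_mlr = partial(exponential_semivariogram, nugget=17.207, psill=31.384, a=665.9)
pred_at_plots = regression_kriging(xy_demo, X_demo, y_demo, xy_demo, X_demo, gamma_mlr)
```

Because gamma(0) = 0 in this sketch, predictions at the sampled plots reproduce the observed values; predictions at new locations require the covariates to be available there as raster layers.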

Multilayer perceptron and radial basis function methods

In addition to MLR and MLRK, artificial neural network models, namely the multilayer perceptron (MLP) and the radial basis function (RBF) network, were trained to predict dominant height. The residuals obtained from the MLP and RBF models were then used to run MLPK and RBFK. Training, verification, and testing of the neural network models used 75, 15, and 10% of the data, respectively. The target variable was dominant height, and the input variables were the significant independent variables stand age, aspect, and sand content, which were selected by stepwise variable selection in the MLR analysis. The MLP and RBF models were trained in STATISTICA® software (Statsoft 2007).
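A 75/15/10% partition of the 70 plots could be drawn as in the sketch below. The random seed, the function name, and the rounding rule (integer division, with the remainder assigned to the test set) are assumptions, since the text does not state how fractional plot counts were handled in STATISTICA.

```python
import numpy as np


def split_indices(n, seed=0):
    """Randomly partition range(n) into training, verification, and test indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = n * 75 // 100   # 75% for training
    n_verify = n * 15 // 100  # 15% for verification
    train = order[:n_train]
    verify = order[n_train:n_train + n_verify]
    test = order[n_train + n_verify:]  # remaining plots (about 10%)
    return train, verify, test


train_idx, verify_idx, test_idx = split_indices(70)
```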

Multilayer perceptron kriging and radial basis function kriging methods

MLPK and RBFK are hybrid methods (Cellura et al. 2008; Emamgholizadeh et al. 2017). They are implemented by integrating OK with MLP and RBF (Demyanov et al. 1998; Demyanov et al. 2001; Cellura et al. 2008; Dai et al. 2014; Emamgholizadeh et al. 2017). MLPK and RBFK were applied in three stages: (i) applying the MLP and RBF methods with the target and input variables obtained from MLR; (ii) interpolating the residuals from MLP and RBF by OK; and (iii) building maps of the MLPK- and RBFK-interpolated values of dominant height (Cellura et al. 2008; Emamgholizadeh et al. 2017). ArcGIS 10.3.1 software was used to map the interpolation results. The flowchart for modeling and mapping dominant height is given in Fig. 2.

Fig. 2 Flowchart of the combined methods MLRK, MLPK, and RBFK

Semivariogram analysis

Modeling the semivariogram (or simply variogram) is one of the fundamentals of geostatistical analysis. The most suitable model, that is, the one with the lowest residual sum of squares and the highest coefficient of determination, was selected. A semivariogram model is described by its parameters sill, range, and nugget. The range is the lag distance at which the semivariogram reaches its maximum, and the sill is the semivariance value at the range. Autocorrelation is effectively zero beyond this distance. The nugget is the semivariance value at which the semivariogram intersects the y-axis. In theory the nugget should be zero, but in practice it generally differs from zero for several reasons, such as measurement errors in the target variable, short-range variation that is not captured at the minimum between-sample distance of the sampling scheme, and the positional error of the GPS (Isaaks and Srivastava 2001; Bohling 2005; Kristensen et al. 2015).
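To make the roles of nugget, sill, and range concrete, the sketch below computes an experimental semivariogram and evaluates a spherical model in Python. The function names, the lag binning, and the parameterization as nugget plus partial sill are assumptions for illustration; they are not taken from the ArcGIS implementation used in the study.

```python
import numpy as np


def spherical_semivariogram(h, nugget, psill, a):
    """Spherical model: rises from the nugget and levels off at nugget + psill at h = a."""
    h = np.asarray(h, dtype=float)
    rising = nugget + psill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    g = np.where(h < a, rising, nugget + psill)
    return np.where(h > 0, g, 0.0)  # gamma(0) = 0 by definition


def empirical_semivariogram(xy, z, bin_edges):
    """Mean of 0.5 * (z_i - z_j)^2 over all point pairs falling in each lag bin."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)  # every unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2
    which = np.digitize(dist, bin_edges)  # bin k covers [edge[k-1], edge[k])
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(1, len(bin_edges)):
        in_bin = which == k
        if in_bin.any():
            gamma[k - 1] = half_sq_diff[in_bin].mean()
    return gamma


# Tiny made-up example: three points on a line, 1 m apart.
xy_demo = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
z_demo = np.array([0.0, 1.0, 3.0])
gamma_demo = empirical_semivariogram(xy_demo, z_demo, bin_edges=[0.5, 1.5, 2.5])
```

A theoretical model would then be fitted to the experimental values, for example by minimizing the residual sum of squares over the nugget, partial sill, and range.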

The nugget effect, which represents the variance at small distances together with the variance of measurement error, is an important indicator of the strength of spatial dependency. According to Cambardella et al. (1994), the strength of spatial dependency is expressed by the spatial correlation index (SCI), which is calculated as the ratio of nugget to sill (Eq. 2). If the SCI is < 25%, the target variable is deemed strongly spatially dependent; between 25 and 75%, moderately spatially dependent; and > 75%, weakly spatially dependent (Cambardella et al. 1994).

$$ \mathrm{SCI}=\left(\frac{\mathrm{Nugget}}{\mathrm{Sill}}\right)\times 100 $$
(2)

We modeled the spatial structure of dominant height by semivariograms and used the resulting semivariogram parameters (sill, nugget, and range) in the OK interpolations of dominant height. We calculated the SCI to identify the strength of spatial dependency of dominant height. All calculations were performed in ArcGIS 10.3.1 software.
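Eq. 2 and the class limits of Cambardella et al. (1994) can be expressed in a few lines of Python. In this sketch, the two coefficients of each semivariogram expression in the Fig. 3 caption are taken as the nugget and sill terms of Eq. 2; this reading is an assumption, although it reproduces the SCI of 55% reported for the MLR residuals. The function names are hypothetical.

```python
def spatial_correlation_index(nugget, sill):
    """SCI (%) as defined in Eq. 2."""
    return nugget / sill * 100.0


def dependency_class(sci):
    """Class limits of Cambardella et al. (1994)."""
    if sci < 25.0:
        return "strong"
    if sci <= 75.0:
        return "moderate"
    return "weak"


# (nugget, sill) pairs read from the Fig. 3 caption (assumed interpretation).
fig3_params = {
    "MLR": (17.207, 31.384),
    "MLP": (2.068, 6.924),
    "RBF": (6.352, 7.980),
}

sci_by_method = {
    name: spatial_correlation_index(nugget, sill)
    for name, (nugget, sill) in fig3_params.items()
}
class_by_method = {name: dependency_class(sci) for name, sci in sci_by_method.items()}
```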

Evaluation criteria

The performance of the models was evaluated by the root mean squared error (RMSE; smaller is better), the Akaike information criterion (AIC; smaller is better), and the coefficient of determination (R2; higher is better), which were calculated as follows:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}{\sum}_{j=1}^n{\left({Z}_{o,j}-{Z}_{p,j}\right)}^2} $$
(3)
$$ \mathrm{AIC}=n\ln \left(\frac{\sum_{j=1}^n{\left({Z}_{o,j}-{Z}_{p,j}\right)}^2}{n}\right)+2k $$
(4)
$$ {R}^2=1-\frac{\sum_{j=1}^n{\left({Z}_{o,j}-{Z}_{p,j}\right)}^2}{\sum_{j=1}^n{\left({Z}_{o,j}-{Z}_m\right)}^2} $$
(5)

where Zo,j is the observed value and Zp,j is the predicted value for observation j, Zm is the mean of the observed values, n is the number of observations, and k is the number of regression coefficients.
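The three criteria can be computed as in the following Python sketch. R2 is taken here as one minus the ratio of the residual to the total sum of squares, which is the usual definition of the coefficient of determination and is consistent with higher values being better. The function names and the small example vectors are assumptions made for illustration.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean squared error (Eq. 3)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def aic(obs, pred, k):
    """Akaike information criterion, n * ln(SSE / n) + 2k (Eq. 4)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    n = len(obs)
    sse = np.sum((obs - pred) ** 2)
    return float(n * np.log(sse / n) + 2 * k)


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SSE / SST."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    sse = np.sum((obs - pred) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - sse / sst)


# Made-up example values (NOT results from the study).
obs_demo = [1.0, 2.0, 3.0, 4.0]
pred_demo = [1.0, 2.0, 3.0, 5.0]
rmse_demo = rmse(obs_demo, pred_demo)          # SSE = 1, n = 4
aic_demo = aic(obs_demo, pred_demo, k=2)
r2_demo = r_squared(obs_demo, pred_demo)       # SST = 5
```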

Results

Descriptive statistics of the data set are given in Table 2. Dominant height varied between 13.25 and 35.50 m, with a mean of 25.90 m. Stand age ranged from 32 to 169 years with a mean of 96 years, which is consistent with overmature pure beech stands. There were no young stands in the 0–20 age group in the study area. Aspect had the highest coefficient of variation (CV = 75.80%), and mean sand content was 42.97%. Sand content, age, and aspect emerged as significant predictors of dominant height. The results of the MLR analysis, with dominant height as the dependent variable and age, aspect, and sand content as the independent variables, are given in Table 3.

Table 2 Descriptive statistics
Table 3 Regression model predicting dominant height from aspect, age, and sand content variables

The models and parameters of the best-fitting variograms are shown in Fig. 3 and Table 4, respectively. In variogram modeling, the model with the lowest error was selected. An exponential model was selected for interpolating dominant height with the MLRK method. Based on the MLR residuals, dominant height was moderately spatially dependent (SCI = 55%) with a geostatistical range of 665.9 m. A spherical model was fitted for MLP and RBF. The MLP residuals were moderately and the RBF residuals weakly spatially dependent (Table 4), and the geostatistical range for RBF was far greater than that for MLP. These results show that dominant height was spatially autocorrelated for all methods used in the study, with moderate to weak spatial dependency, and that the range values were less than the longest distance between two sample plots.

Fig. 3 Experimental (circles) and theoretical (lines) semivariograms for the residuals of (i) MLR, (ii) MLP, and (iii) RBF. (i) MLR: γ(h) = 17.207 × nugget + 31.384 × exponential (665.9); (ii) MLP: γ(h) = 2.068 × nugget + 6.924 × spherical (270.9); (iii) RBF: γ(h) = 6.352 × nugget + 7.980 × spherical (1506.6)

Table 4 Semivariogram parameters of dominant height obtained from residuals of MLR, MLP, and RBF methods

Evaluation criteria (AIC, RMSE, R2, and r) for the modeling techniques are presented in Table 5. The combined methods MLRK, MLPK, and RBFK outperformed MLR, MLP, and RBF in predicting dominant height, and RBFK was the most successful model (R2 = 0.98). Combining MLR with OK substantially improved modeling performance, as indicated by the increase in R2 from 0.23 for MLR to 0.96 for MLRK.

Table 5 Evaluation criteria of models

MLR yielded poor predictions (R2 = 0.23), as shown by the wide scatter of predicted versus observed values around the 1:1 line, whereas RBFK performed best (R2 = 0.98), as shown by the tight clustering of observed and RBFK-predicted values around the 1:1 line (Fig. 4). The distributions of residuals versus predicted dominant height showed no trend in any particular direction (Fig. 5). Compared with the other methods, MLR had a higher residual variance, which resulted from biased predictions. With MLRK and RBFK, the effect of the residuals on the predictions was minimized and unbiased estimates were obtained.

Fig. 4 The relationships between estimated and observed dominant height values

Fig. 5 Residual distribution versus estimated dominant height

The MLPK- and RBFK-interpolated dominant height values were similar in spatial pattern (Fig. 6). The borders of the site index classes derived from MLRK resemble natural borders more closely than those derived from the other methods.

Fig. 6 Site index maps obtained from MLRK, MLPK, and RBFK methods

Discussion

The main objective of this study was to develop a consistent site index map for planners and researchers from limited data by using statistical methods (MLR, MLP, RBF, MLRK, MLPK, and RBFK). A further objective was to evaluate the performance of these modeling techniques in predicting the site index of Oriental beech, which is an important tree species in Turkey.

In this study, we compared the performance of MLR, MLP, RBF, MLRK, MLPK, and RBFK in predicting dominant height for the purpose of developing a site index for beech. MLR performed the worst and RBFK performed the best in predicting dominant height in the studied beech stands. Many studies have used MLR to predict site index. Palmer et al. (2012) developed a multiple regression model for the site index of Sequoia sempervirens. Their final model, formulated using mean annual daily temperature and mean summer vapor pressure deficit, accounted for 82% of the variance in site index. These two variables were highly significant (P < 0.01), with partial R2 values of 0.71 and 0.11, respectively. Lumbres et al. (2018) used MLR to develop a height–age model for Acacia mangium and Eucalyptus pellita and found a good age–height relationship for both Acacia mangium (R2 = 0.90) and Eucalyptus pellita (R2 = 0.80).

In addition to MLR, kriging has also been used for estimating and mapping site index. Hock et al. (1993) used GIS and geostatistics to estimate the site index of Pinus radiata. The correlation between observed and estimated values gave an r of 0.63. Raimundo et al. (2017) developed a dominant height model for a Eucalyptus plantation forest from 5 years of data using kriging. They observed low correlation values for dominant height at an age of 2.1 years and good correlation between ages of 3.1 and 5.8 years.

MLRK outperformed MLR, as evidenced by the very high R2 of 0.96 for MLRK versus 0.23 for MLR. The degree of spatial dependency of the residuals is an important factor affecting the success of regression kriging. In our study, even the use of moderately spatially dependent (SCI = 55%) MLR residuals in MLRK resulted in a considerable improvement in prediction quality for site index. Similar results have been reported in many studies. For example, Palmer et al. (2009, 2010) reported R2 values of 0.59 and 0.70 for MLR and MLRK, respectively, for the site index of Pinus radiata in New Zealand. Aertsen et al. (2012) found that the RMSE values of MLR and MLRK were 2.58 for beech and 3.43 for oak in Belgium. Kimberley et al. (2017) used MLR and MLRK to estimate site index and found R2 values of 0.63 and 0.82 for MLR and MLRK, respectively.

We compared MLR with RBF and MLP, two types of artificial neural network (ANN), in predicting the site index of beech, and found that both RBF and MLP outperformed MLR (R2 = 0.72 for RBF, 0.69 for MLP, and 0.23 for MLR). Similar results have been reported elsewhere. For example, Vieira et al. (2018) found that the R2 values of MLR models were between 0.48 and 0.89, whereas those of ANN and ANFIS were 0.96 and 0.95, respectively, in predicting the dominant height of eucalyptus plantations. Aertsen et al. (2010) estimated the site index in Pinus brutia, Pinus nigra, and Cedrus libani stands. Aertsen et al. (2011) predicted the site index in Quercus robur, Pinus sylvestris, and Fagus orientalis stands. MLR and ANN methods were compared in both studies, and the ANN methods outperformed MLR in all cases. Wang et al. (2005) used ANN for mapping the site index of mature stands of lodgepole pine in Alberta, Canada, and found that ANN yielded the best result based on R2.

In our study, MLPK and RBFK were obtained by combining kriging with the artificial neural network methods MLP and RBF. The R2 values of 0.69 and 0.72 obtained with MLP and RBF rose to 0.81 and 0.98 with MLPK and RBFK, respectively, an increase of 0.12 and 0.26. In general, combining the MLR and ANN methods with kriging improved prediction performance. This improvement appears to arise because kriging reduces the variance of the errors. Thus, methods that leave a certain proportion of error in their residuals became more successful when combined with kriging.

Conclusions

In this study, the performance of the MLR, MLP, RBF, MLRK, MLPK, and RBFK techniques in predicting site index was compared. The modeling results indicated that combining MLR with OK yielded a significant improvement. However, site index was best predicted by the RBFK model. These results imply that forest managers could use combined methods for better estimation of site index. To test how far these findings can be generalized, the combined techniques should be applied in local studies in different forest ecosystems.