Introduction

Agroforestry systems combine agricultural and forestry species in the same area and are becoming common in tropical regions. They maintain biodiversity at levels between those of natural forests and purely agricultural land uses by increasing connectivity and sustaining biodiversity in fragmented forest landscapes, and they increase crop yields through nutrient cycling, improved fertility, and higher soil moisture (Ribaski et al. 2001; Haggar et al. 2019).

Among these tropical regions, the Amazon rainforest has suffered losses mainly from deforestation, which has already reached 20% of its original area (Aguiar et al. 2016), and from climate change, which decreases forest resilience and affects its rich biodiversity (Hilker et al. 2014). Both pose a threat to many endangered plant species (Sakuragui et al. 2013), which makes agroforestry systems an appropriate way to use forest products.

Among these endangered species, we highlight Swietenia macrophylla King (Brazilian mahogany), a shade-tolerant species found at low population density in the dryland forests of the Brazilian Amazon, which has been widely exploited in recent decades because of the high commercial value of its timber. The exploitation of Brazilian mahogany in natural areas has grown significantly over the years (Souza et al. 2008; Rocha et al. 2016; Milagres and Machado 2016), which has made legislation for the management and exploitation of the species necessary.

Currently, the way management is practiced makes it more difficult to exploit the species in natural forests, because doing so requires a better ecological and silvicultural understanding (Free et al. 2017). In natural forests in Brazil, the diameter of Brazilian mahogany ranged from 31 to 121 cm, with a basal area increment of 63.1 cm2 year−1 (Cunha et al. 2016). In the Chiquibul Forest Reserve in Belize, the diameter of the species ranged from 30 to 60 cm, and cutting lianas increased the diameter increment over 10 years by approximately 40%. In Indonesia, Brazilian mahogany grew very slowly in agroforestry systems, reaching only about 40 cm at 40 years (Sabastian et al. 2018).

For these reasons, it is important to study the development of Brazilian mahogany in systems integrated with other plant and animal species as an alternative way to use the species (Viégas et al. 2012; Sabastian et al. 2018; Silva et al. 2018a) and as an excellent long-term production strategy (Santos et al. 2019), given the current difficulty of exploiting mahogany in natural forest areas.

However, the success of Brazilian mahogany cultivation depends not only on appropriate silvicultural practices (Silva et al. 2017) but also on accurate methods for estimating the volume of timber available and improving decision making in the management of the species. Volume equations are among the most traditional estimation methods, and they require normally distributed and independent residuals (Fernandes et al. 2017). These assumptions are not suitable for many data sets with more complex variance structures, such as multiple measurements on the same individual, spatial and temporal correlation, and hierarchical or nested data (Zuur et al. 2009), which demand more robust methods (Binoti et al. 2016).

Notable among the more robust methods are mixed-effects modeling, which allows random effects and variance structures to be included (Pinheiro and Bates 2000), and machine learning techniques such as artificial neural networks and support vector machines, which give promising and generally more accurate estimates of wood volume than traditional methods (García Nieto et al. 2012; Bhering et al. 2015; Vahedi 2016). Machine learning techniques also make it possible to include variables that are generally not used in traditional regression fits (Binoti et al. 2014a, b; Araújo Júnior et al. 2019).

Therefore, this research aimed to develop equations that estimate the commercial volume of Brazilian mahogany trees in agroforestry systems using traditional nonlinear approaches, nonlinear mixed-effects modeling, and machine learning techniques, and to compare the mean estimates of these approaches by univariate analysis of variance.

Materials and methods

Study area and data collection

Data for the study were obtained in the western region of Tomé-Açu municipality, Pará State, Brazil (02° 29′ 14″ and 02° 30′ 03″ S; 48° 23′ 10″ and 48° 22′ 22″ W), at 45 m altitude (Fig. 1). The climate of the region is Af in the Köppen classification, with an annual rainfall of 2500 mm, no dry season, and an average annual temperature of 26 °C (Alvares et al. 2013). The soil is a dystrophic yellow latosol according to the Brazilian System of Soil Classification (Santos et al. 2013), which corresponds to the Xanthic Ferralsol of the Food and Agriculture Organization of the United Nations.

Fig. 1 Location map of the study municipality

Data were measured on 17-year-old Brazilian mahogany trees planted in a 14.65 ha agroforestry system at a spacing of 8 m × 6 m. Standing trees were sampled in 36 circular plots of 500 m2, systematically allocated 50 m apart. The diameter with bark at 1.3 m height (dbhwb), in centimeters, and the commercial height (hc), in meters, defined by the first bifurcation, were measured on all trees in the plots, and three trees per plot (108 individuals in total) were scaled standing to compute the commercial volume by Smalian’s formula (Table 1). These trees represented the diametric distribution of the agroforestry system and ensured sample sufficiency as determined by the method proposed by Cochran (1965), considering a sampling error equal to or less than 1%.
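Smalian’s formula takes the volume of each stem section as the mean of its two end cross-sectional areas multiplied by the section length, and sums the sections up to the commercial height. The following Python sketch illustrates the calculation under stated assumptions: diameters are in centimeters and heights in meters, the function names are hypothetical, and the example measurements are invented for illustration rather than taken from the study data.

```python
import math


def cross_section_area_m2(diameter_cm):
    """Cross-sectional area (m2) of a stem from its diameter in cm."""
    return math.pi * diameter_cm ** 2 / 40000.0


def smalian_volume(diameters_cm, heights_m):
    """Sum Smalian section volumes (m3) along a stem.

    diameters_cm[i] is the diameter with bark measured at heights_m[i];
    heights must be in increasing order.
    """
    if len(diameters_cm) != len(heights_m) or len(heights_m) < 2:
        raise ValueError("need at least two paired measurements")
    total = 0.0
    for i in range(len(heights_m) - 1):
        length = heights_m[i + 1] - heights_m[i]
        area_lower = cross_section_area_m2(diameters_cm[i])
        area_upper = cross_section_area_m2(diameters_cm[i + 1])
        # Smalian: mean of the two end areas times the section length.
        total += 0.5 * (area_lower + area_upper) * length
    return total


# Hypothetical stem measured at four heights (not study data).
example_volume = smalian_volume([26.0, 24.0, 22.5, 21.0], [0.3, 1.3, 2.3, 3.3])
```

The factor π/40,000 converts a diameter in centimeters to an area in square meters, the same conversion that appears in Eq. 3.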

Table 1 Descriptive statistics of dendrometric variables

The agroforestry system was established by introducing species in stages, counted in years from the beginning of the project: Piper nigrum (black pepper) was cultivated first, during the first 3 years; Swietenia macrophylla (Brazilian mahogany) and Cocos nucifera (coconut) were introduced in the second and third years, respectively; and after 9 years Theobroma cacao (cocoa) was planted to benefit from the shade of the Brazilian mahogany. Black pepper, coconut, and cocoa were planted at spacings of 2.5 m × 2.5 m, 5 m × 7.5 m, and 3 m × 3.5 m, respectively.

Volume modeling

The volumetric model of Schumacher and Hall (1933) and the taper model of Kozak et al. (1969) were fitted by nonlinear regression (Eqs. 1 and 2, respectively) using the “nls” function of R 3.4.4 (R Core Team 2018). The first model was fitted to estimate commercial volume directly and the second to describe the stem profile; for the second, the commercial volume was then estimated by numerical integration (Eq. 3) with the “function”, “integrate”, and “mapply” functions.

$$\hat{v}_{\text{c}} = \beta_{0} {\text{dbh}}_{\text{wb}}^{{\beta_{1} }} h_{\text{c}}^{{\beta_{2} }} + \varepsilon_{i}$$
(1)
$$d_{\text{cc}} = {\text{dbh}}_{\text{wb}} \sqrt {\beta_{0} + \beta_{1} \left( {\frac{h}{{h_{\text{c}} }}} \right) + \beta_{2} \left( {\frac{h}{{h_{\text{c}} }}} \right)^{2} } + \varepsilon_{i}$$
(2)
$$\hat{v}_{\text{c}} = \mathop \int \limits_{{h_{1} }}^{{h_{\text{c}} }} \left( {\frac{\pi }{{40{,}000}} \left( {{\text{dbh}}_{\text{wb}} \sqrt {\beta_{0} + \beta_{1} \left( {\frac{h}{{h_{\text{c}} }}} \right) + \beta_{2} \left( {\frac{h}{{h_{\text{c}} }}} \right)^{2} } } \right)^{2} } \right){\text{d}}h$$
(3)

\(\hat{v}_{\text{c}}\) is the estimated commercial volume (m3); β0, β1, and β2 are the parameters to be estimated; dbhwb is the diameter with bark at 1.3 m height (cm); hc is the commercial height (m); ɛi is the random error (m3); h1 is the lowest height measured (m); h is the height at which the diameter with bark (dcc in Eq. 2) was measured (m); and π is the constant pi.
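The two routes to commercial volume in Eqs. 1–3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the fitting code used in the study: the composite Simpson rule stands in for the adaptive quadrature of R’s “integrate”, the function names are hypothetical, and any parameter values passed in are placeholders rather than the fitted values of Table 2.

```python
import math


def schumacher_hall(dbh_cm, hc_m, b0, b1, b2):
    """Eq. 1 without the error term: volume = b0 * dbh^b1 * hc^b2."""
    return b0 * dbh_cm ** b1 * hc_m ** b2


def kozak_diameter(h, dbh_cm, hc_m, b0, b1, b2):
    """Eq. 2 without the error term: diameter with bark (cm) at height h."""
    x = h / hc_m
    # Clamp at zero so the square root stays defined for any parameters.
    return dbh_cm * math.sqrt(max(b0 + b1 * x + b2 * x * x, 0.0))


def kozak_volume(dbh_cm, hc_m, b0, b1, b2, h1=0.0, n=200):
    """Eq. 3 by the composite Simpson rule with n (even) subintervals."""
    if n % 2:
        n += 1
    step = (hc_m - h1) / n

    def area(h):
        d = kozak_diameter(h, dbh_cm, hc_m, b0, b1, b2)
        return math.pi / 40000.0 * d ** 2

    total = area(h1) + area(hc_m)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * area(h1 + k * step)
    return total * step / 3.0
```

Because the squared Kozak profile is a quadratic polynomial in h, the integral in Eq. 3 also has a closed form, which offers a convenient check on any numerical routine.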

A nonlinear mixed-effects approach was applied only to the Schumacher and Hall model (Eq. 4), which was selected because it gave the highest precision in the commercial volume estimation. Its mixed-effects structure for volume estimation is represented in Eq. 5, where the vector “β” contains the fixed effects common to all individuals and the vector “b” contains the random effects specific to each individual, which were included on the expectation that they would increase the precision of the volume estimates.

$$\hat{v}_{\text{c}} = \varnothing_{0} {\text{dbh}}_{\text{wb}}^{{\varnothing_{1} }} h_{\text{c}}^{{\varnothing_{2} }} + \varepsilon_{i}$$
(4)

\(\hat{v}_{\text{c}}\) is the estimated commercial volume (m3); \(\varnothing_{0}\), \(\varnothing_{1}\) and \(\varnothing_{2}\) are the parameters to be estimated; dbhwb is the diameter with bark at 1.3 m height (cm); hc is the commercial height (m), and ɛi the random error (m3).

$$\varnothing = \left[ {\begin{array}{*{20}c} {\varnothing_{0} } \\ {\varnothing_{1} } \\ {\varnothing_{2} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\beta_{0} } \\ {\beta_{1} } \\ {\beta_{2} } \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} {b_{0} } \\ {b_{1} } \\ {b_{2} } \\ \end{array} } \right] = \beta + b.$$
(5)

Given that b ~ N (0, σ2) and ε ~ N (0, σ2I).
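The structure of Eqs. 4 and 5 can be illustrated with a short Python sketch: each tree receives its own parameter vector, formed by adding its random effects to the fixed effects, and that vector is then used in the Schumacher and Hall form. The function names are hypothetical and the sketch only evaluates the model; it does not estimate the parameters as “nlme” does.

```python
def tree_parameters(beta, b):
    """Eq. 5: tree-specific parameters as fixed effects plus random effects."""
    if len(beta) != len(b):
        raise ValueError("beta and b must have the same length")
    return [fixed + random for fixed, random in zip(beta, b)]


def mixed_volume(dbh_cm, hc_m, beta, b):
    """Eq. 4 without the error term, using the tree-specific parameters."""
    phi0, phi1, phi2 = tree_parameters(beta, b)
    return phi0 * dbh_cm ** phi1 * hc_m ** phi2
```

With all random effects set to zero, the prediction reduces to the fixed-effects equation, which is the form available for a tree with no calibration data.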

The “nlme” function of the “nlme” package (Pinheiro et al. 2018), implemented in R 3.4.4, was used for mixed-effects modeling to estimate the fixed-effects parameters and the covariance parameters associated with the random effects. A more detailed explanation of mixed-effects modeling can be found in Yang et al. (2009).

Artificial neural networks (ANN) and support vector machines (SVM) were also fitted to estimate commercial volume, following the assumptions presented in Haykin (2009) for ANN and Steinwart and Christmann (2008) for SVM. The ANN used in this study was a multilayer perceptron with three layers and a feedforward architecture. The input layer consisted of two neurons, dbhwb and hc; the number of neurons in the hidden layer was varied from one to five; and the output layer had a single neuron, corresponding to vc.

Because of its simplicity of implementation and interpretation, the hyperbolic tangent function was used as the activation function in the hidden layer and the linear function in the output layer. Supervised training was performed with the error backpropagation algorithm combined with a gradient descent method. In this algorithm, the error associated with each pair of neurons of the input and output layers is calculated and propagated backward to adjust the synaptic weights and biases, with the aim of reducing the estimation error (Collazo et al. 2016).
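One backpropagation update for a network of this shape (two inputs, a tanh hidden layer, and a linear output neuron) can be sketched in Python as below. This is a didactic sketch under stated assumptions, not the training procedure of “h2o.deeplearning”: it uses plain gradient descent on the squared error of a single observation, and the weights, inputs, target, and learning rate are hypothetical.

```python
import math


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Return the tanh hidden activations and the linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    output = sum(w * a for w, a in zip(out_w, hidden)) + out_b
    return hidden, output


def gradient_step(x, target, hidden_w, hidden_b, out_w, out_b, rate):
    """One gradient-descent update on the loss 0.5 * (output - target)**2."""
    hidden, output = forward(x, hidden_w, hidden_b, out_w, out_b)
    error = output - target
    # Error propagated back through tanh, whose derivative is 1 - a**2.
    deltas = [error * w * (1.0 - a * a) for w, a in zip(out_w, hidden)]
    new_out_w = [w - rate * error * a for w, a in zip(out_w, hidden)]
    new_out_b = out_b - rate * error
    new_hidden_w = [[w - rate * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(hidden_w, deltas)]
    new_hidden_b = [b - rate * d for b, d in zip(hidden_b, deltas)]
    return new_hidden_w, new_hidden_b, new_out_w, new_out_b


def loss(x, target, params):
    """Squared-error loss for one observation."""
    return 0.5 * (forward(x, *params)[1] - target) ** 2


# Hypothetical standardized inputs (dbhwb, hc), target, and starting weights.
x_example, target_example = [0.5, -1.0], 0.3
params_before = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.3], 0.0)
params_after = gradient_step(x_example, target_example, *params_before, rate=0.1)
```

Comparing `loss(x_example, target_example, params_before)` with the same call on `params_after` shows whether the step reduced the error for this observation.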

SVM is a system based on mathematical optimization derived from statistical learning (Binoti et al. 2016). It was used to map the training data into a high-dimensional space with a kernel function, after which a linear regression was used to find an optimal hyperplane with a maximum margin (Sheta et al. 2015; Vapnik 1999). Different values were used for the cost (C), gamma (γ), and epsilon (ϵ), giving 2541 trained SVM configurations from the combination of 21 values for C (range 90–110, step 1), 11 for γ (range 0.02–0.03, step 0.001), and 11 for ϵ (range 0–0.1, step 0.01). The radial basis function (Eq. 6) was the only kernel used. To solve the SVM optimization problem (Drucker et al. 1996), the type I error objective function was used (Eq. 7).

$$K(X_{i} ,X_{j} ) = {\text{e}}^{{\left( { - \,\gamma * \left| {X_{i} - X_{j} } \right|^{2} } \right)}}$$
(6)
$${\text{Min}} \left( {\frac{1}{2} * \left| {\left| w \right|} \right| + C *\mathop \sum \limits_{i = 1}^{N} \xi_{i}^{ - } + C *\mathop \sum \limits_{i = 1}^{N} \xi_{i}^{ + } } \right)$$
(7)

Subject to the following constraints:

$$\begin{aligned} & w * \varPhi \left( {x_{i} } \right) + b - y_{i} \le \epsilon + \xi_{i}^{ + } \\ & y_{i} - w * \varPhi \left( {x_{i} } \right) - b \le \epsilon + \xi_{i}^{ - } \\ & \xi_{i}^{ - } ,\quad \xi_{i}^{ + } \ge 0,\quad i = 1, \ldots ,N \\ \end{aligned}$$

where γ is the gamma value; Min denotes minimization; w is the coefficient vector; C is the cost value; \(\xi_{i}^{-}\) and \(\xi_{i}^{+}\) are the slack variables for errors below and above ϵ; i is the training case; N is the total number of training cases; Φ(xi) is the kernel function used; b is the bias term; yi is the output value; and ϵ is the epsilon value.
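The size of the hyperparameter grid and the two building blocks of Eqs. 6 and 7 can be checked with a short Python sketch. It is an illustration only: the helper names are hypothetical, and the sketch enumerates the configurations without training any model.

```python
import math


def grid(start, stop, step):
    """Evenly spaced values from start to stop, both included."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + k * step, 10) for k in range(n)]


cost_values = grid(90, 110, 1)            # C
gamma_values = grid(0.02, 0.03, 0.001)    # gamma
epsilon_values = grid(0.0, 0.1, 0.01)     # epsilon
configurations = [(c, g, e)
                  for c in cost_values
                  for g in gamma_values
                  for e in epsilon_values]


def rbf_kernel(x_i, x_j, gamma):
    """Eq. 6: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


def slack(y, prediction, epsilon):
    """Amount by which an error exceeds the epsilon tube (zero inside it)."""
    return max(abs(y - prediction) - epsilon, 0.0)
```

Here `len(configurations)` equals 21 × 11 × 11, the number of SVM configurations reported in the text, and `slack` returns the quantity that the cost C penalizes in Eq. 7.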

The SVM configurations were fitted using dbhwb and hc as input variables and vc as the output. During the training of the ANN and SVM, k-fold cross-validation was used to avoid overfitting (Jung 2018), with four and ten folds, respectively. Training was performed with the “h2o.deeplearning” and “svm” functions of the “h2o” (The H2O.ai Team 2017) and “e1071” (Meyer et al. 2017) packages, respectively, in R 3.4.4.
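The k-fold scheme can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name; it splits indices into contiguous folds, whereas in practice the observations are usually shuffled first, and it does not reproduce the internal fold assignment of the “h2o” or “e1071” packages.

```python
def k_fold_indices(n_samples, k):
    """Return k (train, validation) index pairs over contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional observation.
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


# Illustrative fit-set size: about 67% of the 108 scaled trees.
ann_folds = k_fold_indices(72, 4)
svm_folds = k_fold_indices(72, 10)
```

Each observation appears in exactly one validation fold, so every tree in the fit set is used once to evaluate a model trained on the remaining folds.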

Statistical analysis

To compare the approaches without bias, the data were partitioned proportionally by diametric classes (Fig. 2) defined by the formula proposed by Sturges (1926). Approximately 67% and 33% of the data were used for the fit and test sets, respectively. The goodness of fit of the different approaches in both sets was evaluated by Pearson’s linear correlation coefficient \(\left( {r_{{y\hat{y}}} } \right)\), by the root-mean-square error (RMSE), in percentage, and by standardized residual scatterplots, as well as by the Chi-square test (χ2) at the 5% level of significance, used only for the test set. Additionally, a univariate analysis of variance with a randomized block design was performed to compare the means of the estimates made by the different approaches tested. In this analysis, the approaches that showed adherence by χ2 were used as treatments repeated ten times in 36 blocks (inventory plots).

Fig. 2 Number of trees used for fit and test of the approaches according to the different diametric classes
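The class count from Sturges’ formula and the two precision statistics can be sketched in Python as follows. Two points are assumptions rather than details given in the text: the number of classes is rounded up to the next integer, and the percentage RMSE is expressed relative to the mean of the observed values. The function names are hypothetical.

```python
import math


def sturges_classes(n):
    """Sturges' rule, 1 + log2(n), rounded up (rounding is an assumption)."""
    return math.ceil(1 + math.log2(n))


def pearson_r(observed, estimated):
    """Pearson's linear correlation between observed and estimated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)


def rmse_percent(observed, estimated):
    """RMSE as a percentage of the observed mean (assumed convention)."""
    n = len(observed)
    mse = sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n
    return 100.0 * math.sqrt(mse) / (sum(observed) / n)
```

Both statistics are computed separately for the fit and test sets, so the same functions apply to each partition.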

The use of each plot as a block was based on Winer (1962), who discussed single-factor experiments with repeated measures on the same elements. According to this author, one of the primary purposes of experiments in which the same individual is observed under each treatment is to provide control over differences between individuals, in this case between plots. As a result, each plot served as its own control, and the variability attributable to differences between plots was removed from the experimental error.

The homogeneity of the treatment variances was evaluated by Bartlett’s test (Bartlett 1937). Because the variances were heterogeneous, the original values were transformed by the Box and Cox method (Box and Cox 1964), and the effects of treatments were then tested by Fisher’s test; if it revealed a significant difference in at least one of the means, the Scott–Knott test was to be used at the 5% level of significance to compare the treatments. The analyses were conducted in R 3.4.4 with the functions “bartlett.test”, “boxcox”, “aov”, and “SK” of the packages “stats”, “MASS” (Venables and Ripley 2002), and “ScottKnott” (Jelihovschi et al. 2014).
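The Box and Cox transformation itself is simple to state in code. The Python sketch below applies the standard power transformation for a given λ; it does not search for λ as the “boxcox” function of “MASS” does, and the function name is hypothetical. The value 0.5050 is the one reported in the Results.

```python
import math

LAMBDA_REPORTED = 0.5050  # value reported in the Results of this study


def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if abs(lam) < 1e-12:
        # Limiting case of the power transform as lambda approaches zero.
        return math.log(y)
    return (y ** lam - 1.0) / lam


def box_cox_all(values, lam=LAMBDA_REPORTED):
    """Transform a list of estimated volumes before the analysis of variance."""
    return [box_cox(v, lam) for v in values]
```

The transformation is monotonic, so the ordering of the estimated volumes is preserved while the spread of larger values is compressed.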

Results

All parameters of the nonlinear equations and of the nonlinear mixed-effects equation were significant at the 5% level, and for the fit set all approaches had a Pearson correlation coefficient greater than 0.99 and RMSE values below 7.0% (Table 2).

Table 2 Estimated parameters, hyperparameters, and precision measures of approaches studied for vc estimate

The nonlinear Schumacher and Hall equation with mixed effects stood out in the fit set because it better captured the variability among individual trees, showing the highest \(r_{{y\hat{y}}}\) and the lowest RMSE, whereas Kozak’s nonlinear taper model had the worst fit statistics. In the test set, the statistics indicated lower precision, with lower \(r_{{y\hat{y}}}\) and higher RMSE, because a different data set was used; nevertheless, the Chi-square test showed adherence between the estimated and observed commercial volumes for all evaluated approaches. As in the fit set, Kozak’s nonlinear taper model had the lowest \(r_{{y\hat{y}}}\) and the highest RMSE, whereas the ANN had the highest \(r_{{y\hat{y}}}\) and the lowest RMSE.

For all approaches, most of the standardized residuals were well distributed and ranged between −3 and 3 (Fig. 3). However, residuals from the fit set had a smaller amplitude than residuals from the test set.

Fig. 3 Residual distribution of vc estimated by the evaluated approaches

The residuals of the approaches were similarly dispersed over most of the range of estimates, except for Kozak’s nonlinear taper model, which is consistent with the fit statistics. However, in the fit set, the Schumacher and Hall equation fitted with mixed-effects modeling tended to underestimate the vc of the trees with the lowest observed values, whereas the ANN overestimated them.

The residuals of these two approaches also showed a more homogeneous distribution over the whole range of estimates for the test trees. For this reason, and considering the test set the most important stage of the modeling process for increasing credibility and confidence, the equation generated by the ANN with two neurons in the input layer, four in the hidden layer, and one in the output layer was considered the most accurate; nevertheless, the estimates of the other approaches were also precise. With the weights and biases obtained in the training of the best ANN, Eqs. 8–11 give the outputs of the four neurons in the hidden layer, and vc can be estimated using Eq. 12.

$$\varnothing_{1} = \frac{2}{{1 + {\text{e}}^{{ - \,2[ - \,0.015099874\left( {\frac{{{\text{dbh}}_{\text{wb}} - 21.6893}}{5.9571}} \right) - 0.431192756\left( {\frac{{h_{\text{c}} - 3.7793}}{0.8291}} \right) - 0.560754387]}} }} - 1$$
(8)
$$\varnothing_{2} = \frac{2}{{1 + {\text{e}}^{{ - \,2[ - \,0.778621435\left( {\frac{{{\text{dbh}}_{\text{wb}} - 21.6893}}{5.9571}} \right) - 0.308296472\left( {\frac{{h_{\text{c}} - 3.7793}}{0.8291}} \right) + 1.350068118]}} }} - 1$$
(9)
$$\varnothing_{3} = \frac{2}{{1 + {\text{e}}^{{ - \,2[ - \,0.413494974\left( {\frac{{{\text{dbh}}_{\text{wb}} - 21.6893}}{5.9571}} \right) + 0.303500712\left( {\frac{{h_{\text{c}} - 3.7793}}{0.8291}} \right) - 0.269561916]}} }} - 1$$
(10)
$$\varnothing_{4} = \frac{2}{{1 + {\text{e}}^{{ - \,2[1.068626761\left( {\frac{{{\text{dbh}}_{\text{wb}} - 21.6893}}{5.9571}} \right) + 0.496493667\left( {\frac{{h_{\text{c}} - 3.7793}}{0.8291}} \right) + 0.366771834]}} }} - 1$$
(11)
$$\hat{v}_{\text{c}} = 0.071627\,(\varnothing_{1} * -0.78371859 + \varnothing_{2} * 1.35006812 + \varnothing_{3} * -0.26956192 + \varnothing_{4} * 0.36677183 + 0.72287511) + 0.146012$$
(12)

where \(\varnothing_{1,2,3,4}\) are the outputs of hidden neurons; dbhwb is the diameter with bark measured at 1.3 m height (cm); hc is the commercial height (m); and \(\hat{v}_{\text{c}}\) is the estimated commercial volume (m3).
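The hidden-layer expressions in Eqs. 8–11 have the form 2/(1 + e^(−2z)) − 1, which is algebraically the hyperbolic tangent of z. The Python sketch below evaluates the four hidden neurons with the coefficients transcribed from Eqs. 8–11 as printed and provides the linear form of Eq. 12 as a function whose coefficients are supplied by the caller. The names are hypothetical, and treating the constants 21.6893, 5.9571, 3.7793, and 0.8291 as centering and scaling values for the inputs is an interpretation of the printed equations.

```python
import math

# Centering and scaling constants as they appear in Eqs. 8-11.
DBH_CENTER, DBH_SCALE = 21.6893, 5.9571
HC_CENTER, HC_SCALE = 3.7793, 0.8291

# (weight for dbhwb, weight for hc, bias) of each hidden neuron, Eqs. 8-11.
HIDDEN = [
    (-0.015099874, -0.431192756, -0.560754387),
    (-0.778621435, -0.308296472, 1.350068118),
    (-0.413494974, 0.303500712, -0.269561916),
    (1.068626761, 0.496493667, 0.366771834),
]


def logistic_form(z):
    """Activation exactly as printed: 2 / (1 + e^(-2z)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0


def hidden_outputs(dbh_cm, hc_m):
    """Outputs of the four hidden neurons for one tree."""
    d = (dbh_cm - DBH_CENTER) / DBH_SCALE
    h = (hc_m - HC_CENTER) / HC_SCALE
    return [math.tanh(wd * d + wh * h + bias) for wd, wh, bias in HIDDEN]


def output_layer(phi, weights, bias, scale, shift):
    """Form of Eq. 12: scale * (sum(phi * weights) + bias) + shift."""
    return scale * (sum(p * w for p, w in zip(phi, weights)) + bias) + shift
```

Passing the result of `hidden_outputs` to `output_layer`, together with the output weights, bias, and rescaling constants of Eq. 12, gives the estimated commercial volume for a tree.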

Volumes estimated by the different approaches showed heterogeneous variances (K = 14.01; p value = 0.007), so the data were transformed to homogenize the variances before testing the effects of treatments by Fisher’s test. The variances were homogenized using the value 0.5050 in the Box and Cox formula (K = 7.40; p value = 0.116). The univariate analysis of the estimated commercial volumes indicated no significant difference among the means of the volumes estimated by the different approaches (Fcal. = 0.99, p value = 0.409), but there was a difference between the plots (Fcal. = 14.75, p value < 0.01). The estimated minimum and maximum values were 0.0001 m3 and 0.6383 m3, respectively, with a mean of 0.1748 m3 per tree (Fig. 4) and first, second, and third quartiles of 0.1029, 0.1587, and 0.2385 m3, respectively.

Fig. 4 Variation of vc estimated by different models. ANN is the artificial neural network; KNL is nonlinear Kozak; SHNL is the nonlinear Schumacher and Hall; SHNLM is the mixed nonlinear Schumacher and Hall; and SVM is the support vector machine

Discussion

Equations developed with traditional approaches were viable for estimating the commercial volume of Brazilian mahogany trees, as were equations developed with nonlinear mixed-effects modeling, ANN, and SVM, because all of them gave accurate estimates of commercial volume. All parameters estimated by the traditional and mixed-effects methods were significant, which supports higher accuracy, a better possibility of use in other studies of agroforestry systems with structural characteristics similar to those of the system studied here, and the absence of multicollinearity among the independent variables used in the fit (Sileshi 2014; Silva and Santana 2014; Siqueira et al. 2017).

The Schumacher and Hall model is used worldwide to model the individual volumetric production of forest species in natural environments (Akindele and LeMay 2006; Gimenez et al. 2015, 2017) or in monoculture systems (Shiver and Brister 1992; Silva et al. 2009; Schikowski et al. 2018) because it was developed to explain the functional relationship between tree volume and variables that are easily measured in the field, such as diameter at 1.3 m above the ground and height. Fitting this model in its original or logarithmic form usually yields significant parameters and good accuracy, as observed in the present study, by Ribeiro et al. (2014) with three species in the Tapajós National Forest, and by Cysneiros et al. (2017) with 32 commercial species in a forest concession in the Amazon.

Mixed-effects modeling can generate equations whose estimates are significantly more precise than those of equations fitted by the least-squares method (Crecente-Campo et al. 2010; Gouveia et al. 2015; Sharma et al. 2017), because mixed modeling adds the estimated random parameters to the fixed values of the parameters before estimating. In contrast, Miguel et al. (2013) suggested that fixed-effects models should be used instead of mixed-effects models when calibration data are not available.

Random effects generate different parameters for each subsample (Ercanli et al. 2015; Fu et al. 2015; Sharma et al. 2018), which in this research are individual trees, even though they were planted at a single site. The accuracy of these parameters is influenced by the number of subsamples and is directly related to the quantity used (Fu et al. 2013, 2017). On this basis, we believe that the parameters estimated in this research are consistent, since 108 individuals were used to estimate them. Including random-effects variance components in the fit of the Schumacher and Hall model also increased the accuracy of the estimates in the fit set of this study, but there was no gain in accuracy in the test set.

The use of ANN to estimate tree volume has already been reported; however, none of these studies is directly comparable to ours, because none worked specifically with Brazilian mahogany in agroforestry systems or under natural conditions. Many studies used ANN to model the volume of species of the genus Eucalyptus (Soares et al. 2011; Silva et al. 2018b; Tavares Júnior et al. 2019), of the genus Pinus (Diamantopoulou 2005; Diamantopoulou and Milios 2010; Çatal and Saplioğlu 2018), and of other species in natural environments or planted in monoculture systems (Özçelik et al. 2010; Sanquetta et al. 2015, 2017). This study is the first to train and test an ANN for estimating the commercial volume of a species in an agroforestry system.

In comparative studies with other approaches, the ANN was also more efficient in estimating the commercial volume of trees in the Tapajós National Forest (Ribeiro et al. 2016) and the stem form of Araucaria angustifolia (Martins et al. 2017), for example. The superiority of ANN over the other approaches is mainly due to the parallel processing of neurons, noise tolerance, and a high learning capacity for modeling the complex nonlinear interactions between variables (Zou et al. 2008; Egrioglu et al. 2014; Reis et al. 2018).

The experimental design was efficient in reducing the experimental error, but it did not reveal differences between the means of the estimates made by the approaches. Similar results were found in other studies that compared mean volumes estimated by different approaches via analysis of variance. For example, Correia et al. (2017) compared the wood volume estimated by form factor with the volume estimated by a volumetric equation in dense ombrophilous forest on the coast of Santa Catarina, and Lanssanova et al. (2018) compared volumetric estimates obtained by form factor, volumetric models, and a taper model for commercial species of the Amazon Forest; neither found significant differences.

Nevertheless, the range of the estimates made by the ANN was the smallest among the approaches. A low value indicates that the equation is sound and effective for estimating the response variable and that its estimates have lower variance, which justifies the choice of this method as the most accurate. Using a more accurate approach, even one with a small gain in accuracy, could lead to considerable differences in the estimated production of an agroforestry system with a large area, for example.

Conclusion

The equation fitted with mixed effects in the Schumacher and Hall model estimates commercial volume more accurately than the equation fitted in the traditional form, which considers only fixed effects; however, the artificial neural network showed greater precision, especially in the test data.

No significant differences were found in the analysis of variance between the mean commercial volumes estimated by the different methods, which indicates that the simplest equation, such as the nonlinear Schumacher and Hall equation, can be used to estimate the commercial volume of Brazilian mahogany in agroforestry systems. However, we recommend the ANN with two neurons in the input layer, four in the hidden layer, and one in the output layer because it showed the highest precision.

This is the first study to develop equations for estimating the commercial volume of Brazilian mahogany in an agroforestry system in the Amazon, and the equations can be used and tested in other agroforestry systems and serve as a basis for the management of the species in the Brazilian Amazon.