Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

In Portugal, wildfires (related to natural forests and other plant areas) have been increasing in the last years. Fire is indeed an important issue in Mediterranean region affecting namely the ecological and economic aspects of forest areas and causing loss of human life. Many factors have contributed to the increasing number of wildfires, e.g., climate change [7]. Some studies have identified changes in the number of fires, burned area and fire size distribution depending on topographical variables and vegetation type, e.g., in the Spanish region Catalonia [10] and Portugal [13].

Gomes [9] pointed out many causes and consequences of forest fires in Portugal, e.g., currently, rural and forest areas in Portugal are considerably deserted due to population migrations from these areas to the main cities, which began in the 1950s. Fernandes et al. [5] proposed a fuel modelling and fire hazard assessment, used to evaluate and compare the fire hazard potential between forest types defined by their composition and structure. They found that potential fire behaviour is primarily driven by stand structure, rather than by cover type.

Marques et al. [13] presented an approach of the characterisation fire occurrence in Portugal, combining the use of geographic information systems (GIS) and generalised linear models (GLM). They emphasised the relationship between ecological and socioeconomic features on the proportion of area burned, recording also the number of fires and fire size for three 5-year periods, including the period 1990–1994. Descriptive statistics indicated variations in the distribution of fires over recent decades, with a significant increase in number of very large fires. Regression models underlined the impact of the forest cover type and the proximity to roads on the proportion of area burned.

For modelling wildfires, GLM [1, 14] have been often adopted, even as that is based on the Gaussian distribution by transforming the response [13]. Ferrari and Cribari-Neto [6] proposed a regression model where the response is beta distributed using a parameterisation of the beta law that is indexed by mean and dispersion parameters. Beta regression can be used for modelling the proportion of area burned that is restricted to the interval (0, 1). The regression parameters of the beta regression model are interpretable in terms of the mean of the response and, when the logit link is used, of an odds ratio, unlike the parameters of a linear regression that employs a transformed response [6].

This work proposes to model the proportion of burned area due to wildfires in Portugal, based on beta regression and under a Bayesian perspective (see e.g. [8, 17] for some Bayesian GLMs). The rest of the article is organised as follows. Section 2 succinctly describes the motivation of this work and the different modelling of wildfires. In Sect. 3 we present Bayesian beta regression for modelling the proportion of area burned, taking the use of Markov chain Monte Carlo (MCMC) methods for estimating quantities of interest. Some results of Bayesian beta regression related to the wildfire data analysis in the entire Portuguese mainland between 1990 and 1994, and concluding remarks are done respectively in Sects. 4 and 5, including the identification of the fire risk factors.

2 Motivation and Methods

In Portugal, burned area mapping, obtained by semi-automated classification of high-resolution remote sensing data from Instituto Superior de Agronomia (ISA)—Universidade de Lisboa, identified 35,198 fire perimeters with burned areas equal to or greater than 5 ha in the period 1975–2007 and the corresponding area burned is about 3. 8 × 106 ha that is equivalent to nearly 40 % of the country area [13].

In the period 1990–1994, Marques et al. [13] pointed out that: (1) 5706 Portuguese wildfires were recorded and the total burned area extended over 442,745 ha, burning about 4.97 % of the country area, (2) the average area burned per wildfire was 77 ha, (3) 149 wildfires extended over 500 ha, accounting for 44 % of the burned area, (4) none extended over 10,000 ha. Figure 1 (left side) exemplifies the distribution (frequency) of these fires identifying high and critical fire zones that are specially located in the northern and central interior of Portugal.

Fig. 1
figure 1

Fire perimeters between 1990 and 1994 in Portugal (left), a zoom over a burned area is shown in the right, based on the classes of the covariates: (a) land use, (b) altitude, (c) slope, (d) slope orientation (aspect), (e) road proximity, (f) population density, (g) temperature, (h) precipitation, (i) layer indicating the fire perimeters

In order to analyse variations in Portuguese wildfires in 1990–1994, the areas burned were included as map layers in the GIS database according to eight fire features (covariates), which were initially categorised, based on extensive preliminary data analysis and referred in the paper [13], into several classes: altitude (m), slope (%), slope orientation, population (hab/km2), roads proximity (m), number of days with precipitation greater than 1 mm in the fire season (from May to October), number of days with maximum temperature higher than 25 C in the fire season, and land cover (Table 1), including the observed proportion of the each land use classes in parentheses. Theses classes were also chosen based on some studies using the same data such as Moreira et al. [15] and Pereira et al. [19].

Table 1 Description of the classes of the eight fire features used in the wildfire data

Figure 1 illustrates the fire perimeters used for constructing burned area data from related map layers. Notice that land cover map used in this study further included a map at the scale 1/25,000 (Carta de Ocupação do Solo—COS’90) produced by Instituto Geográfico Português using cartographic information from aerial photography mostly dated from 1990 [13], as well as that road proximity included trails and was defined (1000 m distance) based on previous work e.g. Catry et al. [3]. Although continuous covariates as temperature and precipitation could be better explored in their natural form, we chose to categorize them because of a matter of simplicity and interpretation for the data collection and the model parameters, respectively.

For the modelling of wildfires, we record the observed proportion of burned area, denoted by r i that is the burned area out of total area for the ith combination of levels for the covariates in study, i​ = ​ 1, , k. We propose to model the proportion of burned area from these eight underlying covariates by assuming beta distribution for r i , i.e.,

  • Beta model: \(\ r_{i} \sim Beta(\mu _{i}\phi,(1\! -\!\mu _{i})\phi )\), with mean E(r i )​ = ​μ i and variance \(V ar(r_{i})\! =\! \frac{\mu _{i}(1-\mu _{i})} {\phi +1}\).

Alternative GLM can model the proportion r i , for instance:

  • Gaussian model: \(\ logit(r_{i}) \equiv \log (r_{i}/(1\! -\! r_{i})) \sim Normal(\mu _{i},\sigma ^{2})\);

  • Gamma model: \(\ -\log (r_{i}) \sim Gamma(\nu,\nu /\mu _{i})\), with \(E(r_{i})\! =\!\mu _{i}\) and \(V ar(r_{i})\! =\! \frac{\mu _{i}^{2}} {\nu }\).

These two models and other GLM based on transformations of r i , such as \(\arcsin (\sqrt{r_{i}})\) and Box-Cox transformation, are discussed and developed in [1, 14]. Figure 2 displays histograms of the observed proportion of area burned without and with logit transformation in Portugal during the period 1990–1994, indicating that transforming response may not be the best way of wildfire modelling, what happens in the proportions close to one in Fig. 2 and notice that the beta model does not transform the response.

Fig. 2
figure 2

Histograms of the observed proportion of area burned without (right) and with (left) logit transformation in Portugal during the period 1990–1994

3 Bayesian Beta Regression

Let \(r_{1},\ldots,r_{k}\) be random variables, where r i follows a beta distribution with mean μ i and unknown precision ϕ, whose probability density function is

$$\displaystyle{ f(r_{i}\vert \mu _{i},\phi ) = \frac{\varGamma (\phi )} {\varGamma (\mu _{i}\phi )\varGamma ((1\! -\!\mu _{i})\phi )}r_{i}^{\mu _{i}\phi -1}(1 - r_{ i})^{(1\!-\!\mu _{i})\phi -1},\quad 0 <r_{ i} <1, }$$
(1)

where Γ(⋅ ) is the gamma function, 0​ < ​μ i ​ < ​ 1 and ϕ​ > ​ 0, \(i\! =\! 1,\ldots,k\). Notice that the parameterisation of the beta distribution (1) was suggested by Ferrari and Cribari-Neto [6] in order to model response variable that is continuous and restricted to the interval (0, 1) and is related to other variables through a regression structure.

The beta regression model is obtained from Eq. (1) by assuming that the mean μ i can be written as

$$\displaystyle{ g(\mu _{i}) =\mathbf{ z}_{i}^{T}\boldsymbol{\beta } \equiv \eta _{ i}, }$$
(2)

where \(\boldsymbol{\beta }= (\beta _{1},\ldots,\beta _{p})^{T}\) is the regression parameter vector associated with the covariate vector \(\mathbf{z}_{i} = (z_{i1},\ldots,z_{ip})^{T}\) for the ith observation, \(i\! =\! 1,\ldots,k\), and g(⋅ ) is a logit link function \(g(\mu ) =\log [\mu /(1-\mu )]\) (for other link functions, see [1, 6, 14]).

3.1 Posterior Distribution

For the likelihood, we can assume different sampling distributions for the proportion r i , e.g., beta distribution defined in Eq. (1) or normal distribution for the transformed proportion, as referred in Sect. 2. Based on the former distribution with logit link function in Eq. (2), the likelihood function is given by

$$\displaystyle{ L(\boldsymbol{\beta },\phi \vert \mathbf{x}) =\prod _{ i=1}^{k} \frac{\varGamma (\phi )} {\varGamma (\mu _{i}\phi )\,\varGamma ((1\! -\!\mu _{i})\phi )}r_{i}^{\mu _{i}\phi -1}(1\! -\! r_{ i})^{(1-\mu _{i})\phi -1}, }$$
(3)

where \(\mathbf{x} =\{ r_{i};\mathbf{z}_{i},i\! =\! 1,\ldots,k\}\) is the data, and \(\mu _{i} = e^{\mathbf{z}_{i}^{T}\boldsymbol{\beta } }/(1\! +\! e^{\mathbf{z}_{i}^{T}\boldsymbol{\beta } })\), \(i\! =\! 1,\ldots,k\).

In Bayesian analysis, we also consider information a priori that here consists of assuming independent normal distributions with zero mean and variances \(v_{j}^{2}\) for the regression coefficients, j​ = ​ 1, , p, and inverse gamma distribution with shape a and scale b parameters for the precision parameter ϕ (or gamma distribution with shape a and scale b parameters for the variance σ 2 related to normal regression). In fact, we assigned non-informative prior distribution, i.e., highly dispersed, but proper normal and inverse gamma prior distributions for the model parameters \(\boldsymbol{\beta }\) and ϕ (or 1∕σ 2), respectively. In that case, one expects that inferential results on the model parameters are not too different from those ones under a frequentist approach.

Assuming a priori independence amongst the model parameters, we can construct the joint posterior density related to the beta regression model (2), which is denoted by

$$\displaystyle{ \pi (\boldsymbol{\beta },\phi \vert \mathbf{x}) \equiv \frac{L(\boldsymbol{\beta },\phi \vert \mathbf{x})\pi _{1}(\boldsymbol{\beta })\pi _{2}(\phi )} {\int \int L(\boldsymbol{\beta },\phi \vert \mathbf{x})\pi _{1}(\boldsymbol{\beta })\pi _{2}(\phi )\ d\boldsymbol{\beta }d\phi }, }$$
(4)

where \(\pi _{1}(\boldsymbol{\beta })\) and π 2(ϕ) are the normal and inverse gamma prior distributions of \(\boldsymbol{\beta }\) and ϕ, respectively, being the distribution (4) proportional to

$$\displaystyle{ \prod _{i=1}^{k}\frac{\varGamma (\phi )\ r_{i}^{\mu _{i}\phi -1}(1\! -\! r_{ i})^{(1-\mu _{i})\phi -1}} {\varGamma (\mu _{i}\phi )\,\varGamma ((1\! -\!\mu _{i})\phi )} \ e^{-\frac{1} {2} \sum _{j=1}^{p}(\boldsymbol{\beta }_{ j}^{2}/v_{ j}^{2}) }\ \phi ^{-(a+1)}e^{-b/\phi }. }$$
(5)

Notice that the mean μ i is a function of the linear predictor \(\eta _{i} =\mathbf{ z}^{T}\!\boldsymbol{\beta }\), \(i\! =\! 1,\ldots,k\).

The joint posterior distribution (5) is awkward to work with, since the marginal posterior distributions of some parameters are not easy to obtain explicitly. These posteriors can be evaluated using MCMC methods (see e.g. [8, 11, 17]). In particular Gibbs sampling that works by iteratively drawing samples for each parameter from the corresponding full conditional distribution, which is friendly implemented in software WinBUGS [12]. Other MCMC method, proposed by Hoffman and Gelman [11], is the No-U-Turn Sampler (NUTS) that is a variant of the Hamiltonian Monte Carlo (HMC), also known as hybrid Monte Carlo. Neal [16] presented HMC method in order to avoid a long time to converge to the posterior distribution as e.g. in Gibbs sampling by using a clever auxiliary variable scheme that transforms the problem of sampling from a posterior distribution into the problem of simulating Hamiltonian dynamics.

3.2 Evaluating and Comparing Models

An important issue in Bayesian data analysis is to choose among postulated sub-models of a statistical model, e.g. the beta regression model (2). Some summary measures of model comparison, such as the posterior mean of Deviance \(D(\boldsymbol{\theta })\), where \(\boldsymbol{\theta }\) is the model parameter vector, are easily evaluated with MCMC methods. Other two measures of predictive accuracy are Deviance Information Criterion (DIC) and Watanabe-Akaike Information Criterion (WAIC) (see [8, 21]). DIC is here defined as

$$\displaystyle{ DIC = D(\overline{\boldsymbol{\theta }}) + V ar(D(\boldsymbol{\theta })), }$$
(6)

where \(\overline{\boldsymbol{\theta }}\) and \(V ar(D(\boldsymbol{\theta }))\) denote the posterior mean of model parameter \(\boldsymbol{\theta }\) and the posterior variance of the deviance, respectively, whereas WAIC is defined by

$$\displaystyle{ WAIC = D(\overline{\boldsymbol{\theta }}) + 2\,\sum _{i=1}^{k}V ar(D_{ i}(\boldsymbol{\theta })), }$$
(7)

where \(V ar(D_{i}(\boldsymbol{\theta }))\) denotes the posterior variance of the ith term of the deviance. DIC and WAIC handle Bayesian models of any degree of complexity, and models with smaller (6) and (7) should be preferred to models with larger ones.

4 Wildfire Data Analysis

For the wildfire data described in Sect. 2, we fitted several regression models based on the response, proportion of the burned area in Portugal during the period 1990–1994, as in Marques et al. [13], but now focusing on the beta regression instead of normal regression, and under a Bayesian perspective. One of the eight covariates presented in Table 1, i.e. slope orientation, was removed from the analysis by not showing any difference among its categories.

Let M 1 and M 3 denote regression beta model (1) with eight covariates showed in Table 1, apart from the covariate slope orientation, whereas M 2 and M 4 represent the corresponding normal models. Table 2 lists these sub-models of the beta and normal model with only main covariate effects, M 1 and M 2, and also with interactions between two covariates. Based on the comparison model measures DIC (6) and WAIC (6), fitted models with interactions had better evaluation than models with only main effects for both beta and normal models. These evaluating values were calculated taking into account the same response \(r_{i}/(1\! -\! r_{i})\) (so-called odd), which generates a sampling log-normal and second-kind beta distributions for normal and beta distributions, respectively. So, normal regression had better fitting than beta regression, and that can namely be associated with the large number of observations (k​ = ​ 25, 388). However, we chose to select model M 3 in order to illustrate the beta regression model that has not been employed in the analysis of wildfire burned areas, even as it can be considered the natural choice.

Table 2 Model comparison measures of four fitted regression models for wildfire data

For all models showed in Table 2, we assumed prior normal distribution with mean zero and variance 104 for the regression parameters and prior inverse gamma and gamma distributions with shape parameter 1 and scale parameter 0.01 for the precision parameter ϕ (beta regression) and the variance σ 2 (normal regression), respectively. That is, highly dispersed, but proper prior distributions. MCMC samples of size 5000 were obtained for all models, after 2500 iterations of burn-in, implemented in software Stan [20]. A study of convergence of the samples was carried out with no worrying features.

For selected beta model M 3, Table 3 displays the model parameter estimates: posterior mean, standard deviation (SD) and 95 % highest posterior density (HPD) credible intervals (CI) for the model parameters. Note that related to the proportion of area burned in Portugal during the period 1990–1994:

  1. 1.

    There is no significant effect of annual and permanent crops in contrast to the other categories of land cover;

  2. 2.

    The land covers with larger likelihood to have wildfires are (in increasing order) agro-forestry, hardwoods, hardwoods and softwoods mixed with eucalyptus (HSME), resinous or softwoods (RS), eucalyptus, softwoods mixed with eucalyptus (SME), and shrubs (the most likelihood).

  3. 3.

    The proportion increases for larger categories of slope and altitude, whereas population and roads proximity display a decreasing effect in the proportion.

  4. 4.

    Because temperature and precipitation had an unexpected negative effect in the proportion, we decided to look for a potential interaction effect between the two covariates. We found significant interaction between temperature and precipitation in model M 3, even as that is not clear for the smaller categories of both covariates.

  5. 5.

    As large the categories of temperature and precipitation as large is the odd of burned area (interaction effect). Notice that the largest category did not have observation enough for confirming that.

  6. 6.

    The estimates in Table 3 also indicates that there is some dispersion in the proportion of area burned (see 95 % HPD credible interval of ϕ).

Table 3 Estimates of the regression parameters and dispersion parameter (ϕ) for model M 3

5 Concluding Remarks

This analysis of wildfire data in Portugal allow us to figure out the influence of the observed combinations of eight fire risk features on the proportion of burned area. Our results of beta regression are essentially consistent with those ones of normal regression, presented in Marques et al. [13], whose analysis did not include the explanatory variables: slope orientation, precipitation, temperature and the interaction between the last two ones. In fact, our model and conclusions bring improvements on the results reported by them based on a similar data set. So, we also identified changes in the proportion of burned area depending on topographical variables and vegetation type. Pereira et al. [18] pointed out that some variability of the burned area in Portugal is partly due both to the amount of precipitation in the fire season and in the preceding late spring season and to the occurrence of atmospheric circulation patterns that induce extremely hot and dry spells.

In addition, our intuition about interaction between precipitation and temperature was corrected, and we also believe that some latent variables can explain some unobserved heterogeneity in these wildfire data, e.g. spatial extra-variation across fire regions. For instance, Amaral-Turkman et al. [2] proposed a spatio-temporal model to analyse jointly the probability of ignition and fire sizes in Australia and New Zealand. Further research is being developed for capturing the spatio-temporal effects on the proportion of burned area, more proper sampling distributions and link functions. Notice that 4 % of observed burned areas were 0 or 1 being replaced by 10−10 and 1 × 10−10, respectively, for simplicity, We intend to include that issue in future work, as well as to do a full sensitivity analysis of our prior options (see e.g. [8]) and some simulation to clarify the impact of a big data as our wildfires in the results. For the our choice of beta regression instead of normal regression, we also believe that a comprehensive simulation study must be done in order to verify the second choice, as well as the residual analysis for understanding that unexplained situation of the observed proportions close to one (see e.g. Espinheira et al. [4]).