1 Introduction

The monitoring of the forest area of the Legal Amazon (encompassing the states of Acre, Amapá, Amazonas, Pará, Rondônia, Roraima, Tocantins, Mato Grosso and the western part of Maranhão) by the National Institute for Space Research (INPE) revealed deforestation rates that varied between 1×10⁶ and 3×10⁶ ha/year in the period 1991 to 1999 and a loss of about 6×10⁷ ha (more than half a million km²) of forest by 2000. This deforestation results from the occupation of the Amazon that began in the second half of the 20th century and is associated with the expansion of the agricultural frontier, the construction of highways and the creation of development poles. The environmental damage caused by this deforestation has led to significant social problems, such as land concentration, the instability of farmers on the land, precarious urbanization, and social conflicts accompanied by varying degrees of violence. The expansion of the agricultural frontier and the deforestation in the Legal Amazon are closely connected to the reorganization of Brazilian agriculture, followed by the accelerated industrialization that started in the 1950s and, more recently, by attempts to adapt Brazil to the globalizing economy. In this context, many factors may have contributed to the high rates of deforestation, among which are the public and private funds available, population dynamics, the organization of the production systems and various physical conditions.

From a public policy standpoint, the deforestation analysis carried out by the PRODES Project (analog or digital) is not sufficient to provide a basis for governmental action because the results are obtained yearly and are often merely informative, that is, they become available after the deforestation has already taken place. Since the data are grouped by state and published a year after a deforestation event occurs, the federal and state governments cannot anticipate changes in land use in the Amazon. It is therefore necessary to supplement the PRODES data with other initiatives that allow the Brazilian State to take preventive action against illegal deforestation.

Among previous research, Arima et al. (2007) implemented a spatially explicit model to estimate the probability of burning in the Brazilian Amazon by using logistic regression. This model was improved with respect to its predictive variables, such as the price of beef and soybean oil, within a conceptual model of pasture management and the use of fire for clearing, which made it possible to simulate the impact of paved highways, live cattle exports and fire prevention work. The analysis showed that fire is positively related to the price of beef and soybean oil and that the creation of new preservation areas offsets the negative impacts of earlier destruction. Sousa and Duarte (2006) developed and compared two mathematical models, one based on logistic regression and the other on fuzzy set theory, to define the indications for performing a scintigraphy exam. The study identified 194 patients who had their calcium and parathyroid hormone levels measured at baseline and underwent parathyroid scintigraphy in diagnostic laboratories in São Paulo from January 2000 to December 2004. The performance of the models was compared by means of receiver operating characteristic (ROC) curves. The results showed a statistically significant difference between the models (p=0.026). The fuzzy model was particularly useful because, unlike the logistic model, it was able to use the parathyroid hormone information in an interval in which the calcium values showed little discrimination. Therefore, the mathematical model based on fuzzy set theory seemed to be more adequate than the one based on logistic regression for deciding whether to perform parathyroid scintigraphy. Since that study was a methodological exercise, inferences about actual behavior may be inappropriate, as the data are not representative of the population.

These two models, logistic regression and fuzzy set theory, are used in the current study to map burning risk in the state of Pará, Brazil, during the period from 1994 to 2004. The efficiency of each method is evaluated, and the two models are compared in terms of how well they fit the data. The influence of variables such as vegetation, deforestation, weather and distance to roads on the recorded fire hot spots (the response variable) in the considered period is evaluated.

2 Study Area

The Legal Amazon in Brazil comprises the states of Acre, Amapá, Amazonas, Mato Grosso, Pará, Rondônia, Roraima, Tocantins and part of Maranhão, corresponding to an area of approximately 5 million km². Within this area, the part that has a forest physiognomy occupies about 4 million km². The state of Pará (Fig. 1) occupies an area of 1,247,703 km² and has a population of 6.2 million inhabitants. It is divided into 6 mesoregions identified as Marajó, Metropolis of Belém, Low Amazon, South Western Pará, North Eastern Pará and South Eastern Pará. Because of the forest, the state of Pará has very humid weather, and it is hot because of its proximity to the equator. The main highways in the state of Pará are BR-010, which extends from the capital Belém to Brasília; PA-150, which divides the southeast and northeast areas of Pará; PA-256 and BR-230, which cross the state from east to west; and BR-163, which connects the city of Santarém in the west of Pará to Cuiabá, the capital of Mato Grosso State (Fig. 1).

Fig. 1 Brazilian states which are part of the Legal Amazon (left). Images in yellow were taken by LANDSAT. Main highways (right) in the State of Pará

3 Methodology

3.1 Logistic Regression

Logistic regression is a regression technique that describes the relation between a set of independent variables and a binary dependent variable (Kleinbaum 1994). McCullagh and Nelder (1989) presented four functions used to model binary data: the logistic, probit, complementary log–log and log–log functions. The logistic function is used frequently in many areas, not only because its theoretical properties are simpler but also because of its simple interpretation as the logarithm of the odds.

In the logistic regression model, P(y_i = 1) = π_i and P(y_i = 0) = 1 − π_i are the probabilities of success and failure, respectively, where success represents the occurrence of the event and failure its non-occurrence. When using a logistic regression model, the interest may lie in the effect of a specific risk factor or in the identification of several factors associated with the response variable. In this research, the variables considered as risk factors for the occurrence of burning (the response variable) are deforestation, weather, vegetation and the distance to roads. Linear regression models are not appropriate for this type of analysis because the expected value of the response given a set of explanatory variables, E(Y|X) = Xβ, can take any value in −∞ < E(Y|X) < +∞. Unless restrictions are imposed, this violates the laws of probability, since 0 ≤ E(Y|X) = π ≤ 1.

A simple solution to this problem is to use a link function g(π_i) that maps probabilities in the interval (0, 1) onto the real line. The function g(π_i) may be defined as

$$ g(\pi_i) = \ln \biggl( \frac{\pi_i}{1-\pi_i}\biggr), $$
(1)

$$ g(\pi_i) = \alpha + \beta \mathbf{X}, $$
(2)

where

$$ \pi_i = \frac{1}{1+ e^{(-\alpha - \beta \mathbf{X})}}.$$
(3)

Therefore,

$$ y_i = \pi(x_i) + e_i,$$
(4)

where the error term e_i assumes only two values. If y_i = 1, then e_i = 1 − π(x_i) with probability π(x_i). On the other hand, if y_i = 0, then e_i = −π(x_i) with probability 1 − π(x_i). Thus, e_i has a mean of zero and a variance equal to π(x_i)[1 − π(x_i)], and the distribution of the response variable is binomial with probability π(x_i). The parameter estimates are obtained from the likelihood function, which is expressed by

$$ L(\beta)= \prod _{i=1}^n \pi (x_i)^{y_i} \bigl[ 1- \pi (x_i)\bigr]^{1-y_i}.$$
(5)

Because the logarithm is a monotonic function, it is easier to maximize the logarithm of L(β), defined as

$$ \ln L(\beta) = \sum^n _{i=1} y_i \ln \biggl[ \frac{\pi (x_i)}{1-\pi (x_i)}\biggr]+ \sum^n_{i=1} \ln \bigl[1- \pi (x_i)\bigr].$$
(6)

In contrast to the linear model, the equations obtained in the maximization process are nonlinear in the parameters and must be solved by iterative methods. According to Hosmer and Lemeshow (1989), the most frequently used is the Newton–Raphson method.
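The iterative maximization can be illustrated with a minimal sketch of Newton–Raphson for a logistic model with a single covariate. The function name `fit_logistic_newton` and the small data set are hypothetical and are not taken from the study; the sketch only assumes the model of Eq. (3) with one intercept α and one slope β.

```python
import math

def fit_logistic_newton(x, y, tol=1e-10, max_iter=50):
    """Fit pi = 1 / (1 + exp(-(alpha + beta * x))) by Newton-Raphson."""
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the score vector (g0, g1) and the information matrix
        # [[h00, h01], [h01, h11]] of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 linear system (information * step = score).
        det = h00 * h11 - h01 * h01
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) < tol and abs(d_beta) < tol:
            break
    return alpha, beta

# Hypothetical data: 100 cells with x = 0 (20 burned), 100 cells with x = 1 (60 burned).
x_data = [0] * 100 + [1] * 100
y_data = [1] * 20 + [0] * 80 + [1] * 60 + [0] * 40
alpha_hat, beta_hat = fit_logistic_newton(x_data, y_data)
```

With a single binary covariate the maximum likelihood estimates have a closed form (the logits of the two observed proportions), which gives a convenient check on the iteration.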

In general, the likelihood ratio statistic, like an F statistic in classical multiple linear regression, requires the identification of two models to be compared, one of which is a special case of the other. The larger model is sometimes called the full model, and the smaller model is sometimes called the reduced model; that is, the reduced model is obtained by setting certain parameters in the full model equal to zero. The likelihood ratio (LR) statistic is given by (Kleinbaum 1994)

$$ \mathit{LR} = -2 \ln \biggl( \frac{L_1}{L_2}\biggr)= -2 \ln \biggl( \frac{\mathrm{Reduced\ model}}{\mathrm{Full\ model}}\biggr).$$
(7)
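As a hedged illustration of Eq. (7), the sketch below compares a reduced model (a single common probability) with a full model containing one binary covariate. The counts are hypothetical, and the closed-form fitted probabilities are valid only for this simple case; the p-value uses the chi-square distribution with one degree of freedom, which is the usual large-sample reference for one extra parameter.

```python
import math

def bernoulli_loglik(events, trials, p):
    """Log-likelihood of `events` successes in `trials` independent Bernoulli draws."""
    return events * math.log(p) + (trials - events) * math.log(1.0 - p)

# Hypothetical counts: burned cells among non-deforested (group 0)
# and deforested (group 1) cells.
n0, k0 = 100, 20
n1, k1 = 100, 60

# Reduced model: the covariate coefficient is set to zero, so every cell
# shares one probability, estimated by the overall proportion.
p_all = (k0 + k1) / (n0 + n1)
loglik_reduced = bernoulli_loglik(k0 + k1, n0 + n1, p_all)

# Full model: with one binary covariate, the fitted probabilities equal
# the observed proportion in each group.
loglik_full = bernoulli_loglik(k0, n0, k0 / n0) + bernoulli_loglik(k1, n1, k1 / n1)

# Eq. (7): LR = -2 ln(L_reduced / L_full).
lr = -2.0 * (loglik_reduced - loglik_full)

# Upper tail of the chi-square distribution with 1 degree of freedom:
# P(chi2_1 > lr) = erfc(sqrt(lr / 2)).
p_value = math.erfc(math.sqrt(lr / 2.0))
```

A large LR with a small p-value indicates that the coefficient dropped in the reduced model is unlikely to be zero.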

3.2 Fuzzy Sets and Fuzzy Logic

The theory of fuzzy sets is, in great part, an extension of classical set theory. With the development of fuzzy logic, it became possible to apply the concept of partial truth, in which a proposition may take any value between the extremes of completely true and completely false. Thus, a set may contain elements that belong to it only partially. For a (classical) universal set U, a fuzzy subset F of U is defined by a membership function

$$ \mu_{F}:U\rightarrow [0,1],$$
(8)

where μ_F(x) represents the degree to which the element x of U belongs to the fuzzy subset F. The values μ_F(x) = 0 and μ_F(x) = 1 represent, respectively, the non-membership and the full membership of x in F. The definition of a fuzzy subset is formalized by enlarging the image of the characteristic function from the set {0, 1} to the interval [0, 1], so that a classical set becomes a particular case of a fuzzy set (Zadeh 1965; Barros and Bassanezi 2001).

The main idea of fuzzy sets is the membership degree (the value that indicates the degree to which an element belongs to a set). The sets are described qualitatively, and the elements of each set are characterized by their varying degrees of membership. For example, consider that A and B are fuzzy subsets of the universal set U. Operations on fuzzy sets such as union, intersection and complementation also result in fuzzy sets. Following the standard operations for union, intersection and complementation defined by Zadeh, the following membership functions are obtained

$$ \begin{array}{l}\mu_ {A\cup B}(x) = \max \bigl[ \mu_A (x), \mu_B (x) \bigr],\quad x \in U;\\[2pt]\mu_ {A\cap B}(x) = \min \bigl[ \mu_A (x), \mu_B (x) \bigr],\quad x \in U;\\[2pt]\mu_ {A'}(x) = \bigl[ 1- \mu _A (x)\bigr], \quad x\in U .\end{array}$$
(9)

These are the standard choices within a family of possible operators, known as t-norms for intersection and t-conorms for union. If A and B are classical sets, the membership functions described above reduce to the characteristic functions of the usual union, intersection and complement, showing the coherence of these definitions (Barros and Bassanezi 2001).
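The standard operations of Eq. (9) can be sketched as follows. The dictionary representation, the function names and the example membership values are assumptions made for illustration only.

```python
def fuzzy_union(mu_a, mu_b):
    """Standard union: pointwise maximum of the membership degrees."""
    return {x: max(mu_a[x], mu_b[x]) for x in mu_a}

def fuzzy_intersection(mu_a, mu_b):
    """Standard intersection: pointwise minimum of the membership degrees."""
    return {x: min(mu_a[x], mu_b[x]) for x in mu_a}

def fuzzy_complement(mu_a):
    """Standard complement: one minus the membership degree."""
    return {x: 1.0 - mu_a[x] for x in mu_a}

# Hypothetical membership degrees over a universe with two elements.
A = {"u1": 0.2, "u2": 0.7}
B = {"u1": 0.5, "u2": 0.4}
union_ab = fuzzy_union(A, B)
inter_ab = fuzzy_intersection(A, B)
comp_a = fuzzy_complement(A)

# With degrees restricted to 0 and 1, the same functions behave like
# the classical set operations.
crisp_a = {"u1": 1.0, "u2": 0.0}
crisp_b = {"u1": 0.0, "u2": 0.0}
crisp_union = fuzzy_union(crisp_a, crisp_b)
crisp_inter = fuzzy_intersection(crisp_a, crisp_b)
```

The crisp example shows the coherence property mentioned in the text: when the membership degrees are only 0 or 1, max and min act as the classical union and intersection.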

By using a linguistic approach, fuzzy logic comes closer to human reasoning, as the results are expressed through the concept of membership degree, and it differs from classical logic in that fuzzy propositions can take values other than true and false (Reznik 1997). Fuzzy linguistic models represent a significant portion of the applications of fuzzy set theory. The linguistic models may be understood as expert systems that linguistically describe a complex object to be analyzed. The basis of these models is a set of IF-THEN rules with vague predicates. The fuzzy sets are parameters of the model that are associated with the rules of its structure (Ortega 2001).

The basic structure of the model includes four components: (i) fuzzification, an initial stage in which the inputs to the system are represented by fuzzy sets and the membership functions are formulated; (ii) a base of fuzzy rules, considered the core of the fuzzy linguistic model, in which knowledge is expressed through a set of IF-THEN rules composed of fuzzy propositions described linguistically; (iii) fuzzy inference, in which each fuzzy proposition is translated mathematically by fuzzy logic techniques (in this stage, the t-norms, t-conorms and inference rules are defined and used to obtain the fuzzy relations on which the model is based); and (iv) defuzzification, a final process that allows a fuzzy set to be represented by a single real (crisp) number. Among inference methods, the most widely used are those of Mamdani and of Takagi–Sugeno–Kang (TSK). In the TSK method, the consequent of each rule is a function of the input values. If the consequent is a fuzzy set and the inference is of the min-max type, then the model is of the Mamdani type (Ortega 2001; Barros and Bassanezi 2001).

4 Analysis and Discussion of Results

The behavior of the environmental variables describing burning occurrences and deforestation, and the effect of distance from the main roads on these variables, are presented in Figs. 2 and 3. The southeastern mesoregion of Pará had the highest number of fires, with 98,965 hot spots accounting for 69.3% of the state’s total, followed by the southwestern mesoregion with 23,603 hot spots (16.5%). At the other end of the spectrum are the Marajó and Belém metropolitan mesoregions, with 1,249 and 84 hot spots, respectively, each of which is less than 1% of the total (Fig. 2, left). The highest deforestation, equivalent to an area of 118.62 km² or 53.34% of the state’s total deforestation, also occurred in the southeast of Pará, followed by the northeast of Pará with 49.22 km² (22.14%). The Marajó and Belém metropolitan mesoregions presented the lowest deforestation, approximately 3.55 km² and 4.95 km², respectively, each of which is less than 3% of the total deforested area (Table 1 and Fig. 3 on the right).

Fig. 2 Distribution of deforestation (left) and burning (right) in 2003 with respect to distance from roads in the state of Pará (Rodney 2005)

Fig. 3 Distribution of deforestation (left) and fire (right) with respect to distance from main highways (BRs and PAs) in the state of Pará from 1996 to 2003 (Rodney 2005)

Table 1 Distribution of deforestation in 2003 by mesoregion in the state of Pará

Figure 3 shows the distribution of deforestation and burning in the state of Pará up to 2003 along the main highways (BRs and PAs). The relation is nonlinear: the shorter the distance to a road, the greater the amount of burning and deforestation. The radius of influence of the roads reaches a maximum of 110 km for deforestation and about 150 km for burning.

The relative distribution of deforestation in Pará may be evaluated with the Lorenz curve, which relates the distribution of the deforested area to the distribution of the total (sampled) state area (Fig. 4). The intensity of the concentration may be evaluated by estimating the concentration area (Fig. 4), which is given by the spatial concentration rate, defined as I=(C−550)/450, where C represents the sum of the cumulative percentages of deforested area in relation to the total area. The smaller this area, the more uniform the distribution of deforestation in the state; the larger this area, the more concentrated the deforestation in the studied area (Pará State).
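A minimal sketch of the spatial concentration rate follows. It assumes that C is the sum of cumulative percentage shares over ten equal-area classes sorted in descending order, which is what the constants suggest (550 is the value of C for a perfectly uniform distribution over ten classes, and 550 + 450 = 1000 is its value when all deforestation falls in one class). The function name and the input shares are hypothetical.

```python
def spatial_concentration(shares_percent):
    """Spatial concentration rate I = (C - 550) / 450 for ten classes.

    `shares_percent` holds the percentage of the deforested area found in
    each of ten equal-area classes (assumed layout); the shares sum to 100.
    """
    if len(shares_percent) != 10:
        raise ValueError("the constants 550 and 450 assume exactly ten classes")
    ordered = sorted(shares_percent, reverse=True)
    running = 0.0
    c = 0.0
    for share in ordered:
        running += share   # cumulative percentage up to this class
        c += running       # C is the sum of the cumulative percentages
    return (c - 550.0) / 450.0

uniform_rate = spatial_concentration([10.0] * 10)
concentrated_rate = spatial_concentration([100.0] + [0.0] * 9)
```

Under these assumptions the rate ranges from 0 (deforestation spread evenly over the classes) to 1 (all deforestation in a single class).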

Fig. 4 Deforestation distribution using Lorenz’s curve (Rodney 2005)

In the current study, the spatial concentration rate is 0.572; that is, the concentration of deforestation in the state of Pará is about 57.2%. Figure 5 shows that the distribution of deforestation in Pará State is very irregular, with a higher concentration next to the BR-010 and PA-150 highways, located in the mesoregions with the most infrastructure in the state (southeast, northeast and Belém Metropolitan).

Fig. 5 PRODES Project images of deforestation in 2003

4.1 Mapping with the Use of Logistic Regression

The logistic model has burning occurrence as the response variable and deforestation, distance from roads, weather and vegetation as explanatory variables. The results presented are for fires that occurred in the period June 1, 1998, to December 31, 2003. Tables 2 and 3 show the coding used in the logistic model for the categorical variables.

Table 2 Variable codification
Table 3 Categorization of the variable Weather

The statistical program MINITAB is used to fit the logistic regression model (Table 4). The significance of the variables in the model is evaluated through their p-values. All parameters are significant (p-value=0.00). The odds ratios allow us to conclude that

(i) For the variable DEFOREST, the odds of an area burning are about 8 times higher if the area is deforested (in relation to a non-deforested area).

(ii) For the variable VEGETATION, the odds of an area burning are about 1.2 times higher if the vegetation does not have lumber value (in relation to vegetation with lumber value).

(iii) For the variable WEATHER, the odds of an area burning are about 2.91 times higher if the weather is dry (in relation to not being dry) and about 1.38 times lower if the weather is humid (in relation to not being humid).

Table 4 Logistic regression results

As expected, the statistic LR=19284.413 and its p-value=0.00 provide enough evidence that at least one of the coefficients is different from zero. Thus, the logistic model fitted to the data is

(10)

From the fitted model, the probabilities of burning due to the presence of deforestation, vegetation and weather may be determined as

(11)
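The link between odds ratios, coefficients and probabilities can be sketched as follows. The coefficients are taken as the natural logarithms of the odds ratios quoted in the text, which is the standard relation for a logistic model; the intercept is an arbitrary placeholder because the fitted coefficients of Table 4 are not reproduced here, so the resulting probabilities are illustrative only and are not the values of Table 5.

```python
import math

# Odds ratios quoted in the text; each coefficient is the log of its odds ratio.
ODDS_RATIOS = {
    "deforested": 8.0,
    "no_lumber_value": 1.2,
    "dry": 2.91,
    "humid": 1.0 / 1.38,   # "1.38 times lower"
}
COEFFICIENTS = {name: math.log(ratio) for name, ratio in ODDS_RATIOS.items()}

# Hypothetical intercept (assumption): NOT the fitted value from the study.
INTERCEPT = -1.0

def burning_probability(indicators):
    """Logistic probability for a dict of 0/1 indicator values."""
    eta = INTERCEPT + sum(COEFFICIENTS[name] * value
                          for name, value in indicators.items())
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

p_forest = burning_probability({"deforested": 0, "dry": 1})
p_cleared = burning_probability({"deforested": 1, "dry": 1})
ratio = odds(p_cleared) / odds(p_forest)
```

Whatever intercept is used, the ratio of the odds for a deforested and a non-deforested area with otherwise equal covariates recovers the odds ratio of the DEFOREST variable.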

Table 5 lists the combinations that stand out most in the analysis of the probability of fire occurrence in the state of Pará. The highest burning probability, 96.51%, occurs when a region is characterized by deforestation, dry weather and vegetation with lumber value. For a deforested region with dry weather and vegetation without lumber value, the probability is 95.83%. A probability of 92.92% corresponds to a deforested region with transitional weather and vegetation with lumber value, and a probability of 91.60% to a deforested area with no transitional weather and vegetation with lumber value. Finally, a probability of 88.75% corresponds to a deforested region with humid weather and vegetation with no lumber value.

Table 5 Rate of burning presence probability

The results of the logistic regression related to the probability of burning occurrence were mapped and imported to a Geographic Information System program (ARCVIEW) in which a map of burning risk in the state of Pará was assembled (Fig. 6).

Fig. 6 Risk of burning in the state of Pará from 1998 to 2003 as determined by logistic regression (Rodney 2005)

The probabilities are grouped into intervals of width 0.25, which defines four levels of burning risk, as shown in Table 6. About 20% (250,633 km²) of Pará is at low risk of burning due to a lack of infrastructure (roads). The areas with regular road infrastructure, served by BR-230 (Transamazon) and BR-163 (Cuiabá-Santarém), account for 26% (326,000 km²) of the state and are classified as having a medium risk of burning. The areas with good road infrastructure (BR-010, BR-316 and PA-150) are at high (29%) and very high (25%) risk of burning. Together, these two classes add up to 54% of the state of Pará, which corresponds to 677,000 km².

Table 6 Probability of burning in the state of Pará
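The classification of a probability into the four risk levels can be sketched as below. The treatment of the boundary values (0.25, 0.50 and 0.75 assigned to the higher class) is an assumption, since the text states only the interval width.

```python
def risk_level(probability):
    """Map a burning probability to one of four classes of width 0.25."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < 0.25:
        return "low"
    if probability < 0.50:
        return "medium"
    if probability < 0.75:
        return "high"
    return "very high"

levels = [risk_level(p) for p in (0.10, 0.40, 0.60, 0.9651)]
```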

4.2 Mapping with the Use of Fuzzy Logic

Since variables such as weather, vegetation and deforestation do not have well-defined geographical borders, fuzzy logic was chosen for the numerical treatment of these variables, building on the knowledge acquired with the logistic regression method. The response variable (burning) is then mapped using the resulting fuzzy model. The model uses a fuzzy inference method (Takagi and Sugeno 1985) based on a set of IF-THEN rules, with the additional advantage of allowing prior knowledge to be incorporated into the rules. Constants (singletons) are used as the consequents of the rules. The activation degree of each rule is obtained by the fuzzy intersection (t-norm, using the minimum) of the membership degrees of the input variables. The final output of the model is a crisp number: the average of the rule consequents, weighted by the activation degree of each rule (with values between 0 and 1).

The fuzzy sets for both the input variables (weather, vegetation, deforestation) and the output variable (burning) are linguistic expressions. The linguistic categories established for the input variables were: dry, transition and humid for the weather variable; with wood value and without wood value for the vegetation variable; and deforested and not deforested for the deforestation variable. For the output variable (burning), five linguistic burning risk categories were defined: very low, medium low, high and very high, to which were attributed singleton values of 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0, respectively. These singleton values are the numerical translation of what each linguistic expression is taken to mean. Figures 7, 8, 9 and 10 show the categories and membership functions (triangular functions are used) for each variable.

Fig. 7 Categories used for the input variable Weather

Fig. 8 Categories used for the input variable Vegetation

Fig. 9 Categories used for the input variable Deforestation

Fig. 10 Categories used for the response variable Burning

The set of IF-THEN rules, derived from empirical knowledge about burning gathered in consulting projects and expressed in terms of the input and output variables, is given below:

(i) If (weather is dry) and (vegetation is valued) and (deforest is deforest) then (burning is very high);
(ii) If (weather is dry) and (vegetation is valued) and (deforest is without deforestation) then (burning is high);
(iii) If (weather is dry) and (vegetation is not valued) and (deforest is deforest) then (burning is very high);
(iv) If (weather is dry) and (vegetation is not valued) and (deforest is without deforestation) then (burning is high);
(v) If (weather is humid) and (vegetation is valued) and (deforest is deforest) then (burning is very high);
(vi) If (weather is humid) and (vegetation is valued) and (deforest is without deforestation) then (burning is high);
(vii) If (weather is humid) and (vegetation is not valued) and (deforest is deforest) then (burning is very high);
(viii) If (weather is humid) and (vegetation is not valued) and (deforest is without deforestation) then (burning is medium);
(ix) If (weather is transition) and (vegetation is valued) and (deforest is deforest) then (burning is very high);
(x) If (weather is transition) and (vegetation is valued) and (deforest is without deforestation) then (burning is medium);
(xi) If (weather is transition) and (vegetation is not valued) and (deforest is deforest) then (burning is very high);
(xii) If (weather is transition) and (vegetation is not valued) and (deforest is without deforestation) then (burning is low).

Figure 11 illustrates schematically how the fuzzy model works, using as an example dry weather (with the value of 1.04), vegetation with a high lumber value (1.99) and an area that is about 50% deforested. The final value obtained for burning risk is 0.752. This value, which is the average of the rule outputs weighted by the activation degree of each rule, falls in the category HIGH, suggesting a high risk of burning in this situation.
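The twelve rules and the weighted-average output can be sketched as a zero-order Takagi–Sugeno model. Everything numerical here is an assumption made for illustration: the inputs are rescaled to [0, 1], the membership functions are hypothetical triangles (the functions of Figs. 7 to 10 are not reproduced), and the singleton values assigned to the output categories are a guess at the intended mapping. The sketch therefore does not reproduce the value 0.752 of the worked example.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a == b or b == c gives a shoulder)."""
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Assumed singleton outputs for the categories used in the rule base.
SINGLETON = {"low": 0.2, "medium": 0.6, "high": 0.8, "very high": 1.0}

# (weather, vegetation, deforestation, burning) for rules (i) to (xii).
RULES = [
    ("dry", "valued", "deforested", "very high"),
    ("dry", "valued", "not deforested", "high"),
    ("dry", "not valued", "deforested", "very high"),
    ("dry", "not valued", "not deforested", "high"),
    ("humid", "valued", "deforested", "very high"),
    ("humid", "valued", "not deforested", "high"),
    ("humid", "not valued", "deforested", "very high"),
    ("humid", "not valued", "not deforested", "medium"),
    ("transition", "valued", "deforested", "very high"),
    ("transition", "valued", "not deforested", "medium"),
    ("transition", "not valued", "deforested", "very high"),
    ("transition", "not valued", "not deforested", "low"),
]

def burning_risk(humidity, lumber_value, deforested_fraction):
    """Weighted average of rule singletons; all inputs are assumed to lie in [0, 1]."""
    mu_weather = {
        "dry": triangular(humidity, 0.0, 0.0, 0.5),
        "transition": triangular(humidity, 0.0, 0.5, 1.0),
        "humid": triangular(humidity, 0.5, 1.0, 1.0),
    }
    mu_vegetation = {"valued": lumber_value, "not valued": 1.0 - lumber_value}
    mu_deforest = {"deforested": deforested_fraction,
                   "not deforested": 1.0 - deforested_fraction}
    weighted_sum = 0.0
    total_strength = 0.0
    for weather, vegetation, deforest, burning in RULES:
        # Activation degree: minimum t-norm over the three antecedents.
        strength = min(mu_weather[weather], mu_vegetation[vegetation],
                       mu_deforest[deforest])
        weighted_sum += strength * SINGLETON[burning]
        total_strength += strength
    return weighted_sum / total_strength

risk_worst = burning_risk(0.0, 1.0, 1.0)    # dry, valued, fully deforested
risk_mild = burning_risk(0.5, 0.0, 0.0)     # transition, not valued, intact
risk_mixed = burning_risk(0.3, 0.6, 0.5)
```

Because the consequents are constants, no separate defuzzification of an output fuzzy set is needed: the weighted average is already a crisp number between the smallest and largest singleton.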

Fig. 11 Rule viewer that simulates the entire fuzzy inference process for specific values of the input variables

With the burning risk values obtained for the many combinations of input values, risk maps can be created as long as the variables are georeferenced. The resulting map is presented in Fig. 12. There is a degree of similarity to the map obtained with the use of logistic regression. However, the map based on fuzzy logic seems to better define the areas of burning risk. On this map, the highest burning risk follows the main roads such as PA-150, BR-320 and BR-010. In Fig. 13, the main lumber areas in the state of Pará are shown. A comparison of the maps in Figs. 12 and 13 reveals that the areas with a greater burning risk are located very close to the main lumber zones in southern and northeastern Pará. The regions with a lower burning risk are located, in general, in preserved areas, Indian areas and military areas shown in Fig. 14.

Fig. 12 Map of burning risk in the state of Pará from 1998 to 2003 based on fuzzy logic

Fig. 13 Lumber areas in the state of Pará in 2007 (INPE)

Fig. 14 Indian territory in the state of Pará in 2003 (IBGE)

5 Conclusion

The methodology used in this study shows that burning occurrences are influenced by factors such as weather, vegetation and deforestation. The two methods used to map burning risk show similar results, although the map built with fuzzy logic yields somewhat better results than the map created by logistic regression. These results are important for initiating public actions. First, the areas with high burning risk should be inspected to reduce the number of potential fires. The lowest rates of burning are associated with a low density of roads or, in other cases, with areas in which the use of fire is forbidden, that is, protected areas (Indian reservations, integral protection areas and sustainable use areas). Second, replacing the use of fire with fertilizer and machinery, that is, agricultural intensification, has the potential to reduce fires. The current deforested areas match the mesoregions that have good infrastructural access. These regions must be targets of technological investment in farming.

The construction of maps using these methods helps to reveal possible spatial relations between burning or deforestation sites and, for example, Indian territory or lumbering zones. The maps obtained through the different techniques show results that are consistent with reality.