Introduction

The underlying causes of tropical deforestation often include road construction by logging, mining, or petroleum interests, all of which open previously inaccessible areas to migrant colonists (United Nations 2001; Geist and Lambin 2002). However, it is the clearing by migrant families to establish farms that is generally viewed as the primary proximate cause of deforestation (World Bank 1991; Geist and Lambin 2002), especially in forest frontiers and often in protected areas (Rudel and Roper 1996; Bilsborrow and Carr 2001). Recent studies in Latin America have shown that population growth is positively associated with agricultural expansion (e.g., Bilsborrow and Geores 1995; Bilsborrow and Carr 2001; Carr 2002; Perz and Walker 2002; Escobal and Aldana 2003; Barbier 2004); however, spatial and temporal discontinuities have continued to obfuscate population–environment (P–E) relationships. Global and regional-level evidence of positive links between population and deforestation would appear inconsistent with trends at the national level in many Latin American nations in recent decades—a time when rural populations have declined yet deforestation has continued, and in some instances accelerated (Carr et al. 2006a). This apparent discrepancy is likely explained by the difference in scale, as a small percentage of Latin America’s population, farmers migrating to the frontier, are directly linked to much of the region’s deforestation. This small population is also among the region’s poorest and most vulnerable. Studying their land use and land cover (LULC) decisions is therefore crucial for developing policies to ameliorate rural poverty as well as to conserve the planet’s remaining biologically rich rainforests.

Several studies have documented the influence of fertility and the household life cycle on forest cover along the frontier (e.g., Marquette 1998; Perz 2001; Geist and Lambin 2002; McCracken et al. 2002), although more recent research has raised serious questions about the strength of the household life cycle in contrast to the patterns of change over time associated with the age of the farm itself (Barbieri et al. 2005; VanWey et al. 2007). However, the potential attenuation of population factors on forest cover as frontier regions develop and the role of the geographic distribution of population and its economic activities remain important questions. While population size has been associated with land cover in various empirical studies (e.g., Turner et al. 1977; Pichón 1997; Geist and Lambin 2002), the effect of changes in population size and the effects of population density, distribution, and spatial heterogeneity on changes in land use over time in frontier regions have received little attention.

In this paper we examine the following hypotheses regarding the effects of population size, location, and timing (year) of farm establishment on the rate of forest clearing over time: First, increasing population size (and therefore density) on a farm increases the rate of deforestation. Boserup’s theory (Boserup 1965, 1981) that farmers react to population pressures by adopting more intensive farming techniques has been corroborated in some studies on developing regions ranging from semi-subsistence societies to market economies (see, e.g., Turner et al. 1977; Brush and Turner 1987; Pingali and Binswanger 1988; Bilsborrow and Geores 1995). However, due to the prevalence of tenuous land claims and the abundance of land relative to labor in frontier regions, we anticipate that increasing population density will instead tend to induce agricultural extensification (farmland expansion) as hypothesized by Malthus (1798) and as found in cross-sectional data for Latin America (Bilsborrow and Carr 2001; Carr et al. 2006a).

We propose two hypotheses related to the location and proximity of fincas relative to the larger towns/communities in the region. First, we anticipate that farms located farther from major urban centers in Ecuador’s Amazon will have both more forest cover and a lower rate of forest clearing over time than farms closer to urban centers. As explained by Von Thünen, land values tend to decrease with distance from the market, so the least intensive forms of land use are found farthest from markets (Von Thünen 1826/1966). The resulting “ring” model has received general support in studies in various settings and throughout Latin America. Prior research in the northern Ecuadorian Amazon, for example, revealed that farms farther than 9 km from the nearest road had, on average, 79% forest cover compared to less than 50% cover for farms within 3 km of the road (Pichón 1997). We also posit a second hypothesis related to the specific location of fincas: farms surrounding different urban centers will have differential rates of forest clearing, with higher rates on fincas located near older, more established communities (Lago Agrio and Coca) compared to fincas located near relatively newer communities (La Joya de los Sachas and Shushufindi). We refer to this latter hypothesis as the spatial heterogeneity hypothesis.

Lastly, we hypothesize that farms established for a longer period of time have less forest cover, but a lower rate of forest clearing than farms established more recently. This suggests a curvilinear relationship. Thus, forest clearing is necessarily high during the early years of settlement, both to grow subsistence crops and to lay claim to the land (Pichón 1997). As the farm ages, more land is cleared to plant perennial cash crops (mostly coffee) and ultimately for pasture to raise cattle. At the same time, children of migrant settler households age to adulthood and become part of the farm labor supply, facilitating further land clearing and accumulation of savings to plant more cash crops or purchase more cattle (Perz 2001; McCracken et al. 2002; Walker et al. 2002a).

We investigate the variables associated with these four hypotheses jointly to examine the relative strengths of their effects. All three processes tend to occur simultaneously over time: increasing farm population size (density), decreasing distance to roads and nearby towns as road networks expand, and a tendency to continuously clear more forest through ongoing household settlement on marginal land with lower soil fertility. Multivariate statistical models are explored to control for the possible effects of spatial clustering, in which we compare results from ordinary least squares (OLS), random effects (RE), and spatial econometric models. The following sections describe the study context and the data, followed by the statistical methods, results and implications for further research and policy.

Methods

Study Site

Ecuador comprises three ecological regions: the western coastal lowlands, the central highlands, and the eastern Amazon (Oriente). The study site lies in the northern half of the Oriente, in the provinces of Sucumbios, Napo, and Orellana (Fig. 1). These Amazon lowlands, combined with those east of the Andes in Peru and southern Colombia, constitute one of the world’s most biodiverse wilderness areas (Myers et al. 2000; Mittermeier et al. 2003). The northern Ecuadorian Oriente has been undergoing rapid forest clearing since the discovery of oil in 1967 near the Colombian border. Road construction by oil companies during the late 1960s and early 1970s, as well as the completion of a major truck road between Quito (Ecuador’s capital) and Lago Agrio (the largest city in Ecuador’s Amazon region) in 1972, dramatically increased migration to the region. The resulting rapid process of colonization led to a population increase from around 50,000 (mostly indigenous) in 1960 to over 500,000 in the 2001 census. Over the last three intercensal periods, the population of Ecuador’s Amazon has grown at over double the national rates: 8% from 1974 to 1982; 6% from 1982 to 1990; and 3.5% from 1990 to 2001 (INEC 1992, 2001). While in-migration peaked during the late 1970s, data from our studies as well as from national surveys confirm that significant net in-migration continues. Fertility also remains high, with the total fertility rate (TFR) in 2000 measured as 5.5 in the Oriente versus 3.4 for the country as a whole (CEPAR 2000). Demographic surveys in 2005 found that TFR had declined in the Oriente to 4.2, but was still higher than the national rate of 3.3 (CEPAR 2005). Declining fertility in our sample has also been noted in recent work by Carr et al. (2006b).

Fig. 1 Study region, sample settlement sectors, and primary reference communities

The settlement process has been primarily spontaneous, with migrants arriving on their own initiative without any financial or technical support. Early arrivals typically settled on plots along the roads, while later arrivals claimed land behind farms fronting the roads. To obtain land title, a group of settler families living in a specific area or sector would form a precooperativa and together apply to the government Land Reform and Colonization Agency (IERAC) to have their plot boundaries surveyed, purchase a temporary title (certificado de posesión), and then later a permanent legal title (escritura).

Farming characteristics of the study site differ somewhat from other Amazonian regions such as Brazil. Soil conditions are often better with pockets of highly fertile volcanic (black) and rich alluvial soils mixed with the more typical Amazonian “red” soils, characterized by high acidity and aluminum toxicity. In addition, the growing season is year-round, requiring “slash and mulch” clearing practices, with little burning. Although climate change was not a factor in agricultural production to the late 1990s, since then there is anecdotal evidence of microclimate changes (less rainfall) in some areas.

Sample Design

Our study is based upon survey data collected from migrant colonist farm plots. The original sample design was developed in 1990 and a follow-up survey was undertaken in 1999. Details of the sampling design are provided elsewhere (Pichón 1997; Bilsborrow et al. 2004). Briefly, a probability sample of 470 settler plots (finca madres or fincas) in 64 settlement sectors (or precooperativas) was selected using a two-stage sampling procedure in 1990. The sample covered an estimated 5.9% of the 1990 rural population in the three-province study region and is representative of that population.

In 1990, two household surveys were administered, one for the economic head (usually male) and one for the spouse. The head’s questionnaire covered land acquisition and titling, land use and agricultural production, agricultural inputs and technology, livestock, use of hired labor, off-farm employment, and technical assistance/credit received. The spouse’s questionnaire included household composition, out-migration from the household, fertility and health care, household assets, and dwelling quality. Both respondents were asked about their own migration and work histories, aspirations for their children, and attitudes to life in the frontier. Response rates in 1990 were over 97%. The final sample in 1990 provides data on 418 farm households on 398 fincas.

A follow-up survey in 1999 targeted all households located on the original fincas, thus constituting a panel survey of land areas. The number of households per finca greatly increased in the interim due to subdivisions of the original fincas among heirs and sales of part of the finca to new in-migrants. In several settlement areas (or sectors), small parts of plots were also sold as house lots (solares), too small for significant agricultural activity and classified as urban land use. In 1999, a total of 934 heads of household were interviewed, comprising 823 farms (defined as separately managed plots of at least 0.5 ha.) and 111 solares on 403 fincas. Similar head and spouse questionnaires, which are available online (www.cpc.unc.edu/ecuador), were administered to create a fully longitudinal dataset. Response rates in 1999 for the head and spousal surveys were 93% and 97%, respectively.

To augment the household surveys, community and spatial data were collected in 1999 and 2000. Locations of all relevant community structures (markets, health care centers, community centers, schools, etc.) were georeferenced using global positioning system (GPS) receivers. In addition, the front two corner points of each finca and the location of each household were recorded. Primary and secondary roads were digitized and a time series of satellite images was processed from 1973 to the present (although satellite data are not used here). All spatial data were stored in ArcView 3.1 and have subsequently been transferred to ArcGIS (ESRI 1998, 2002).

The sample for this study was based upon complete finca responses; that is, data for which complete information was collected for all respondents living on a finca in 1990 and 1999. Land use, population, and wealth variables examined represent aggregate measures at the finca level. This aggregation is far more relevant to the 1999 data, as only a few fincas in 1990 had more than one farm household, while approximately 40% in 1999 had more than one subdivision. The final sample used here for the analysis is based on 361 fincas with complete data in both 1990 and 1999.

Variable Descriptions

The primary dependent variable of interest for this study is forest cover on the finca, which is derived from survey responses of the household head and defined as total hectares of primary forest plus secondary forest at least 7 years old. We focus on the change in forest cover on fincas from 1990 to 1999 (Eq. 1).

To capture context, fincas are linked (one-to-one) to each of the four major communities in the study region based upon responses to the 1999 spousal survey regarding where a household purchases goods, sends children to school or acquires medical care. Responses to these questions help to identify a reference community for each finca. Only four communities were considered due to their significantly larger populations, lengths of time since establishment, and available infrastructures compared to smaller “secondary” communities in the region. The four communities are: Lago Agrio, Francisco de Orellana (Coca), Shushufindi, and La Joya de los Sachas (Joya); see Fig. 1. If a finca identified a community other than one of these four as the reference community, the nearest of the four was assigned as the reference. Distances from fincas to each of the four communities (Table 1) were computed using Euclidean rather than network distances since the road network was not completely digitized (i.e., several small footpaths and dirt roads were not digitized). Road access and walking distance to the road were recorded from the head of household surveys.

Table 1 Description of variables hypothesized to influence forest cover change

All other variables were derived from the head and spouse household survey questionnaires. Finca population was aggregated from household rosters in the spousal survey and grouped into six age-sex categories (0–11, 12–49, and 50+-years old by gender). The year of finca establishment was determined based on the earliest year of migration to the finca reported by anyone living on the finca at the time of the 1990 survey. Natural resource variables (soil quality and topography) are measured as the proportion of the finca with good soil or flat topography, determined by taking aggregate responses for all farms located on a finca (i.e., if a 50 ha finca was subdivided into three farms, two of 20 ha with good soil and one of 10 ha with poor soil, finca soil quality would be defined as 0.8 since 80% of the finca has good soil). Off-farm employment (OFE) was measured as the number of person-months working off the farm in the previous 12 months. Tenure security was difficult to characterize as nearly all households held full title or a certificate of possession in 1990. This situation changed dramatically by 1999—the closure of the land titling agency, IERAC, in 1993 meant that obtaining land titles became far more difficult. This has not (yet) been a problem in cases in which fincas are subdivided among heirs as de facto titles are recognized and ownership rights clear. However, this is not the case when part of the plot was sold to a migrant. Therefore, we defined tenure security as a binary variable with “0” representing less secure land title (fincas where no secure titles were reported in 1999 for any farm on a finca, or where no farm on a finca in 1990 held full title and less than 50% of the finca area was under title in 1999) and “1” representing other situations involving more secure land title.
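The area-weighted aggregation used for the natural resource variables can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `finca_soil_quality` and the input layout (one `(area_ha, has_good_soil)` pair per farm) are assumptions made for the example.

```python
def finca_soil_quality(farms):
    """Area-weighted share of a finca that has good soil.

    `farms` is a list of (area_ha, has_good_soil) pairs, one per farm
    located on the finca (hypothetical input layout).
    """
    total_area = sum(area for area, _ in farms)
    if total_area <= 0:
        raise ValueError("finca must have positive total area")
    good_area = sum(area for area, good in farms if good)
    return good_area / total_area


# Worked example from the text: a 50 ha finca subdivided into three farms,
# two of 20 ha with good soil and one of 10 ha with poor soil.
example = [(20.0, True), (20.0, True), (10.0, False)]
print(finca_soil_quality(example))  # 40 ha of 50 ha, i.e. 0.8
```

The same function would apply to flat topography by passing a flag for flat land instead of good soil.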

Finally, a number of other strictly exogenous explanatory variables entered into the models are listed and briefly defined in Table 1. Road access and walking distance to the nearest road are highly collinear; walking distance was omitted because it was measured from the location of the household to the road at the time of the survey and could have changed over time due to a change in location of the house within the farm plot, whereas road access and distance to the finca do not change unless a new road is built. Population variables also tended to be collinear. For example, the number of adult females on a finca was highly correlated with both the number of children and the number of adult males. This is logical because females rarely work or run a farm on their own (only 11 fincas in 1990 and 9 fincas in 1999 had no adult males compared to 33 in 1990 and 43 in 1999 with no adult females), and because children were almost always accompanied by women (only 10 fincas in 1990 and 11 fincas in 1999 contained children with no adult females). Therefore, only the following demographic variables were considered: adult male population in 1990 (a measure of labor supply), elderly population in 1990 (a proxy for household and finca life cycle), and changes in the observed population from 1990 to 1999 for adult females, adult males, and children. Household income in 1990 was strongly correlated with tenure security and OFE in 1990, which is reasonable considering that households with more tenure security were found more likely to invest in pasture, which requires less labor and thus frees a person’s time to engage in OFE (Pichón 1997; Murphy 2001; Pan et al. 2004). Land titles are also commonly used as collateral for loans for farm investment, usually to buy cattle and hire temporary labor to clear forest, collect harvest, and create pasture (Bilsborrow et al. 2004). Lastly, soil quality and topography appeared moderately correlated with land use, consistent with earlier work (e.g., Pichón 1997; Pan and Bilsborrow 2005); however, bivariate analyses (not shown) found that topography was not a good predictor of forest cover change, so it was omitted.

Statistical Analysis

Trends in forest cover and population size were examined graphically over distances and across time, with statistical significance identified using Wald χ2- and F-tests from univariate regressions that control for clustering by (settlement) sector. The trend test performed controls for clustering but means rather than shrunken estimates are presented for ease of interpretation. Statistical significance is taken to be a p-value of less than 0.05.

To test the hypotheses for the pace of forest cover change, three regression models were considered: ordinary least squares (OLS), random effects (RE), and spatial econometrics. The response (dependent variable) for each of the three models is defined as the percent of forest cover (in hectares) in 1999 minus the percent of forest cover in 1990:

$$ y = \frac{{100*[{\text{Forest}}_{{1999}} - {\text{Forest}}_{{1990}} ]}} {{{\text{Finca}}\,\,{\text{size}}\,}}. $$
(1)
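As a small worked illustration of Eq. 1, the response can be computed per finca as below. The function name and the example values are hypothetical and are not taken from the survey data.

```python
def forest_cover_change(forest_1990_ha, forest_1999_ha, finca_size_ha):
    """Change in forest cover as a percentage of finca area (Eq. 1).

    Negative values indicate net forest loss between 1990 and 1999.
    """
    if finca_size_ha <= 0:
        raise ValueError("finca size must be positive")
    return 100.0 * (forest_1999_ha - forest_1990_ha) / finca_size_ha


# Hypothetical finca: 50 ha in total, 30 ha of forest in 1990, 22 ha in 1999.
y = forest_cover_change(30.0, 22.0, 50.0)
print(y)  # 100 * (22 - 30) / 50 = -16.0
```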

An OLS model was initially fit to run model diagnostics. Residual-predicted plots and the Shapiro–Wilk test were used to assess model assumptions; condition indices and eigenvalues identified collinear variables; outliers were detected with residual plots and influential statistics; Moran’s I identified both overall spatial dependence and violation of model assumptions (i.e., non-normal errors); and Lagrange multiplier tests identified spatial lag or spatial error. Likelihood ratio tests compared nested models, and Akaike’s Information Criterion (AIC) compared non-nested models.

After OLS model assessment, a RE model was fit using random sector intercepts to control for clustering within sectors and adjust for unobserved heterogeneity. Rubin (1976) also showed that including random intercepts helps control for potential bias that might occur due to the clustered sample design. RE models are very flexible and are commonly used in studies of longitudinal data (e.g., Laird and Ware 1982; Diggle et al. 1994; Pan et al. 2004), repeated measures (e.g., Diggle 1988; Vonesh and Chinchilli 1997), and to help control for spatial dependency and clustering within groups such as fincas within sectors (e.g., Stroup et al. 1994; Swinton 2002). The RE model is specified as:

$$ y_{ij} = \mathbf{X}_{ij} \boldsymbol{\beta} + \delta_{i} + \varepsilon_{ij}, $$
(2)

where \( y_{ij} \) refers to the change in forest cover for finca j in (settlement) sector i; \( \mathbf{X}_{ij} \) is a vector of finca-level covariates with corresponding parameter vector \( \boldsymbol{\beta} \); and \( \delta_{i} \) and \( \varepsilon_{ij} \) are sector and finca residual errors, respectively, which are assumed to follow a multivariate normal distribution with mean 0 and variances \( \sigma^{2}_{\delta} \) and \( \sigma^{2} \), respectively. The proportion of variance of y within sectors (an indicator of model fit) is computed as sector variance divided by total variance.
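Under the two variance components defined for Eq. 2, this proportion can be written explicitly. The label ICC (intraclass correlation) is introduced here only for convenience and is not used elsewhere in the paper:

$$ \text{ICC} = \frac{\sigma^{2}_{\delta}}{\sigma^{2}_{\delta} + \sigma^{2}} $$

As a purely hypothetical illustration, a sector variance of 20 and a finca-level variance of 80 would give ICC = 20/(20 + 80) = 0.2.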

Lagrange multiplier tests performed from OLS model results identify whether spatial lag or spatial error exists in the model (for a description of these tests see Anselin et al. 1996). The spatial lag model is interpreted through a spatial filter, i.e., the response variable is dependent upon independent variables at all locations, but with distance decay (Anselin 1988). In contrast, the spatial error model adjusts for spatial autocorrelation in the error term and assumes that shocks to the system are local, i.e., changes in one location affect only locations that directly surround it. Both models are easily described using matrices. Thus, define y as the vector of responses for all fincas; W as the spatial weight matrix that quantifies the connectivity between fincas; ρ as the spatial lag autocorrelation coefficient, λ as the spatial error autocorrelation coefficient; X as the matrix of independent explanatory variables with corresponding parameter vector β; and random error vector ε. One further defines ξ for the spatial error model as an uncorrelated homoscedastic error term. The spatial lag model is then defined as:

$$ \mathbf{y} = \rho \mathbf{W}\mathbf{y} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$
(3)
$$ \mathbf{y} = (\mathbf{I} - \rho \mathbf{W})^{-1} \mathbf{X}\boldsymbol{\beta} + (\mathbf{I} - \rho \mathbf{W})^{-1} \boldsymbol{\varepsilon}, $$
(4)

where Eq. 4 is derived directly from Eq. 3 and is written in terms of the spatial filter, \( (\mathbf{I} - \rho \mathbf{W})^{-1} \), on the right-hand side of the equation. The filter \( (\mathbf{I} - \rho \mathbf{W})^{-1} \) can be expressed as \( (\mathbf{I} + \rho \mathbf{W} + \rho^{2} \mathbf{W}^{2} + \rho^{3} \mathbf{W}^{3} + \ldots), \) demonstrating that y is a function of first-, second-, third-, and higher-order neighbors.
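The series expansion of the spatial filter can be checked numerically on a toy example. The sketch below is illustrative only: the three-finca weight matrix `W`, the value of `rho`, and the helper names are assumptions, and plain Python lists are used instead of a matrix library. It builds the partial sum of the series and verifies that multiplying it by (I − ρW) returns approximately the identity matrix.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def spatial_filter_series(w, rho, order):
    """Partial sum I + rho*W + rho^2*W^2 + ... + rho^order * W^order."""
    n = len(w)
    total = identity(n)
    term = identity(n)
    for _ in range(order):
        # Next term of the series: rho^k * W^k from rho^(k-1) * W^(k-1).
        term = [[rho * x for x in row] for row in matmul(term, w)]
        total = [[t + x for t, x in zip(t_row, x_row)]
                 for t_row, x_row in zip(total, term)]
    return total


# Hypothetical row-standardized weights for three fincas on a line:
# finca 0 borders finca 1, finca 1 borders 0 and 2, finca 2 borders finca 1.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
rho = 0.4  # assumed value, chosen only so that the series converges

S = spatial_filter_series(W, rho, order=60)
i_minus_rho_w = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(3)]
                 for i in range(3)]
check = matmul(i_minus_rho_w, S)  # should be close to the identity matrix
max_error = max(abs(check[i][j] - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))
print(max_error < 1e-12)
print(S[0][2] > 0.0)  # fincas 0 and 2 are not direct neighbors, yet are linked
```

The nonzero entry linking fincas 0 and 2 arises from the second-order term, which is the sense in which the lag model lets the response depend on covariates at all locations with distance decay.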

A spatial error model is defined as:

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$
(5)
$$ \boldsymbol{\varepsilon} = \lambda \mathbf{W}\boldsymbol{\varepsilon} + \xi , $$
(6)

where \( \mathbf{W}\boldsymbol{\varepsilon} \) is a spatially lagged error term that captures spatial spillover of errors between neighbors. To specify the spatial weight matrix, W, finca centroids were used to compute a distance matrix representing the distance separating each pair of fincas in the dataset. Inverse distance weights were computed for distance thresholds ranging from 500 to 10,000 m to determine at what range spatial dependency is maximized for the dependent variable. The threshold distance with the maximum Moran’s I was used to define W.
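The threshold search described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the GeoDa procedure itself: weights are raw inverse distances (no row standardization), the global Moran's I uses its standard formula, and the centroids, response values, and function names are all hypothetical.

```python
import math


def inverse_distance_weights(points, threshold):
    """w_ij = 1/d_ij for distinct fincas within `threshold` metres, else 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                if 0.0 < d <= threshold:
                    w[i][j] = 1.0 / d
    return w


def morans_i(values, w):
    """Global Moran's I of `values` under the weight matrix `w`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)        # sum of all weights
    denom = sum(d * d for d in dev)
    if s0 == 0.0 or denom == 0.0:
        return float("nan")
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / denom)


def best_threshold(points, values, thresholds):
    """Threshold whose weight matrix yields the largest Moran's I."""
    scored = [(morans_i(values, inverse_distance_weights(points, t)), t)
              for t in thresholds]
    scored = [(stat, t) for stat, t in scored if not math.isnan(stat)]
    return max(scored)[1]


# Hypothetical finca centroids (metres) forming two clusters, with
# hypothetical forest cover changes that are similar within each cluster.
points = [(0.0, 0.0), (400.0, 0.0), (0.0, 400.0),
          (5000.0, 0.0), (5400.0, 0.0), (5000.0, 400.0)]
values = [-10.0, -12.0, -11.0, -30.0, -28.0, -31.0]

best = best_threshold(points, values, [500.0, 1000.0, 6000.0])
print(best)
```

With these made-up data the smallest threshold is selected, because wider thresholds bring in pairs of fincas from different clusters whose values are dissimilar.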

To test the hypotheses that change in population size, year of finca establishment, nearest major community, and distance separating the community and finca were significantly related to forest cover change, contrast matrices were used to define the appropriate inference space for each hypothesis (McLean et al. 1991) and an F-test was computed. Since RE were not tested, a \( \chi^{2} \)-test was used because it is mathematically equivalent to an F-test for these hypotheses. Specification of contrast matrices is described in a number of statistical texts (Searle 1971; Mendenhall 1968) and in programs such as SAS (2003). It begins with the specification of the hypothesis: \( H_{0}: \Theta = \Theta_{0} \) vs. \( H_{A}: \Theta \ne \Theta_{0} \), which defines the General Linear Hypothesis (GLH): Θ = CB, where C is referred to as the contrast matrix and \( \Theta_{0} \) is most often set to 0. For example, suppose we wish to test whether distance to a major community influences forest cover change and we fit a model with an intercept plus seven variables: road access (yes/no), distance to each of the four communities, OFE, and tenure security (secure/insecure). These variables define the parameter vector:

$$ \boldsymbol{\beta}^{\text{T}} = \left[ \begin{array}{cccccccc} \beta_{0} & \beta_{\text{road}} & \beta_{\text{Coca}} & \beta_{\text{Lago}} & \beta_{\text{Shushu}} & \beta_{\text{Joya}} & \beta_{\text{OFE}} & \beta_{\text{Tenure}} \end{array} \right], $$

where T refers to the transpose of the vector. Therefore, the contrast matrix for the hypothesis would be defined as \( {\mathbf{C}} = {\left[ {\begin{array}{*{20}c} {0} & {0} & {1} & {1} & {1} & {1} & {0} & {0} \\ \end{array} } \right]}, \) such that the GLH would test whether the sum of the distance parameters was significantly different from zero. Examination of individual distance parameters is not sufficient to properly test the hypotheses since we are examining the overall effect of distance to a community on forest cover change.
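To make the contrast concrete, the sketch below applies the single-row contrast to a set of made-up estimates. It is an illustration only: the coefficient values and their (diagonal) covariance matrix are hypothetical, the function name is an assumption, and the statistic shown is the one-degree-of-freedom Wald chi-square form of the test rather than output from SAS.

```python
def contrast_wald(c, beta_hat, cov):
    """Estimate of c'beta and its Wald chi-square statistic (1 df).

    `c` is a single-row contrast, `beta_hat` the coefficient estimates,
    and `cov` their estimated covariance matrix (list of rows).
    """
    k = len(c)
    estimate = sum(ci * bi for ci, bi in zip(c, beta_hat))
    variance = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    return estimate, estimate ** 2 / variance


# Order: intercept, road, Coca, Lago, Shushu, Joya, OFE, Tenure.
c = [0, 0, 1, 1, 1, 1, 0, 0]

# Hypothetical estimates and variances (covariances set to zero for simplicity).
beta_hat = [-5.0, -3.0, 0.10, 0.20, 0.15, 0.05, 0.4, -2.0]
variances = [4.0, 1.0, 0.01, 0.01, 0.01, 0.01, 0.04, 1.0]
cov = [[variances[i] if i == j else 0.0 for j in range(8)] for i in range(8)]

estimate, wald = contrast_wald(c, beta_hat, cov)
print(estimate, wald)
print(wald > 3.841)  # 3.841 is the 0.05 critical value of chi-square with 1 df
```

In this made-up case the summed distance effect is about 0.5 with variance 0.04, so the statistic is about 6.25 and the null hypothesis of no overall distance effect would be rejected at the 0.05 level.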

Descriptive statistics and OLS and RE models were all conducted using SAS 9.1 (SAS 2003), maps were created using ArcView 3.1 (ESRI 1998), and spatial statistics and models were run using GeoDa (Anselin et al. 2003) and confirmed in SpaceStat (Anselin 1995).

Results

In both 1990 and 1999, fincas closer to the four cities had less forest cover and a larger population than fincas farther away (Fig. 2a). The change in forest cover does not vary substantially across distance categories, ranging from 6.8 to 15.3 percentage points. In contrast, the change (increase) in population was greatest for fincas located nearest to the four communities and decreased with distance. Finca population does not increase much between 1990 and 1999 on fincas more than 25 km from the major communities, clearly indicating that the process of subdivision in the 1990s occurred on fincas close to these communities. Examining population and forest cover together, one might hypothesize that change in population size has a mitigating effect on forest cover as a function of distance to a major community, because population differences converge while forest cover differences remain relatively stable across distance categories.

Fig. 2 (a) Population and land use by proximity to nearest reference community (Lago Agrio, Joya, Shushufindi, and Coca). “Pop 1990” = Finca population in 1990; “Pop 1999” = Finca population in 1999; “Forest 1990” = Percent of forest cover on a finca in 1990; “Forest 1999” = Percent of forest cover on a finca in 1999. (b) Population, land use, and distance to a reference community by finca settlement cohort. “Pop 1990” = Finca population in 1990; “Pop 1999” = Finca population in 1999; “Forest 1990” = Percent of forest cover on a finca in 1990; “Forest 1999” = Percent of forest cover on a finca in 1999; “Nearest City (km)” = Euclidean distance to the nearest of the four reference communities

Forest cover and distance to the nearest major community had similar relationships with the finca establishment cohort (Fig. 2b): fincas established earlier had less forest cover in 1990 and 1999 and were located closer to major communities compared to fincas established later. The difference in forest cover was larger for later (more recent) cohorts, indicating a faster rate of deforestation among recent arrivals. However, it is possible that the rate of forest clearing was also higher in the early years for those cohorts who arrived earlier; it is not possible from our data to measure the rate of forest clearing for earlier cohorts, nor is it possible to infer that recent cohorts will continue to deforest at faster rates in the future than older cohorts. The 1990 finca population varied little with year of establishment, but 1999 finca population was much higher on fincas established earlier, primarily because earlier established fincas tended to be located on main roads, closer to the four main towns, and were thus more likely to be subdivided for incoming migrants or among heirs.

It is important to note that although Fig. 2b appears to show forest cover change to be dependent upon year of settlement, time in itself is not a causal factor, i.e., the year a finca was established does not cause farm households to increase or decrease population size over time or to clear more land. Rather, the year of finca establishment defines settlement cohorts (or duration cohorts described by Barbieri et al. 2005) and locations of the family and finca along the household and plot life cycles, respectively. But the year of establishment is also clearly linked to the location of the finca, which is in turn closely tied to the propensity of finca subdivision. Since this process of subdivision is a major determinant of population change, then the year of establishment may conceivably be tied to the increase in population and hence population density on fincas. However, it is not year or duration or cohort per se that is tied to population increase, but the locational attributes of the finca, which are determined by year of establishment.

To examine whether changes in forest cover and population are homogeneous for varying distances surrounding each of the four major communities, we examined finca characteristics in (up to) four distance categories for each community (Table 2). Due to the location of fincas in relation to the four major communities in the region (fincas south of Coca or west/north of Lago Agrio could only be linked to Coca or Lago), Lago Agrio and Coca were linked to 135 and 100 fincas, respectively, while Joya and Shushufindi were linked to 68 and 58, respectively. Table 2 shows that, overall, higher forest cover and smaller population sizes were both associated with greater distances from the nearest large community in both 1990 and 1999, and that there is clear heterogeneity in the relationships. This is also true of changes over time, that is, there is a tendency for fincas closer to the towns to experience more population growth (except near Lago) and more forest clearing (except near Coca).

Table 2 Spatial location, forest cover, population size, and changes over time

Considering the relationships for each town, for both Joya and Shushufindi, forest cover and population size on fincas generally varied in the expected fashion with distance in both years. Fincas under 10 km from Joya in 1990 reported a mean of 39% forest cover and a mean of 7.4 people compared to 48% forest cover and 8.6 people for fincas 10–19 km away, respectively. This suggests that distance may have a stronger influence than population on forest clearing in 1990; however, by 1999, population on fincas close to Joya nearly doubled with more than a 50% reduction in forest cover (39% to 17% forest) compared to only a slight increase in population and approximately 20% decline in forest cover for fincas further from Joya. Similarly for Shushufindi, in 1990, fincas under 10 km reported 53% forest cover compared to 59% and 83% for distances 10–19 km and 20–39 km away, respectively. Mean finca population varied as expected, falling with distance for both years. The differences for 1990 and 1999 between distance categories are highly statistically significant, which is also consistent with the differences for changes over time. Fincas surrounding Joya and Shushufindi had greater forest loss and larger increases in population size the closer they were to either of the two cities.

Around Coca, the closest fincas had significantly less forest cover in both 1990 and 1999, as well as slightly larger populations in 1990 and significantly larger populations in 1999. It is important to note that although Coca is the second largest city in the study region (with about 22,000 people in the 2001 census), the area south of Coca was only recently opened by oil roads (some in the 1980s), which explains why fincas near Coca had much more forest cover than those near the other three communities, particularly in 1999. Despite the cross-sectional relationships consistent with distance, deforestation on fincas tended to be greater with greater distance from Coca (though not statistically significant), although population increase was higher on fincas close to Coca.

As with Coca, for Lago Agrio, the largest city in the region (census population of 34,000 in 2001), the expected associations with distance were observed for forest cover and population in both 1990 and 1999, but the relationships are weak and inconsistent. No statistically significant relationship was observed between changes over time and distance from Lago for either forest cover or population on fincas.

The last panel of Table 2 provides data for all fincas on walking distance to the nearest road from the dwelling as reported by the head of the household in each year. As expected, greater distances were associated with significantly more forest cover and smaller population sizes in each year. However, the relationships, albeit in the same direction, were not statistically significant for the change in forest cover or change in population. The overall differences in forest cover surrounding communities and roads indicate a global nonstationary spatial forest clearing process across the study region, so rates of forest clearing are not as easily interpretable as implied by Fig. 2a and b.

The influence of population on forest cover was also examined by decomposing 1990 population by age (children 0–11, adults 12–49, and older adults 50+) and gender (Table 3). Across all age–sex categories and for both years, a larger number of people on a finca was associated with less forest cover. This relationship was significant for female children, adult males, all adults combined, and for male and female older adults. While we expect, and observe, that a greater number of adult males (at baseline) is associated with significantly less forest than a greater number of adult females in each year, the relationship between older adults and forest cover is more complex. A finca with no older adult males or females had a mean forest cover of 64 ha in 1990 and 51 ha in 1999, compared to 55 ha and 44 ha, respectively, for a finca with at least one older adult. However, older adults (particularly males) are typically the original settlers of the finca; the relationship observed may therefore reflect the settlement cohort and be a manifestation of the finca life cycle. Alternatively, one could also interpret the relationship as reflecting a household life cycle effect (Perz 2001; Walker et al. 2002a, b). The high correlation between the age of the head and the age of the finca makes it difficult to distinguish between these interpretations.

Table 3 Finca characteristics, forest cover, and changes over time decomposed by age and sex

Defining the Spatial Weight Matrix

To fit the spatial econometric model, a spatial weight (or adjacency) matrix was defined and used to compute spatial diagnostics. After examining connectivity and the resulting spatial correlations for threshold distances ranging from 500 to 10,000 m, we selected a distance threshold of 500 m, which provided the maximum spatial correlation in the response based on Moran’s I. This threshold corresponds roughly to first-degree neighbors if fincas were represented as polygons. The resulting weight matrix captures dependency primarily within settlement sectors.
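The threshold search described above can be illustrated with a minimal sketch. It assumes row-standardized binary distance-band weights and point coordinates in meters; the function names, the toy coordinates, and the candidate thresholds are hypothetical and are not taken from the study, whose software and exact weighting scheme are not stated here.

```python
import math


def distance_band_weights(coords, threshold):
    """Row-standardized binary weights: j neighbors i if d_ij <= threshold (j != i)."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = [
            j for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) <= threshold
        ]
        # Points with no neighbor inside the band keep an all-zero row.
        for j in neighbors:
            weights[i][j] = 1.0 / len(neighbors)
    return weights


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(zi * zi for zi in z)


def best_threshold(coords, values, thresholds):
    """Return the (threshold, Moran's I) pair with the largest I."""
    results = []
    for t in thresholds:
        w = distance_band_weights(coords, t)
        if any(any(row) for row in w):  # skip bands that link no points at all
            results.append((t, morans_i(values, w)))
    return max(results, key=lambda pair: pair[1])


# Hypothetical toy data: four plots 400 m apart with clustered responses.
toy_coords = [(0, 0), (400, 0), (800, 0), (1200, 0)]
toy_values = [1.0, 1.0, 5.0, 5.0]
chosen, statistic = best_threshold(toy_coords, toy_values, [500, 1000, 1500])
print(chosen, statistic)
```

In this toy layout the narrowest band gives the strongest positive autocorrelation, because wider bands start linking plots with dissimilar values. The sketch does not reproduce the significance test for Moran's I, which would normally be obtained by permutation or from its asymptotic distribution.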

Forest Cover Model Results

Multivariate model results investigating the relationships between changes in forest cover over time and key explanatory variables are presented in Table 4. Substantive findings are discussed below. The OLS (Model 1a) fit well with an AIC of 3184.3, an R-square of 0.14, residual variance of 382.4, and only minor deviation from model assumptions. Moran’s I was significant (I = 0.11, p = 0.009), indicating either spatial dependence or some violation of model assumptions; the Lagrange Multiplier test for spatial lag (LM-lag) was 3.32 (p = 0.068) and the Lagrange Multiplier test for spatial error (LM-err) was 4.24 (p = 0.039), indicating that spatial error is likely to exist. Accordingly, a spatial error model (Model 1b) was fit; however, it differs slightly from the other models as one observation was deleted since it was a spatial outlier—no neighboring farms existed within 12 km of the finca, while all other farms had at least one neighbor within 1.6 km. The spatial error model improved model fit over the OLS model, with an AIC of 3178.3, an R-square of 0.16, and a reduction in residual variance to 359.3. The spatial autocorrelation coefficient is estimated to be 0.18 (p < 0.001), indicating that the OLS estimates, while unbiased, are not efficient. This is reflected in comparing model results—the parameter estimates are similar, but the variance estimates and p-values associated with the parameters differ.
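For reference, a spatial error specification of the kind fit as Model 1b is conventionally written as follows. The notation is an assumption, since the model equations are not given in this section: $y$ is the vector of changes in forest cover, $X$ the matrix of explanatory variables, $W$ the spatial weight matrix defined above, and $\lambda$ the spatial autocorrelation coefficient (estimated at 0.18).

```latex
\begin{aligned}
y &= X\beta + u, \\
u &= \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^{2} I_{n}).
\end{aligned}
```

When $\lambda = 0$ the specification reduces to OLS. When $\lambda \neq 0$ the regressors remain uncorrelated with the error term, so OLS coefficients stay unbiased, but the errors are correlated across neighboring fincas and the usual OLS variance estimates are no longer reliable.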

Table 4 Multivariate model results for the change in forest cover

Since spatial dependency is primarily within (settlement) sectors as indicated by the spatial weight matrix, a random sector intercept model (Model 1c) was also examined. The RE model fit the data better than either the OLS or the spatial error model (AIC = 3115.7), with sector-level variance accounting for 7% of the total variance of the change in forest cover as indicated by the intracluster correlation coefficient. As expected, regression coefficients were similar to OLS and spatial error estimates, while standard errors tended to be higher since the model adjusts more efficiently for correlated data.
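The intracluster correlation coefficient cited above is the share of total variance attributable to the sector-level random intercept. A minimal sketch of the calculation follows; the variance components in the example are hypothetical placeholders chosen to give a 7% share, since the fitted components themselves are not reported in this section.

```python
def intraclass_correlation(sector_variance, residual_variance):
    """Share of total variance due to the sector-level random intercept."""
    total = sector_variance + residual_variance
    if sector_variance < 0 or residual_variance < 0 or total == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return sector_variance / total


# Hypothetical variance components (not the study's estimates).
example_icc = intraclass_correlation(sector_variance=7.0, residual_variance=93.0)
print(round(example_icc, 2))
```

A value near zero would mean that fincas in the same sector are no more alike than fincas in different sectors, in which case the random intercept would add little over OLS.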

We now discuss the substantive findings, focusing on those of the RE model and discussing variables in groups, as seen in Table 4. First, we observe that fincas with no road access had 5.7% less of a decline in forest cover from 1990 to 1999 than fincas with road access in both years (p = 0.06). Second, fincas established between 1985 and 1989 cleared 12.3% more of their land in forest between 1990 and 1999 than fincas established before 1975 (the reference group, p < 0.01), while fincas established between 1980 and 1984 cleared 8% more (p = 0.02), and those established between 1975 and 1979 cleared 6.4% more forest (p = 0.06). This indicates that fincas established more recently experienced more rapid deforestation, which is consistent with the descriptive results presented in Fig. 2b above.

Findings for population variables indicate that the addition of one adult male to the finca baseline male population results in a 1.6% decrease in forest cover (p = 0.03). In contrast, the addition of one elderly adult (50+) actually results in a 2% net gain in forest cover (p = 0.06), which may reflect the fact that such fincas with older persons are more likely to be not only older (statistically controlled) but to have had land in pasture and coffee for some years, some of which may have been left fallow to experience secondary forest regeneration. The expected demographic effects on forest cover are also seen in the effects of an increase in the adult male population from 1990 to 1999 by one, which results in a statistically significant net loss of 1.4% in forest cover on the finca (p = 0.02). Note that the effects of increases in children and women are in the expected direction but not significant. The effects for males reflect their greater role in land clearing and agricultural production in general in the frontier Amazon context.

Results for the other variables in Table 4 can be summarized briefly. First, none of the variables measuring distance to the four cities was significant. In addition, off-farm employment (OFE) in 1990, change in OFE during the period, tenure security, and soil quality all had non-significant effects, although better soils showed a small association with more land clearing.

Hypothesis tests were performed using results from the RE model (Table 5). The hypothesis that changes in population were related to changes in forest cover is found to be supported (p = 0.0007); the hypothesis that year of finca settlement influences rates of forest clearing was also significant (p = 0.018), but the hypothesis of differential forest clearing rates by the nearest reference community was not significant (p = 0.56). Finally, the hypothesis that distance from the finca to the reference community influences changes in forest cover was, surprisingly, found to be not significant (p = 0.46). This may be due to the fact that the distance effects are better captured by the road access and year of finca establishment variables.

Table 5 Hypothesis tests based upon the random effects model of forest cover change

Secondary hypotheses also examined using the RE model results include the hypothesis that road access influences forest cover change, which received only weak support (p = 0.17), and the hypothesis that the baseline population affects forest cover change, which was supported (p = 0.04). The latter result indicates that more adult males and fewer elderly would result in greater decreases in forest cover over time.

Discussion

This paper examines demographic, spatial and temporal factors associated with changes in forest cover in Ecuador’s Northern Amazon. The main objective was to test four hypotheses related to forest cover change: (1) a larger increase in population over time on a farm (finca) leads to more deforestation; (2) rates of forest clearing surrounding four primary reference communities differ (spatial heterogeneity); (3) fincas farther from towns/communities experience lower rates of forest clearing over time; and (4) forest clearing differs by finca settlement cohort, viz., by year of establishment of the finca. The data used for the analysis are from a longitudinal household survey conducted in 1990 and 1999, which permits investigating changes over time based upon aggregating farms to the finca level, the level of the original sample plots and prior to subdivision. Community and spatial data augment the analysis by making it possible to identify and characterize the major reference communities or towns in the region, compute distances from fincas to those towns, and define spatial weight matrices. Our modeling strategy identified a random effects (RE) model as having the best fit to the data under the conditions that model assumptions are satisfied and correlation (spatial and nonspatial) does not influence model results. The RE model was fit with random sector intercepts that adjust for heterogeneity of forest clearing within settlement areas or sectors. Hypotheses (1) and (4) above received support from the RE model, while (2) and (3) did not.

The surprising absence of a significant distance effect has several possible explanations. First, only forest clearing is examined as a dependent variable, not the transition from forest to specific forms of land use (see, e.g., Pichón 1997; Pan and Bilsborrow 2005). While forest clearing for producing perennial cash crops may tend to cluster around towns with, say, coffee or African palm markets, this need not involve spatial clustering of forest clearing per se. Second, as mentioned above, the finca settlement cohort variable may capture much of the effect of distance, as fincas established earlier tend to be closer to major communities as well as major roads. Third, Euclidean distances may not adequately capture travel time, as there may be travel cost differences or geographic barriers separating fincas and communities. This is particularly problematic for fincas located in the central area between Shushufindi, Joya, and Lago Agrio, which have multiple potential travel routes to each town, as well as fincas south of Coca or west of Lago Agrio that realistically can be linked only to Coca and Lago, respectively. Fourth, the growth and influence of “secondary” communities over the study period, not included in this analysis, adds significant contextual effects that may well alter forest clearing patterns as well as the forest-to-agriculture transition. Fifth, the lack of a significant distance effect could be due partly to the small size of the towns (e.g., in contrast to Amazonian cities in Brazil as described by Browder and Godfrey 1997) and their associated lack of impact on finca activities. Finally, it could be due to the fairly dense system of roads in most of the study region, as most fincas were on roads or close to roads by 1999, reducing the time differences across fincas in access to towns and markets.

The findings related to population are especially important as they are among the first to definitively link forest cover change to population change at the farm plot level in a frontier region, in fully specified multivariate models that also control for spatial factors. Population growth at the farm level is normally a function of two demographic components: net migration and natural population growth, or the excess of fertility over mortality. For the northern Ecuadorian Amazon, in-migration rates have been extremely high, but they have declined in the 1980s compared to the 1970s and in the 1990s compared to the 1980s for a number of reasons, including slower population growth in areas of origin of migrants, an increase in emigration since 1997, the lack of major new oil discoveries in colonist areas, and the closing of the frontier (few new areas available for settlement due to the small size of the country, the creation of protected areas, and titling of large areas to indigenous populations). There is some out-migration from fincas as children reach adulthood, but this is small compared to continuing in-migration. Fertility rates of migrant colonist women have been falling over the past 15 years, but are still much higher than in the rest of the country, even other rural areas.

Therefore, the population of the region is continuing to grow, and has high population momentum built in due to its young age structure. Given the findings of this study, it is important for conservation policy to reduce this growth to slow the rate of forest clearing. This can be achieved by expanding the current very limited family planning (only one efficient private clinic exists, CEMOPLAF, in Lago Agrio) and health services in the region, combined with encouraging girls to attend school longer and to get more practical technical training so as to work and postpone pregnancy. It is also possible that improved services and/or economic opportunities (including land reform) in the major areas of origin of migrants to the Amazon (which, according to the 2001 census, are the provinces of Loja and Bolivar in the Andes) could reduce out-migration from those areas. More investment of the economic development funds of ECORAE, the government agency charged with promoting sustainable development in the Amazon, could contribute to urban employment expansion and alleviate pressures on rural areas. Such funds could support small industries that take advantage of Amazonian products, including the production of unique fruit preserves, canning palm hearts, making chocolate from cacao, developing specialized furniture production, and expanding ecotourism by adding luxury hotels and boats. By attracting migrants from outside the region who might otherwise purchase parcels from existing settlers, and by providing an outlet for children of settlers once they reach adulthood, urban business expansion would lead more migrants to settle in towns rather than continue subdividing fincas, which, as we have seen, leads to further forest clearing. Finally, technical assistance should be restored to earlier levels and expanded to encourage colonists to use more intensive agricultural techniques on already deforested lands (including unused or abandoned lands) rather than clearing new areas (see Angelsen and Kaimowitz 2001).

The findings here also highlight the importance of finca settlement cohorts and duration of family residence on the pace of forest loss. There has been some previous support for the idea of cohorts influencing forest cover change (Moran et al. 2002; McCracken et al. 2002; Walker and Homma 1996), but these studies all focus on family life cycles linked to the age of the head. The study here, in contrast, shows that finca life cycles are as important as or more important than household life cycles. This makes theoretical sense, as there is normally a period of initial rapid clearing to produce subsistence crops and establish ownership, followed by additional episodes of clearing to plant cash crops and create pasture to raise cattle. It requires a major stretch of the imagination to envisage the gradual annual increases in the age of adults or children in the household, or even in their number, during the household life cycle as causing more than minor blips in the larger patterns of land use change associated with the finca life cycle.

Finally, our paper offers several important methodological contributions, including the use of longitudinal data; the collection of data for a scientific probability sample of settler plots; the integration of household, community, and spatial data; and the statistical approach. This paper constitutes one of the few longitudinal studies in a tropical frontier region that has an adequate sample size and characteristics to provide the statistical power needed to properly test hypotheses. This unfortunately has usually not been possible in existing studies of land use at the frontier. The RE model used here, also referred to as a multilevel, hierarchical, or mixed model, has been gaining converts in the literature in analyses of land use (Hoshino 2001; Polsky and Easterling 2001; Swinton 2002; Pan et al. 2004; Pan and Bilsborrow 2005) thanks to two properties that are important for land use studies: (1) the RE model provides a framework in which multiple contexts (i.e., household, finca, and community) can be combined to simultaneously test their influence on LULC, as well as to examine the influence of interactions between levels on LULC; and (2) it provides an alternative to the more complex spatial models, which are also costly due to the effort required to collect spatial data. We show that even with complete data to fit a spatial model, the RE model provides a better fit since the within-cluster (sector) variance essentially controls for spatial autocorrelation, and clusters are sufficiently separate to deem them spatially independent.

Despite these strengths, it is important to recognize the limitations of this study. First, the time frame is too short to permit investigating the impact of important policy-relevant variables whose effects may take more time, including government development or credit policy, prices of commodities, deterioration in soil quality, micro-climate changes, agricultural extension services, etc. Second, the aggregation of data to the finca level precludes directly comparing household and finca life cycle effects, since that can only be done using data for individual households and fincas (approximately 40% of the original 1990 fincas were subdivided into two or more farms by 1999, limiting the ability to make direct inferences about farm families). Lastly, and as noted previously, land use change is not a single, unique independent event: forests may be cleared for various uses, and land in one form of land use (e.g., coffee) may be switched to another form (such as rice or pasture). Therefore, independent studies of forest cover itself, or of land in pasture or perennials, are not as informative as those that encompass all transitions between different forms of land use, including secondary succession and forest regeneration. It is likely that forest clearing by households follows a trajectory related to time and household population size, but the reason why families clear land for particular uses at particular times or ages of household members is not explored here. This is important as it would shed light on specific spatial and temporal processes occurring in the major types of land use.

The findings here indicate the value of explicitly incorporating population and spatial factors in analyses of the drivers of deforestation and land use change in tropical frontiers. It is a great challenge for demographers and population scientists to use and develop new methodologies to understand the linkages between population and deforestation. This is a growing and specialized area of research in demography, referred to as population-environment research. The field is benefiting not only from new methodologies and data collection methods, but also from the integration of disciplines such as demography, geography, economics, anthropology, planning, forestry, agronomy, and public health. This makes it an exciting “frontier” of inquiry.