1 Introduction

Plant Community Assembly After Fire. Vegetation underpins the functioning of most terrestrial ecosystems, as it captures energy from sunlight and makes it available to the other components of the ecosystem. Although vegetation depends primarily on the environment, as it becomes more complex it modifies the environment to such a degree that it takes control of certain ecosystem processes.

However, plant communities change continuously over time following a patch pattern: the death of plants creates gaps, and the resulting temporary increase in resources (light, space, water, nutrients, ...) promotes the growth of neighbouring plants and the recruitment of new ones until most of those resources are retained, or occupied, by plants or leak out of the system [18]. These changes are usually slow; however, disturbances trigger large changes in plant community structure and functioning [15]. Disturbances often produce a large increase in the availability of resources through plant mortality [26], and fire is one of the most widespread disturbances [13]. The recovery of vegetation after fire depends on the regenerative strategies of the species [12, 26], which should be interpreted as a measure of the resistance and resilience of communities and ecosystems. Indeed, this measure has been used in this way by other authors, for instance in [25]. Nonetheless, the assembly of plant communities after fire also depends on interactions among species, which are of primary importance but have so far barely been considered due to their complexity [16]. In any case, the general trend of vegetation assembly after fire and the processes involved have been outlined for some types of vegetation, such as Atlantic ecosystems [2, 26].

Plant Communities and Cellular Automata Models: State of the Art. Cellular automata (CA) models incorporate both spatial and temporal dynamics [1, 7], making them suitable tools to model space-oriented ecological processes [9, 10, 17]. The plasticity of CA models has encouraged researchers to tackle new challenges in ecology, and their application has increased during the last decades [8, 10, 29]. They have been used for methodological purposes [7, 19, 21], for modelling vegetation dynamics [3, 5, 9, 10, 14] and for modelling the impact of disturbances [1]. Despite their strong dependence on parameterization, the main advantage of such models is that they are less laborious and can simulate complex systems with only a few rules. However, the sampling effort and computation requirements have prevented CA from becoming an ordinary tool in ecological research. In this regard, CA models are not usually intended to reproduce the spatio-temporal patterns of vegetation; they are just a loose approximation of the structure of vegetation, for instance [7], or of any process.

Objectives. The objective of this work is to develop CA models that reproduce the assembly of plant communities after fire and to briefly discuss a possible way of optimizing their parameterization.

2 Background Data

The information for the cellular automaton has been recorded by the Fire Ecology Group of the University of Santiago de Compostela at a large number of locations in the north-west of the Iberian Peninsula over the last 18 years. Some of those data have been previously published in scientific journals, for instance [4, 16, 23, 24, 26, 27, 28]. Other data remain unpublished.

The burnt areas studied cover a wide range of conditions. The main environmental sources of variation in our database are topography and climate, which ranges from Atlantic through transitional to Mediterranean. The information used to build the model covers a broad scope of biological processes along the life cycle of plants, including seed production and dispersal, plant regeneration strategies after fire, plant structure, and vegetation structure and assembly. However, the largest set of information and the main input to the model is species cover, recorded in burnt shrublands during the first years after fire.

3 The Cellular Automaton Model

The probabilistic CA developed here is defined by the tuple \(\langle L, H, Q, f, I \rangle \) where L is the lattice structure of the CA, H is the neighbourhood, Q is the set of states, \(f:Q \times Q^{|H|} \rightarrow Q\) is the local rule, and I is the initial configuration of the CA. Notice that, unlike traditional probabilistic CA, where the transition probabilities are constant over time, here they can change to better reflect the empirical observations. We now define in detail each of these components of the CA model.

Lattice L. The post-fire recovery of vegetation is simulated in a two-dimensional square lattice intended to reproduce a 30 m \(\times \) 30 m field plot, so that each cell represents a 0.1 m \(\times \) 0.1 m square. Thus, the lattice is defined as \(L = \lbrace (i,j) :1 \le i \le N, 1 \le j \le M \rbrace \) where \(N = M = 300\). The sizes of the lattice and of the cells were chosen according to field studies and computation requirements, because the probability of finding new species is directly related to the size of the plot [11, 30] and the cell size determines the relationships that can be detected among species [16]. However, a high number of cells increases the number of computations needed to simulate the whole model.

Neighbourhood H. The growth of plants across the plot was implemented in the CA model through the transition functions, which use the Moore neighbourhood of radius 1. We assumed that the cells in the neighbourhood are not equidistant from the central cell: the cells reachable via a diagonal step are farther away from the central cell than the other ones. Since this distance has an influence in the real world, we took it into account when implementing the model.
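The neighbourhood just described can be sketched as follows. This is only an illustration: the text does not state how the distance correction is computed, so the inverse Euclidean distance used in `distance_weight` is an assumption, as are the function names and the choice to skip cells outside the plot.

```python
import math

# Offsets of the Moore neighbourhood of radius 1 (the central cell is excluded).
MOORE_OFFSETS = [
    (di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)
]


def distance_weight(di, dj):
    """Hypothetical correction factor: inverse Euclidean distance to the centre.

    Orthogonal neighbours get 1.0, diagonal neighbours get 1 / sqrt(2).
    """
    return 1.0 / math.hypot(di, dj)


def neighbours(i, j, n_rows=300, n_cols=300):
    """Yield (row, col, weight) for the neighbours of cell (i, j).

    Indices are 1-based, as in the definition of L. Cells that would fall
    outside the plot are skipped (an assumed boundary treatment).
    """
    for di, dj in MOORE_OFFSETS:
        r, c = i + di, j + dj
        if 1 <= r <= n_rows and 1 <= c <= n_cols:
            yield r, c, distance_weight(di, dj)
```

An interior cell has eight neighbours, whereas a corner cell such as (1, 1) has only three under this boundary treatment.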

Cell Values Q. The CA was designed to reproduce the dynamics of aboveground vegetation; accordingly, belowground characteristics are part of the initial configuration of the model and cell values only reflect aboveground changes. In the following, we use the notation \(Q_{i,j,t}\) to denote the state of the cell in position (i, j) at time t.

Each plant in the CA model, no matter the species or the way it was recruited, needs to be tracked through the entire simulation in order to display its spreading and its interactions with other plants. To this end, every plant in the CA model has its own ID. In particular, for each \(1 \le v \le V\), where V is the maximum number of species, the set of possible plants is defined as follows:

$$\begin{aligned}&Sp_{v,rs} = \lbrace Sp_{v,rs,i} \rbrace ^{Z^{rs}_v}_{i=1}&Sp_{v,sd} = \lbrace Sp_{v,sd,j} \rbrace ^{Z^{sd}_v}_{j=1} \\&{\textsf {Community}} = \bigcup _{v = 1}^V \left( Sp_{v,rs} \cup Sp_{v,sd}\right) \end{aligned}$$

where \(Z^{rs}_v\) is the number of plants recruited by resprouting and \(Z^{sd}_v\) is the number of plants recruited by seed germination; these two values depend on the particular species v under consideration. Community is the set that includes all plants in the CA, recruited by resprouting (\(Sp_{v,rs}\)) or by seed germination (\(Sp_{v,sd}\)). Accordingly, the state of a cell is either Bare ground, which means that it is empty of vegetation, or the ID(s) of the plant(s) that occupy the cell. This means that \(Q_{i,j,t} \subseteq \text {Community}\), where \(Q_{i,j,t} = \varnothing \) designates Bare ground. With this representation, the state of each cell can represent the presence of zero, one, or more than one plant in the physical space that the cell denotes.
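One possible way to encode these cell states is sketched below. The names `PlantID`, `BARE_GROUND` and `build_community`, as well as the example counts, are hypothetical; the only point illustrated is that a cell state is a (possibly empty) set of plant IDs drawn from Community.

```python
from typing import FrozenSet, NamedTuple


class PlantID(NamedTuple):
    species: int  # v, with 1 <= v <= V
    origin: str   # "rs" (resprout) or "sd" (seed germination)
    index: int    # position of the plant within Sp_{v,origin}


# The empty set designates Bare ground.
BARE_GROUND: FrozenSet[PlantID] = frozenset()


def build_community(z_rs, z_sd):
    """Return the set Community from the per-species counts Z^rs_v and Z^sd_v.

    z_rs and z_sd map a species number v to the number of plants of that origin.
    """
    community = set()
    for v, count in z_rs.items():
        community.update(PlantID(v, "rs", i) for i in range(1, count + 1))
    for v, count in z_sd.items():
        community.update(PlantID(v, "sd", j) for j in range(1, count + 1))
    return frozenset(community)


# Made-up counts for two species.
community = build_community({1: 2, 2: 1}, {1: 3, 2: 0})
# A cell in which a resprout and a seedling of species 1 overlap.
cell = frozenset({PlantID(1, "rs", 1), PlantID(1, "sd", 2)})
```

Using immutable sets keeps the three cases (zero, one, or several plants per cell) under a single representation.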

Initial Configuration I. Initially \(Q_{i,j,0}=\varnothing \) for all \((i,j) \in L\). The recovery of vegetation strongly depends on the pre-fire situation and on fire damage, and thus the assumptions that govern the initial configuration of the CA were carefully conceived:

  1. The pre-fire plant community of V species, each species v with cover \(cv_v\), randomly selected from field data, is the target community, assuming auto-succession.

  2. The pre-fire plants are randomly placed in the plot according to the cover of each species.

  3. The plot is environmentally homogeneous and empty of aboveground vegetation immediately after fire.

  4. A proportion of the plants of each species survives and is recruited by resprouting following a temporal distribution obtained from field recruitment data. Post-fire resprouting mortality is not considered.

  5. A number of plants of each species are recruited by seed germination following a temporal distribution obtained from field recruitment data. Before fire, seeds are distributed uniformly at random across the plot. The number of seeds is not a limiting factor, and post-fire seedling mortality is not considered.

Thus, for all \(1 \le v \le V\), the spatio-temporal locations of new plants

$$\begin{aligned}&S^{sd} = \lbrace (x_\ell , y_\ell , T_\ell ) \rbrace _{\ell = 1}^{Z^{sd}_v}&S^{rs} = \lbrace (x_\ell , y_\ell , T_\ell ) \rbrace _{\ell = 1}^{Z^{rs}_v} \end{aligned}$$

are distributed as follows:

$$\begin{aligned}&S^{sd}, S^{rs} \sim (U(1,N), U(1,M), f_1(t)) \end{aligned}$$

where \(S^{sd}\) is the set of seedlings in the community, \(S^{rs}\) is the set of resprouted plants, and \(f_1(t)\) is the probability distribution of plant recruitment over time, taken from field data (Fig. 1). That is, a new plant is placed in the CA at a spatial position selected uniformly at random and at a time determined by the function \(f_1\).
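A minimal sketch of this sampling step is given below. The empirical distribution \(f_1\) is not reproduced in the text, so it is replaced here by a made-up list of weights, one per time step; the function name and its arguments are likewise illustrative.

```python
import random


def sample_recruitment_events(n_plants, f1_weights, n_rows=300, n_cols=300, rng=None):
    """Draw a space-time location (x, y, T) for each of n_plants new plants.

    f1_weights[k] is the (hypothetical) relative frequency of recruitment at
    time step k + 1 and plays the role of f_1(t).
    """
    rng = rng or random.Random()
    steps = list(range(1, len(f1_weights) + 1))
    events = []
    for _ in range(n_plants):
        x = rng.randint(1, n_rows)  # U(1, N), both bounds inclusive
        y = rng.randint(1, n_cols)  # U(1, M), both bounds inclusive
        t = rng.choices(steps, weights=f1_weights, k=1)[0]
        events.append((x, y, t))
    return events


# Made-up weights: most recruitment happens in the first time steps after fire.
events = sample_recruitment_events(50, [5, 4, 3, 2, 1, 1], rng=random.Random(0))
```

In a full simulation one such set of events would be drawn for each species and regenerative trait, each with its own recruitment distribution.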

Fig. 1. Temporal distribution (\(f_1(t)\)) of recruitment events for seedlings (red) and resprouts (green) at the top, and their distribution across the lattice and time at the bottom. (Color figure online)

Rules of the Automaton. The transition rules used to update the CA model in the context of the Moore neighbourhood are as follows:

  1. A plant of a species v spreads to neighbouring cells at time t with a probability that depends on its origin: \(p_{v,rs,i}\) for resprouted plants (\(Sp_{v,rs,i}\)) and \(p_{v,sd,j}\) for plants recruited by seed germination (\(Sp_{v,sd,j}\)).

  2. Any cell that is occupied by a plant remains occupied by that plant until the end of the simulation. This means that mortality and pruning are not considered in the model.

  3. The probabilities \(p_{v,rs,i}\) and \(p_{v,sd,j}\) depend on the age of the plant, the biological type and the way the plant was recruited after fire. Since the simulations run in a square lattice using the Moore neighbourhood, the distance from the central cell of the neighbourhood was also taken into account as a correction factor. Thus,

    $$\begin{aligned}&Q_{i,j,t+1} = F\left( Q_{i,j,t}, W_t, Q_{i,j,t}^{|H|}, S_t^{sd}, S_t^{rs} \right)&\text {for } (i,j) \in L \text { and } t \in \mathbb {N}\end{aligned}$$

    where \(W_t\) is the matrix containing the relationships and transition probabilities of the elements in \(Q_{i,j,t}\) and \(Q^{|H|}_{i,j,t}\) at time t, and the growth of plants through time follows the functions \(d^{Sp_{v,rs,i}} = f_2(t)^{Sp_{v,rs,i}}\) and \(d^{Sp_{v,sd,j}} = f_2(t)^{Sp_{v,sd,j}}\), where the families of functions \(f_2^{Sp_{v,rs,i}}\) and \(f_2^{Sp_{v,sd,j}}\) provide time-dependent values obtained from field data.

  4. Any cell occupied by a plant A can also be occupied by another plant B in the neighbourhood with probability \(p_B\) if \(t > 36\) and \(B_B > B_A\), and with probability \(\beta p_B\) otherwise, where t is the time elapsed since fire (36 monthly time steps correspond to 3 years), \(B_A\) is the biological type of plant A, \(B_B\) is the biological type of plant B, and \(\beta \) is a correction factor.
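The four rules can be put together in a compact sketch such as the one below. It is a simplified reading under stated assumptions, not the implementation used for the results: the lattice is stored as a sparse dictionary, each plant carries a single spreading probability for the current step, the distance correction is taken to be the inverse Euclidean distance, and a plant entering a cell with several occupants is compared against the highest biological type present. All names are hypothetical.

```python
import math
import random


def step(grid, plants, t, beta, n_rows, n_cols, rng):
    """One synchronous update of the lattice.

    grid maps (i, j) -> frozenset of plant IDs; bare cells are simply absent.
    plants[pid] = (p, b): spreading probability at time t and biological type.
    """
    new_grid = dict(grid)
    for (i, j), occupants in grid.items():
        for pid in occupants:
            p, b = plants[pid]
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    r, c = i + di, j + dj
                    if not (1 <= r <= n_rows and 1 <= c <= n_cols):
                        continue
                    target = grid.get((r, c), frozenset())
                    if pid in target:
                        continue
                    # Assumed distance correction for diagonal neighbours.
                    prob = p / math.hypot(di, dj)
                    other_types = [plants[q][1] for q in target]
                    if other_types and not (t > 36 and b > max(other_types)):
                        # Rule 4: spreading into an occupied cell is scaled by beta.
                        prob *= beta
                    if rng.random() < prob:
                        # Rule 2: occupants are never removed, only added.
                        new_grid[(r, c)] = new_grid.get((r, c), frozenset()) | {pid}
    return new_grid


# A single plant in the centre of a 3 x 3 lattice, spreading with probability 1.
start = {(2, 2): frozenset({"A"})}
after = step(start, {"A": (1.0, 1)}, t=1, beta=0.5, n_rows=3, n_cols=3,
             rng=random.Random(0))
```

Because the update reads only from the old `grid` and writes to `new_grid`, the order in which cells are visited does not affect the outcome of a step.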

Fig. 2. Mean plant diameter of species and regenerative traits in the CA along time, fitted from field data. Species were coded with different colours; solid lines indicate plant growth of resprouts and dotted lines the growth of plants recruited by seed germination. (Color figure online)

Fig. 3. Outputs of a random simulation at different time steps during the first 5 years after fire. Colours represent different cell states, that is, plants or combinations of plants. The background colour represents bare soil. (Color figure online)

Parameterization. The whole CA was parameterized by measuring the error with respect to field data values. The growth over time of each species and regenerative trait in the CA was parameterized using a sigmoid curve, so that one loop in the CA equals one month (Fig. 2); then, the whole community was simulated.
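As an illustration of a sigmoid growth curve on a monthly time axis, the sketch below uses the logistic function. The exact functional form fitted to the field data is not given in the text, so the logistic form, the parameter names (`d_max`, `k`, `t0`) and the numerical values are all assumptions.

```python
import math


def mean_diameter(t, d_max, k, t0):
    """Logistic (sigmoid) curve: mean plant diameter at month t.

    d_max is the asymptotic diameter, k the growth rate per month and t0 the
    month at which half of d_max is reached (all hypothetical parameters).
    """
    return d_max / (1.0 + math.exp(-k * (t - t0)))


def monthly_increment(t, d_max, k, t0):
    """Diameter gained during one CA loop, i.e. between month t and t + 1."""
    return mean_diameter(t + 1, d_max, k, t0) - mean_diameter(t, d_max, k, t0)


# Made-up parameters for one species and regenerative trait over five years.
curve = [mean_diameter(t, d_max=1.2, k=0.15, t0=24) for t in range(61)]
```

The monthly increments of such a curve are what a per-loop spreading probability would need to reproduce on average.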

The cellular automata model can potentially reproduce the post-fire dynamics of any plant community because it gathers the main ecological processes in the post-fire recovery; it only requires some information about the species in that community. However, the availability of data limited the range of plant communities that could be modelled, with heathlands, broomlands and gorselands being the best-represented communities. The average number of woody species in those communities was relatively high (\(\overline{x} = 10.6\), \({\sigma ^{\overline{x}}}=0.4\)) and the majority of the woody species involved (33 out of 37) are able to regenerate through both resprouting and seed germination. Thus, about 20 parameters (one for each species and regenerative trait) would be required in an average simulation if independent growth among species were assumed. Nevertheless, overlayering among species is common in nature, and competition among species usually decreases the rate of spreading of plants, indicating non-independent growth and occurrence of species. In this context, having just two species coexisting in a single cell would already increase the mean number of parameters up to \(20^2 = 400\). Even though the number of species in a 0.1 m \(\times \) 0.1 m cell is usually low in nature, it has been reported to be greater than 5 in some cases. As a result, a highly complex parameterization would be needed in order to fit real data. To reach a compromise among the number of parameters, the data, and the computation requirements, instead of parameterizing all the interactions among species, we decided to parameterize each species and regenerative trait in isolation and to use a one-off correction factor to fit the spreading rate in the presence of any other species, as indicated in the rules of the automaton.
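As a rough check of these figures, assuming two regenerative traits per species and rounding the product to 20, the parameter counts work out as:

$$\begin{aligned}&\underbrace{10.6}_{\text {species}} \times \underbrace{2}_{\text {traits}} = 21.2 \approx 20&20 \times 20 = 20^2 = 400 \end{aligned}$$

The second expression counts one parameter for every ordered pair of species-trait combinations that may share a cell.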

The squared error between the overall cover of woody species in the pre-fire community and in the post-fire communities was used to validate the model, because the model assumes autosuccession. We chose field data from around 3 years after fire to validate the community model because this is a critical period in the post-fire recovery and has a high impact on the overall recovery of the vegetation. Afterwards, the increase in cover of woody species tends to slow down and changes tend to occur slowly. Furthermore, it is a suitable subset of data for validation, since a high percentage of the field data focuses on the development of plant communities around the first three years after fire.
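The cost described here reduces to a few lines, sketched below. The function names and the cover values in the example are made up for illustration; cover is assumed to be expressed in the same units for simulated and field values.

```python
def squared_error(simulated_cover, target_cover):
    """Squared difference between simulated and target overall woody cover."""
    return (simulated_cover - target_cover) ** 2


def mean_cost(pairs):
    """Average squared error over (simulated, target) pairs of cover values."""
    return sum(squared_error(s, o) for s, o in pairs) / len(pairs)


# Made-up example: two validation runs against a target cover of 88.
example_cost = mean_cost([(90.0, 88.0), (85.0, 88.0)])
```

Squaring penalizes large deviations in overall cover more heavily than small ones, regardless of their sign.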

4 Results and Discussion

The average value of the pre-fire community around 3 years after fire in the validation subset was \(88.1\, \pm \, 2.7\) \((\text {mean}\, \pm \, \text {SEM})\), and the average cost of the simulations, i.e., the squared error, was 5.16. The error of the validation simulations was relatively low taking into account the large variation in field data [4, 16, 23, 24], particularly around 3 years after fire.

From an ecological point of view, the CA-based model meets the objective of reproducing the main patterns of plant community assembly after fire. There are strong differences in the occupation of available space among species and biological traits. In this regard, the growth of plants over time changes in the same way as in the field data. As a result, CA models can be very useful for hypothesis testing and for exploring different scenarios, but they reproduce an idealized and oversimplified community, not in terms of the number of elements (plants and populations) but in terms of their interactions. Despite the high quality of the data, the huge variation among ecosystems makes it impossible to sample all possible situations, resulting in missing information. Thus, some factors and processes have been simplified in order to reach a good compromise between model performance and computing requirements. One useful feature of the model is the good correspondence found between plant growth and growth probability for each loop in the CA. In our model one loop equals one month, which makes it worthwhile in terms of both computing requirements and ecological interpretation of the results. A relatively low number of loops is recommended due to the high number of parameters and the extent and number of cells in the lattice, which is predefined in this work. The sizes of the plot and of the cells successfully fit our purpose of reproducing vegetation recovery. Plots that are too small would produce results that are due to specific vegetation patterns [30] rather than to ecological processes, whereas large cells would not reproduce plant competition for resources, in line with other studies [16], and would result in unrealistic morphologies. Hence, the spatial scale has a crucial role in the interpretation of interactions among the plants, and species, in the community [16]. Furthermore, the spatial structure cannot be neglected when the sensitivity of such models to their inputs and parameters is analysed [6]. The number of processes, parameters and data required by the model would increase exponentially if the influence of other processes, or even of the environment, were considered; the environment is often the hidden force modulating biological processes and interactions, and has multiple feedbacks with the biological component.

4.1 Proposal for Parameter Optimization

As shown above, multiple parameters are necessary for the model to provide a realistic simulation of real-world phenomena. In particular, the functions that regulate the rates of spreading of the plants are an essential part of the model and should be estimated accurately. While field data provide some values for those functions, it is necessary to provide them for all possible input values (i.e., time, in this case).

Machine Learning Methods. Genetic Algorithms (GA) [22] are a well-known nature-inspired optimization method in which a collection (called population) of solutions (called individuals) to an optimization problem is represented as fixed-length vectors of bits. An initial random population is iteratively evolved using operators inspired by the Darwinian theory of evolution. First, a subset of the population is selected via a selection process that mimics natural selection, where better solutions have better survival probabilities. These sampled individuals are then combined via the operations of crossover, which mimics reproduction, and mutation, which, similarly to naturally occurring mutations in DNA, changes bits in the individuals. This process is repeated until one of the termination criteria is met, for example once a good enough solution has been found.
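The loop described in this paragraph can be sketched in a few lines. The version below is a generic textbook-style GA, not the configuration proposed for the CA: binary tournament selection, one-point crossover, bit-flip mutation, elitism, a fixed number of generations as termination criterion, and all numerical settings are illustrative assumptions.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=30, generations=40,
                      p_mut=0.02, rng=None):
    """Maximise `fitness` over fixed-length bit vectors (toy settings)."""
    rng = rng or random.Random()
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly drawn individuals is chosen.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        children = [best[:]]  # elitism: the best individual is kept unchanged
        while len(children) < pop_size:
            mum, dad = tournament(), tournament()
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: each bit is flipped with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


# Toy problem ("OneMax"): the fitness of a vector is its number of ones.
best = genetic_algorithm(sum, n_bits=20, rng=random.Random(1))
```

Elitism is included here only so that the best fitness never decreases from one generation to the next; it is not part of the description above.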

Genetic Programming (GP) was introduced by Koza [20] as a means to evolve not only arrays of bits, as in traditional GA, but entire programs. In GP a program is usually represented by its parse tree. As in GA, a population of programs is evolved by means of selection, crossover, and mutation, where the last two operators, depending on the actual representation used, are specific to GP.

Parameter Optimization Architecture. To perform the parameter optimization process, a two-level method has been devised. Initially, for each species GP is employed to provide a function estimating the rate of spreading in isolation. That is, for each species we are estimating functions that provide a realistic spreading rate when no competition is present. While this is not a sufficient condition to obtain realistic solutions when other species are present, it is, nevertheless, a necessary condition. This first step is performed to limit the computational costs: the evaluation of the solutions can be performed by running a smaller and simpler simulation (since only one species is involved).

Once a large enough number m of solutions has been obtained for each species, we consider the following matrix:

$$\begin{aligned} \begin{bmatrix} f_{1,1}&f_{1,2}&\cdots&f_{1,V} \\ f_{2,1}&f_{2,2}&\cdots&f_{2,V} \\ \vdots&\vdots&\ddots&\vdots \\ f_{m,1}&f_{m,2}&\cdots&f_{m,V} \\ \end{bmatrix} \end{aligned}$$

where the i-th column represents the collection of m solutions for the i-th species that were found in the previous step. Now, it is possible to select via GA one element from each column to provide a solution to the problem of optimizing the spreading rates of the different species. For example, for 3 species the vector (1, 3, 3) represents the three functions \(f_{1,1},f_{3,2},f_{3,3}\), one for each species. This second phase does not require re-computing the spreading rates of the different species in isolation, but only finding a combination of them that produces a realistic simulation. This two-phase process should help reduce the computational burden of finding the correct parameters.
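The encoding used in the second phase can be illustrated as follows. The sketch only shows how an index vector is decoded into one function per species; the candidate functions are replaced by string labels, and the names `decode` and `candidates` are hypothetical.

```python
def decode(choice, candidates):
    """Map an index vector to one candidate function per species.

    candidates[i][s] stands for f_{i+1,s+1} (row = solution, column = species);
    choice[s] is the 1-based row selected for species s + 1.
    """
    return [candidates[row - 1][s] for s, row in enumerate(choice)]


m, V = 3, 3
# String labels stand in for the functions found in the first phase.
candidates = [[f"f_{i},{s}" for s in range(1, V + 1)] for i in range(1, m + 1)]

selected = decode((1, 3, 3), candidates)
print(selected)  # ['f_1,1', 'f_3,2', 'f_3,3']

# Number of distinct index vectors the second phase has to search.
search_space = m ** V
```

With m candidates for each of V species there are \(m^V\) possible vectors, which is why a search heuristic rather than exhaustive enumeration is proposed for this phase.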

Field data will be separated into training, testing, and validation sets in the proportions 70%, 20% and 10% to deal with data dependence. Plant-level data, particularly plant dimensions over time, will be used in the first phase to fit species spreading, while species abundance (cover data) will be used in the second phase. Furthermore, the cover of each woody species and their combined occurrence will be used to compute the cost of the parameters for the simulations, unlike the current model, which only uses overall cover.
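A split in these proportions could be sketched as below. The text does not say how records will be assigned to each set, so the plain random shuffle is an assumption; if records from the same location are correlated, assigning whole locations to a single set may be more appropriate.

```python
import random


def split_dataset(records, rng=None):
    """Shuffle records and split them 70% / 20% / 10%.

    Returns (training, testing, validation). Integer arithmetic is used for
    the set sizes; any remainder goes to the validation set.
    """
    rng = rng or random.Random()
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_test = len(shuffled) * 20 // 100
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return training, testing, validation


# Made-up record identifiers standing in for field observations.
training, testing, validation = split_dataset(range(100), rng=random.Random(0))
```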

5 Conclusions

Ecosystems are highly complex systems that can be successfully simulated using cellular automata models. However, there are two limiting factors: the availability of information about biological processes and the optimization of a high number of parameters. A balance between the two (sampling effort and computational requirements) has to be struck in order to make CA valuable for ecological research.

In the future we plan to apply the proposed two-level optimization procedure to correctly set the parameters. We think that this procedure can be generalised to other kinds of CA models where there are multiple distinct processes interacting in complex ways.