1 Introduction

The data stored in an information system usually portrays the world as it was at a specific moment or interval in time. This is especially true for spatial data, but with the added certainty that it will also evolve with time. The patterns of land use and land cover, and of social, economic and demographic variables in general, change constantly with time. Entire organisations have developed with the sole purpose of collecting and updating spatial data, through several data acquisition techniques [20]. Nevertheless, regular data collection provides at best a periodic picture of the changing reality, which in some applications may not be enough [2]. Stakeholders of an information system may need not only to know how the data changed in the past; in order to plan ahead or otherwise reason upon the data, they also need to understand why it changed the way it did and how it may continue to evolve in the future.

This need is met by specific tools and methodologies composing a niche of spatial analysis called Spatial Simulation [6]. Such tools allow the simulation of space-time changes of spatially distributed variables, usually on a discrete representation of space. Their use can improve knowledge of geographic phenomena in two ways [2]: (1) by disclosing the dynamics behind changes observed in the past and (2) by forecasting future evolution and change. The dynamics identified are typically applied to a set of data during a time period of interest [21].

Ever since the inception of the first spatial simulation code library in the 1990s, the field has experienced rapid growth, with an increasing number of tools, such as Swarm, RePast or StarLogo, today comprising perhaps more than one hundred [6]. As a consequence, the spatial analyst is faced with a non-trivial choice among tools that often require solid programming skills and at times raise data interoperability issues.

This article presents a Domain Specific Language (DSL) for spatial simulation in the context of GIS, named “Domain Specific Language for Spatial Simulation Scenarios” (DSL3S). Additionally, an accompanying development framework is introduced that allows the development of models through the arrangement of graphical elements and their relationships, dispensing with formal programming knowledge. These graphical models can then be translated into ready-to-run simulations through the application of code generation techniques [35]. The Information Systems Group of the INESC-ID (footnote 1) research centre, associated with the Instituto Superior Técnico of the Universidade de Lisboa, now has close to a decade of experience in this field, applying Model-Driven Development (MDD) and Language Engineering techniques in different contexts, particularly through the ProjectIT and MDDLingo initiatives [13, 32, 34, 38, 39].

Section 2 of this article reflects on the need for DSLs in Spatial Simulation; Section 3 reviews and compares previous DSLs in the field. Section 4 describes the proposed approach and Section 5 outlines the syntax and structural semantics of the language. Section 6 details the prototype implementation of this language and the technologies supporting it. Section 7 presents three application examples of DSL3S. Finally, Section 8 summarises the article and discusses future work.

2 Background on spatial simulation

Cellular Automata is the oldest technique used in Spatial Simulation [45], in which the world is discretised into a grid of regular cells evolving in accordance with a fixed set of rules. More recently, Agent-based Modelling has become a popular paradigm, with wide application in the GIS context [2]. An agent can be defined as an autonomous object that perceives and reacts to its environment [12, 37, 43], a concept that stems from object-oriented programming. Agent-based Modelling and Cellular Automata are two techniques that overlap to some extent in the GIS context, though the former brought new processing possibilities, with geographic entities not only reacting to stimuli but also storing knowledge and reasoning before acting. Agents can also be used to model phenomena that do not have direct geographic meaning, such as social or economic interactions.
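The grid-and-rules idea behind Cellular Automata can be illustrated with a minimal sketch. It is not taken from any of the tools discussed in this article; the transition rule (Conway's Game of Life) and the names `count_neighbours` and `step` are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: a cellular automaton on a regular grid.
# The transition rule (Conway's Game of Life) is an assumed example.

def count_neighbours(grid, row, col):
    """Count live cells in the Moore neighbourhood; cells outside the grid count as dead."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:
                total += grid[r][c]
    return total


def step(grid):
    """Apply the same fixed rule to every cell, producing the next grid."""
    new_grid = []
    for r, row in enumerate(grid):
        new_row = []
        for c, alive in enumerate(row):
            n = count_neighbours(grid, r, c)
            # A live cell survives with 2 or 3 neighbours; a dead cell is born with exactly 3.
            new_row.append(1 if (n == 3 or (alive and n == 2)) else 0)
        new_grid.append(new_row)
    return new_grid


# A horizontal bar of three live cells.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
next_state = step(blinker)
```

Every cell is updated from the previous grid only, so the order in which cells are visited does not affect the result; this synchronous update is one common convention, not the only one.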

With at least two decades of history in the GIS field, Spatial Simulation has in good measure remained out of reach of regular spatial analysts. To choose an appropriate tool, the analyst first faces a choice between two categories: tools providing support at Program-level (closer to a programming language approach) and tools that operate at Model-level (closer to the conceptual level). The former commonly comprise source code libraries usable with high level programming languages. The latter consist of pre-programmed tools that only allow users to parametrise the model. These two categories can be seen as the ends of a spectrum of support level defined by the multiple tools available in this field [11]. In the middle of this spectrum are DSLs, which try to strike a balance between the two levels of support.

Program-level tools, such as Swarm [19], MASON [23] or RePast [28], can reduce some of the burden of directly using a general purpose programming language, but still require good programming skills from the analyst [41]. Most of these code libraries are based on object-oriented languages, such as Objective-C or Java, by themselves not easily accessible to entry level programmers. Full knowledge of one of these code libraries is achievable only with several months of practice [33]. There is thus a relevant time lag between the choice of one of these tools and a first simulation prototype, which for some projects may not be acceptable.

On the other hand, Model-level support tools (e.g. SLEUTH [5], TELSA [24], LANDIS [25]) tend to be quite specific: much of the model behaviour and assumptions are hidden in the software and may not be explicit or modifiable; their use in other application fields is largely impossible. The analyst can in fact dispense with programming skills when using this kind of tool, but is constrained to a specific field (e.g. Hydrography, Forest Management) and overall simulation behaviour. These tools also tend to narrow the interaction with geo-referenced data, by imposing certain formats or in some cases by lacking output functionality. Moreover, Model-level tools tend to impose dependencies on third party software that may not be trivial to overcome. Evolution or generalisation of these tools can sometimes become too expensive, dooming them to extinction. Traditionally, they take advantage of market niches, serving the needs of a specific and restricted group of users, hence the commercial nature of many of them.

In this context, three essential problems and difficulties arise as the motivation for the DSLs in this field:

  1. most spatial simulation tools require specialised programming training;

  2. those tools that do not require such knowledge are narrow scoped and tend to compromise GIS interoperability; and

  3. an integrated approach to the description, documentation and communication of agent-based models is largely lacking [27].

Analysts working with spatial data either come from GIS related areas, like Geography, Cartography or Geodesy, or from the scientific domains of application, such as Biology, Economics or Environmental Science. Even higher education programmes in these fields largely lack programming training, particularly in object-oriented development. GIS analysts thus generally lack the knowledge and practice of trained programmers, and are unable to use the most common Spatial Simulation tools. The involvement of programmers in Spatial Simulation projects becomes indispensable, creating a further communication step between a model concept and its implementation.

On the other hand, opting for pre-compiled Model-level tools also imposes its share of burdens. First of all, the correct implementation of this sort of model is often hard or impossible to verify, since most are commercial or otherwise closed source tools. Experimenting with different behaviours or with alternative spatial information as input is impossible, which sometimes leads the analysis to conform to the model, where the opposite would be the desired approach.

Lastly, regarding model descriptions, if a model can only be described by the source code that implements it, then it becomes unreadable to most GIS analysts, as per the above. Beyond that, source code specificities, such as data input/output, syntactic structure and programming paradigm, cast a layer of obfuscation that makes it hard to compare different models. There are numerous concepts common to any spatial simulation, such as the succession of time, spatial variables, agents, behaviours or spatial location. For example, a wildfire model can appear entirely different from a land use model simply because different tools were used to implement each concept, even though the basic programming constructs that compose each of them can be the same. Without some sort of common descriptive lexicon, models are harder to compare and communicate, even those produced for the same application domain.

Potential exists for a wider adoption of Spatial Simulation techniques, provided that tools become available that make model development more accessible to non-programmers, together with a common lexicon for model description.

3 Related work

There have been several attempts to create DSLs for Spatial Simulation, trying to bridge the gap between Program-level and Model-level tools. In this section some of these DSLs are briefly described.

StarLogo started as a specialisation of the Logo functional programming language, directed at Agent-based simulations. It was an educational project at MIT to help students explore emergent behaviour. StarLogo was progressively transformed into a multi-platform tool with the adoption of Java as execution environment; eventually it evolved into a spin-off named NetLogo. The lexicon of NetLogo is composed of four main concepts, all different kinds of agents: (i) turtle - an agent capable of moving across the simulation space; (ii) patch - a static subdivision of the simulation space; (iii) link - a relation between two turtles; (iv) observer - a non-spatial agent capable of collecting data from, and providing data to, other agents. Agents can themselves contain variables to store data and can be grouped in agentsets. A vast library of over 300 pre-built models has been gathered for education purposes (footnote 2), covering a wide range of disciplines. Both StarLogo and NetLogo are relatively easy to learn, especially when compared to Program-level tools, dispensing with the higher skills needed to use an object-oriented language [31]. An integrated text editor supports swift development and the exploration of model dynamics. More recently an extension (footnote 3) for spatial data input was made available, although entirely reliant on ESRI data formats. Berryman [3] reports that this extension requires advanced programming skills to master. Of the various DSLs attempted in this field, NetLogo seems to be the most popular, retaining a large number of users. In great measure this is due to its fast prototyping capabilities, to which the integrated text editor greatly contributes. However, readability issues common to traditional programming languages slowly emerge with larger and more complex models, especially if spatial data is involved.

The Spatially Explicit Landscape Event Simulator (SELES) is the product of a research project at Simon Fraser University, a declarative DSL for Landscape Dynamics [11]. SELES was conceived to be used closely with GIS software, supporting a vast range of different raster formats (most common in Land Use / Land Cover data) for landscape data input. SELES also takes as input a set of global variables and the declaration of several landscape events and agents. Landscape events describe the model dynamics, each requiring the declaration of a spatial domain and recurrence frequency. For each event a spreading mechanism is specified, along with how it affects its neighbourhood. Even though they use keywords closer to the context of simulation, simulations coded with SELES are somewhat reminiscent of third generation languages, with distinct data and procedure environments, still leaving many usual coding activities to the user. It is a good example of a DSL that, while doing away with some of the complexity of traditional programming languages, achieves little in terms of abstraction. SELES is shipped with a dedicated code editor and a simulator that runs the model by interpreting the code files and reading in the spatial data. At run time the simulator displays the model in a graphical interface. Both these programs are available free of charge as closed executables for Microsoft operating systems.

MOBIDYC (Modelling Based on Individuals for the Dynamics of Communities) is an Agent-based approach to the study of population dynamics, directed at the fields of Biology and Ecology [17]. It was conceived to provide a tool accessible to non-programmers, particularly biologists. In essence, MOBIDYC is a Smalltalk code package, defining a set of simple primitives, such as environment, agent and state, plus a set of pre-defined behaviours. A model first requires the creation of agents and their respective states; behaviours are coded with primitive relations between the names of state variables, such as arithmetic operations. Observing agents to collect data can be added, but results are made available only in tabular format. Models developed with MOBIDYC can be quite fluid and easy to understand if targeting Biology related problems; in other domains the semantics of the code can become harder to grasp. There is no explicit mechanism to interact with GIS software, since MOBIDYC was conceived to run primarily on purely artificial spaces. The source code is open and free, but is dependent on VisualWorks, a commercial IDE. The reliance on this IDE provides wide portability to MOBIDYC, which runs on Microsoft, Macintosh and Linux operating systems.

Ocelet is a declarative DSL for landscape dynamics aimed at tackling common difficulties in capturing space-time dynamics with traditional modelling techniques [8]. It takes an unconventional approach to this field by mimicking the concept of service-oriented architecture, with models composed of components interacting with each other through services. To declare a model with Ocelet, the developer has five principal constructs available: (i) entity - a component that provides a set of services; (ii) service - communication port of an entity, accepting a set of arguments and returning a set of results; (iii) relation - bonding entities through their services (when compatible); (iv) scenario - describing which relations within an entity have to be activated, and when; (v) datafacer - a device through which entities access data. Entity behaviour is coded as actions behind each service through mathematical expressions. The double paradigm of this language presents a novel approach to spatial simulation, but it is not entirely clear whether it eases model understanding. Users lacking a background in computer science may find the service-oriented architecture alien and hard to relate to spatial simulation. On the other hand, the service-oriented paradigm provides a level of abstraction over the general purpose of a model that is lacking in the other languages reviewed here. However, as the amount of code required to describe larger models expands, this abstraction slowly dilutes. The language is supported by two Eclipse plug-ins: a language editor and a code generator. The artefacts generated are Java classes that can be compiled to specific operating systems or platforms.

In recent years a consortium of French and Vietnamese research centres and universities has developed an agent based modelling IDE called GAMA [18]. It is conceived to support large models and to provide seamless integration with spatial data. This IDE interprets a textual DSL called GAML (GAMA Modelling Language). A model in GAML is declared in similar fashion to SELES: a structured file composed of sequences of statements that can be either declarative or imperative. The core concept of this language is Species, essentially an agent class, that underpins most other concepts. A GAML model is structured into four code sections: (i) Header - setting the model name and optionally importing other model files; (ii) Global species - declaring a special species called “world agent” enclosing global properties of the model; (iii) Species and grids - where classes of agents and grid topologies (discrete spatial variables) are declared; (iv) Experiments - special agents that carry out the execution of the model, of two types: gui and batch. A species is defined as a set of attributes plus a set of actions and behaviours. Behaviours include: reflex - a sequence of statements that can be executed at each time step; init - a special form of reflex that is evaluated only once when the agent is created; task - a reflex with a weight associated that determines its execution priority in the scheduler; and state - determines if the agent should enter/leave a particular state at each time step. Beyond four primitive data types (bool, float, int and string), GAML supports several advanced features found in general purpose programming languages: loops, iterators and data structures such as lists, maps or matrices. Of the DSLs reviewed here GAML is possibly the most versatile, with a wider range of application, due to an extensive number of features and constructs. Eventually, it may come to build a relevant user community like NetLogo did. Nevertheless, as with SELES, GAML still mimics in various ways early third generation languages (such as COBOL) with strict environments for specific code sections. Mastering a language of this depth is naturally a lengthy process, presenting a relevant challenge for less experienced users. GAMA is built on Eclipse, runs on Java and is released under an open source license.
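The distinction between a behaviour evaluated once at creation and one executed at every time step can be pictured with the following sketch. It is written in Python, not in GAML, and does not reproduce GAML syntax or the GAMA scheduler; the class `Species`, the attribute `energy` and the function `run` are hypothetical names used only for illustration.

```python
# Illustrative sketch only: "init" versus "reflex" style behaviours.
# This is plain Python, not GAML; all names are hypothetical.

class Species:
    """Stand-in for an agent class with one init and one reflex behaviour."""

    def __init__(self, name):
        self.name = name
        self.energy = 0
        self.log = []
        self.init()  # evaluated only once, when the agent is created

    def init(self):
        self.energy = 3
        self.log.append("init")

    def reflex(self):
        # Executed at each time step by the scheduler below.
        self.energy -= 1
        self.log.append("reflex")


def run(agents, steps):
    """Minimal scheduler: every agent executes its reflex once per time step."""
    for _ in range(steps):
        for agent in agents:
            agent.reflex()


sheep = Species("sheep")
run([sheep], steps=2)
```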

A graphical DSL not conceived for spatial simulation, but worthy of mention, is the Agent Modelling Language (AML) [42]. It was developed for social dynamics and is reliant on the Model Driven Architecture (MDA) infrastructure, extending a wide range of different UML meta-classes. Its concepts are organised hierarchically, through several levels of generalisation. At the top is the concept of semi-entity, an abstract element that can be of two types: behavioured or socialised; the former represents elements that can act on their environment, the latter specifies elements that can form societies and participate in social relationships. The concrete building blocks of AML are entities, which can be of three types: (i) agents - capable of interactions, observations and autonomous behaviour; (ii) resources - physical or informational entities whose availability is constrained; and (iii) environments - logical or physical surroundings that determine under which conditions entities can exist and function. Three other main concepts model social dynamics: (i) structures - to identify societies and roles; (ii) behaviour - constructs for communication, observation, reaction and services; and (iii) attitudes - to describe individual agent drivers: needs, intentions, goals, beliefs. There is much more to AML, including constructs to specify mental agent aspects and even concepts to describe model deployment and execution. No interpreter or code generation infrastructure has ever been developed for AML and no applications could be found in the literature. It is possible that such a detailed language presents too much of a challenge for a full implementation. On the other hand, at the time AML was published, MDA tools were few and less mature than today. AML presents itself as a resource with great potential that is yet to be fulfilled.

Existing DSLs for Spatial Simulation can ease model development and reduce the build-up time in prototyping, but do not fully avoid the need for programming skills. As with general purpose programming languages, the user has to understand the meaning of keywords and how to compose a coherent set of instructions or declarations into a specific model. Some of these DSLs were clearly developed for educational purposes, more as prototyping than analysis tools. Lack of GIS interoperability is an issue for some of them, as is platform or operating system dependency. Apart from AML, these previous DSLs focus on providing a refined concrete syntax, but remain framed in older programming paradigms emanating from declarative or functional languages.

A wider review of spatial simulation tools can be found in de Sousa and Silva [7].

4 Proposed approach

The vision of this proposal is to provide GIS analysts with the means of prototyping spatial simulation models with graphical diagrams that can be parametrised and tuned to the specific application domain. These graphical models are then fed to a code generation facility to produce a ready-to-run simulation, based on one of the popular Program-level tools, for basic validation. From there analysts can tune the model at the conceptual level using graphical constructs in an iterative process. In this fashion GIS analysts focus their work on modelling itself, abstracted from concerns specific to programming, data input or platform dependencies.

Modelling plays an indispensable role in classical engineering disciplines, allowing engineers to study large and complex systems from a higher level of abstraction [1]. In the field of software engineering, modelling is yet to be widely adopted [4], although in many cases end users are requiring systems with a degree of complexity that goes well beyond the abilities of traditional software development tools [14]. Moreover, the integration with parallel disciplines, such as systems engineering, software engineering, control engineering, business process engineering, etc, can be greatly simplified with proper modelling tools [16].

Model-Driven Development (MDD) is a generic designation for several tools and methodologies used to thoroughly include modelling in software engineering [1, 40]. The successful application of MDD requires a fundamental shift in the way software engineers use models, evolving from ad hoc complementary documents to the main focus of their work, thus relegating coding to the background. This is achieved through model-to-model and model-to-code transformations, and in some cases by direct model execution. With MDD source code becomes a sub-product of the development process, where the focus is on what the system must do, instead of how it does it [35].
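A model-to-code transformation can be pictured with a toy example. The sketch below is only an illustration of the general idea and does not describe the code generation infrastructure used in this work; the layout of the `model` dictionary, the function `generate_class` and the shape of the generated class are all assumptions.

```python
# Illustrative sketch only: a toy model-to-code transformation.
# The model layout and the generated class shape are assumptions.

model = {
    "name": "Fire",
    "attributes": {"intensity": 1.0, "age": 0},
}


def generate_class(model):
    """Translate a declarative model description into Python source code."""
    lines = [f"class {model['name']}:"]
    lines.append("    def __init__(self):")
    for attr, default in model["attributes"].items():
        lines.append(f"        self.{attr} = {default!r}")
    return "\n".join(lines) + "\n"


source = generate_class(model)

# The generated text is ordinary source code and can be executed like any other.
namespace = {}
exec(source, namespace)
fire = namespace["Fire"]()
```

The point of the exercise is that the analyst edits only `model`; the source text is a by-product that can be regenerated, or retargeted to another platform, without touching the model.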

The motivation behind MDD in the software development field is the gain in productivity and quality it can yield through automatic code generation. But further advantages have been identified that justify its application to other domains. First, it increases understandability, since MDD mostly relies on graphical constructs, which are more expressive by nature and dispense with the text parsing needed to comprehend source code [35]. Secondly, it promotes fast prototyping, by allowing model execution from a high level of abstraction, before much effort or resources have to be spent on development. This allows early model validation and, later on, during the model refinement process, the identification of unintended or undesired model changes [26, 36]. MDD further makes possible the creation of user-definable mappings, the capturing of domain specific concepts at an ontological (or meta-model) level, producing a lexicon of model constructs totally independent of particular code languages or specific software platforms [1, 14]. Finally, it is important to note that a successful MDD application also brings an increase in interoperability, by offloading such technical concerns to the code generation infrastructure, which can be adapted to match particular environments or platforms [1, 14].

DSL3S is an application of the MDD philosophy to the specific field of Spatial Simulation, as an alternative way to address the problems identified in Section 2. By raising the level of abstraction at which development takes place, this approach can facilitate the communication between programmers and analysts and other stakeholders lacking programming knowledge [26]. It can also allow prototyping by non-programmers. By detaching model development from specific technologies, it can improve interoperability with geo-spatial data, generating the appropriate code as needed. Lastly, it can lay the foundations for a standard language in the field, as successful efforts in parallel fields have proved, like SysML (footnote 4) or ModelicaML (footnote 5).

This work employs the Model-Driven Architecture (MDA) (footnote 6) methodology, the concrete MDD approach specified by the Object Management Group (OMG). The UML 2.0 modelling language allows the extension of its core primitives (graphical elements, links, etc.) through specialisation for different application domains [29]. This is achieved with the definition of a UML Profile, a collection of stereotypes, properties and constraints. Stereotypes are specialisations of existing UML model elements, defining new elements representing narrower abstractions. A semantically related set of stereotypes, specified by properties and restrictions, can thus be used to customise UML into a new specialised language dedicated to a certain domain.

DSL3S takes spatial simulation as a branch of the wider Spatial Analysis GIS field, where model inputs primarily originate from a GIS and whose outputs also have geo-referenced relevance. At this time the language does not contemplate agents with the internal cognitive capacities that Franklin and Grasser [15] classify as adaptive agents, nor are any explicit concepts of society or societal interaction considered. All agents are assumed to exist in the space of simulation, thus necessarily being spatial elements. The language does not employ a distinction between Agent-based models and Cellular Automata, aiming at a single approach to both schools of Spatial Simulation, hiding such implementation details from the user.

5 The DSL3S language

DSL3S is defined as a UML profile that includes a set of stereotypes enclosing abstractions underpinning Spatial Simulation. These stereotypes can be seen as the conceptual terms used when explaining a simulation with the terminology of this application domain, e.g., describing “fire” as an agent (because it is mobile and transforms the landscape) or “height” as a spatial variable (that has no innate activity but may influence the actions of certain agents). The DSL3S UML profile allows the development of simulation models by applying these stereotypes and creating the correct relations between them.

This section details the DSL3S language; Section 5.1 presents its Abstract Syntax, Section 5.2 its Concrete Syntax, Section 5.3 lays out the structural semantics and Section 5.4 introduces some guidelines related to model organisation.

5.1 Abstract syntax

Three main constructs can be identified underpinning a spatial simulation: Spatial variables, Global variables and Animats. Spatial variables are spatial information layers that have some sort of impact on the dynamics of a simulation, e.g. slope that deters urban sprawl or biomass that feeds a wildfire. Animat is a term coined by Wilson [44] signifying artificial animal; in this context it is used more widely, representing all spatial elements that change or induce change in their surroundings; examples are: fire (in a wildfire model), urban areas (in a sprawling model) or predators (in a population dynamics model). Global variables provide information that is constant across the space of simulation, such as wind direction in a wildfire model or economic trends in an urban development model. Another important sort of context variables are those that support Animat internal state. An Animat is composed of a set of Attributes that describe each instance at a certain moment in time.

The elements considered so far focus on the information needed to run a spatial simulation, but more is required to capture spatial dynamics: the way animats act and react to the environment has to be made explicit. This aspect of a simulation is termed Operation. DSL3S proposes a set of just six predefined animat operations, intending to match the essential properties of an agent, as outlined by Franklin and Grasser [15] (autonomous, continuous, reactive, proactive and mobile), with the core concepts found in Cellular Automata (state, neighbourhood, transition rules and time). In their seminal book, Epstein and Axtell [10] conceive a considerably larger set of operations, including elaborate processes such as trade and cultural exchange. The option for a strict set of operations rests on three reasons: (i) to keep the language compact and easy to learn; (ii) more refined operations are less common in spatial simulation applications and can eventually be composed with these simpler primitives; and (iii) to insulate the user from technical implementation details in the choice between Cellular Automata and Agent-based models. These animat operations are:

  • Emerge: sets the conditions under which a new instance of an animat can appear in the simulation, i.e., the act of “birth”; an example may be an urban development simulation where the emergence of new urban spots is possible in an area that meets a certain set of criteria, like distance to transport infrastructure or topography.

  • Move: relates an animat with one or more spatial variables or with other animats determining the locations that are more or less favourable to be in.

  • Replicate: captures operations where an animat replicates itself, such as an organism in a biological simulation producing offspring.

  • Supply: provides access to animat internal attributes, thus making resources or information available to other animats. It is the supply side of an interaction between animats.

  • Harvest: an operation that allows an animat to collect resources or information from other elements in its neighbourhood; it may concern other animats, targeting attributes, or spatial variables. Between animats it is the demand side of an interaction, the counterpart of Supply. Examples may be wildfire consuming biomass or the seizure of resources from another animat, as in a predator-prey simulation.

  • Perish: defines the circumstances under which an animat may cease to exist during simulation; examples can be a biological animal starving or a fire extinguishing.

Figure 1 presents these key constructs in a conceptual model. A Simulation is composed of a set of Spatial and Global variables plus a set of Animats; the latter are composed of a set of Attributes and Operations, which determine how their internal state evolves. An animat acts through different types of Operations, which can induce changes on global and spatial variables, or be employed to interact with other animats. Different Animat configurations can be assigned to a Simulation, thus creating a different simulation scenario.

Fig. 1 The DSL3S meta-model
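The composition relationships described for the meta-model can be approximated with plain data classes. This is a loose sketch under stated assumptions, not the DSL3S UML profile itself: the field names, the use of strings for operation kinds and the validation in `Operation` are illustrative choices, and the stereotype properties listed in Appendix A are not reproduced.

```python
# Illustrative sketch only: the meta-model relationships as Python data classes.
# Field names and the validation rule are assumptions.
from dataclasses import dataclass, field
from typing import List

OPERATION_KINDS = {"Emerge", "Move", "Replicate", "Supply", "Harvest", "Perish"}


@dataclass
class Attribute:
    name: str
    value: float = 0.0


@dataclass
class Operation:
    kind: str

    def __post_init__(self):
        # Only the six predefined animat operations are accepted.
        if self.kind not in OPERATION_KINDS:
            raise ValueError(f"unknown operation: {self.kind}")


@dataclass
class Animat:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    operations: List[Operation] = field(default_factory=list)


@dataclass
class Spatial:
    name: str


@dataclass
class Global:
    name: str
    value: float = 0.0


@dataclass
class Simulation:
    name: str
    spatial_variables: List[Spatial] = field(default_factory=list)
    global_variables: List[Global] = field(default_factory=list)
    animats: List[Animat] = field(default_factory=list)


wildfire = Simulation(
    name="wildfire",
    spatial_variables=[Spatial("biomass")],
    global_variables=[Global("wind_direction", 90.0)],
    animats=[
        Animat(
            name="fire",
            attributes=[Attribute("intensity", 1.0)],
            operations=[Operation("Harvest"), Operation("Perish")],
        )
    ],
)
```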

5.2 Concrete syntax

The DSL3S UML profile gives body to the abstract syntax outlined above, with constructs defined as UML stereotypes. The properties of each stereotype are detailed in Appendix A. The stereotype Simulation is used to host definitions such as the spatial extent of simulation. It bonds together all the other elements, as an entry point to the simulation.

The stereotype Global is intended to be a scalar value that can vary with time. It can, for instance, be set randomly at simulation start and/or made to evolve randomly at each time step. It can also be fed into the model as a predefined time-series, which may be input from a text file.
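The two ways of driving a global variable mentioned above can be sketched as follows. The function names, the bounded random walk and the choice of holding the last value once a time-series runs out are assumptions made for illustration; they are not properties of the Global stereotype.

```python
# Illustrative sketch only: two ways a global scalar could vary with time.
import random


def random_walk_global(start, steps, max_change, rng):
    """Scalar that evolves randomly at each time step (assumed: bounded uniform change)."""
    values = [start]
    for _ in range(steps):
        values.append(values[-1] + rng.uniform(-max_change, max_change))
    return values


def series_global(series, step):
    """Scalar read from a predefined time-series.

    Assumption: the last value is held once the series runs out.
    """
    return series[min(step, len(series) - 1)]


rng = random.Random(42)  # seeded so that runs are repeatable

# Set randomly at simulation start, then evolving randomly at each step.
wind_speed = random_walk_global(
    start=rng.uniform(0.0, 10.0), steps=5, max_change=1.0, rng=rng
)

# Fed in as a predefined time-series.
wind_direction = [series_global([90.0, 95.0, 100.0], t) for t in range(5)]
```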

The Spatial stereotype is essentially a stub for the input of geo-referenced data. Each instance corresponds to a spatial layer (either in raster or vector format) with the characteristic of having an unequivocal value for each location in space. This stereotype also provides means for the fully random generation of spatial variables, that may be useful for prototyping with synthetic scenarios.

The stereotype Animat is an aggregation of attributes existing at an identifiable location in space. The Attribute stereotype is a single characteristic of the Animat, representable by a primitive data type, such as an integer or a boolean (e.g. population in an urban development simulation). The initial number of animat instances of each type, and their spatial positioning, can be provided by a specific geo-referenced data set, such as a raster map. The values of Attribute elements can also be initialised with the same spatial data set, through its attribute table. These initial Animat and Attribute settings can also be randomly generated for simulations where this applies.

The animat operations identified previously are also stereotypes in DSL3S; in detail:

  • Emerge: this stereotype defines neighbourhood thresholds relative to spatial variables, or relative to other animat attributes, above which the emergence of a new animat becomes possible. When a new animat is created, its initial state is set according to the parameters set in the Animat class itself.

  • Move: this stereotype provides properties to weight the relevance of each related class influencing the movement of an animat. For instance, in a predator-prey simulation the movement of a “sheep” animat may be positively weighted in relation to a “grass” Spatial layer and negatively weighted in relation to a “wolf” animat.

  • Replicate: this stereotype provides properties to set replication thresholds against the animat internal state. The impact on the reproducing animat and the inheritance of attribute values by the new animat can also be modelled with specific properties. As with the Emerge operation, the initial state of a new animat resulting from a replication is set according to the properties set in the Animat class itself.

  • Supply: together with Harvest this stereotype provides ways for animats to exchange assets. It makes the information or resource held in a particular Attribute available to other animats. A limit may be set on the amount of this asset that another animat may acquire in each interaction.

  • Harvest: this stereotype provides properties to parametrise how an interaction impacts the harvesting Animat. This is modelled with a harvest rate or harvest amount. A consumption rate of 100 % may be used to model preying relationships, whereas 0 % can be used to simply collect information on neighbouring animats and variables.

  • Perish: this stereotype defines an interval of values relative to an Attribute element, determining the conditions for the existence of the Animat itself.

Other operation stereotypes can be added in the future if necessary; DSL3S is conceived to remain a language open to further extension.
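The weighting idea behind the Move stereotype can be illustrated with a small sketch. It assumes that the weights are combined as a linear sum and that the animat moves to the best-scoring neighbouring cell; the text does not state how MDD3S combines the weights, so both choices, as well as all names and values below, are hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]


def score_cell(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the influences observed at one candidate cell."""
    return sum(weights[name] * values.get(name, 0.0) for name in weights)


def choose_destination(candidates: Dict[Cell, Dict[str, float]],
                       weights: Dict[str, float]) -> Cell:
    """Pick the candidate cell with the highest weighted score."""
    return max(candidates,
               key=lambda cell: score_cell(candidates[cell], weights))


# Hypothetical weights for a "sheep" animat: attracted by grass, repelled by wolves.
sheep_weights = {"grass": 1.0, "wolf": -2.0}
neighbourhood = {
    (0, 1): {"grass": 0.9, "wolf": 1.0},  # rich pasture, but a wolf is present
    (1, 0): {"grass": 0.5, "wolf": 0.0},  # poorer pasture, no wolf
    (1, 1): {"grass": 0.2, "wolf": 0.0},
}
destination = choose_destination(neighbourhood, sheep_weights)
```

With these values the cell holding the wolf scores lowest despite its richer pasture, so the sheep prefers the wolf-free cell with the most grass.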

5.3 Structural semantics

To properly define a DSL3S simulation a set of rules must be followed regarding the valid associations between the different language constructs. Table 1 synthesises these rules, indicating which relations are valid and their respective cardinalities. A more thorough description of these rules follows.

Table 1 Valid relationships in DSL3S with respective cardinalities

Each DSL3S model must contain exactly one Simulation construct. Every Animat, Global or Spatial element composing the model must be associated with it.

Spatial and Global variables represent passive constructs, but may appear associated with operation constructs, in such cases becoming sources of information and resources to Animat elements. As for Attribute constructs, they must always be associated to exactly one Animat (the owner).

An Animat aggregates Attribute elements, defining its internal composition. Animats do not link directly to any of the information constructs, Spatial or Global, nor to other Animats. All associations of an Animat with other elements of a simulation are made through its operation constructs.

A Move construct associates an Animat with other spatial objects. It can create a link to an Attribute or to a Spatial variable, quantifying propensity for movement. Beyond the link to the owner Animat, each Move construct must also link to exactly one other construct.

Emerge constructs are subject to rules similar to those applying to the Move operation: they must always link one Animat (the owner) with another construct in the model. Beyond Attribute and Spatial constructs, Emerge can also associate an Animat with a Global construct.

The Supply construct must always be associated to an Attribute, to which it provides access. It can then be associated to multiple Harvest constructs that access the resource or information supplied.

Harvest must also always be associated with an Attribute that stores the collected resource or information. On the other end it may be associated with a single other construct: Supply (in case the harvested target is an Animat), Global or Spatial.

Replicate and Perish constructs are simpler, since each must be linked to a sole Attribute construct, creating the boundary conditions for the respective operation. They cannot be associated with any other construct, and thus each can only take part in a single association in the model.
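The association rules described above lend themselves to a simple lookup, sketched here in Python. The sketch encodes only what the prose states (Table 1 is not reproduced and cardinalities are left out); the dictionary layout and the function name are assumptions made for illustration.

```python
from typing import Dict, FrozenSet

# Construct to which each operation is attached, as described in the text.
OWNER: Dict[str, str] = {
    "Move": "Animat",
    "Emerge": "Animat",
    "Supply": "Attribute",
    "Harvest": "Attribute",
    "Replicate": "Attribute",
    "Perish": "Attribute",
}

# Constructs an operation may relate to besides the one it is attached to.
ALLOWED_TARGETS: Dict[str, FrozenSet[str]] = {
    "Move": frozenset({"Attribute", "Spatial"}),
    "Emerge": frozenset({"Attribute", "Spatial", "Global"}),
    "Supply": frozenset({"Harvest"}),
    "Harvest": frozenset({"Supply", "Spatial", "Global"}),
    "Replicate": frozenset(),  # linked to its Attribute only
    "Perish": frozenset(),     # linked to its Attribute only
}


def is_valid_target(operation: str, target: str) -> bool:
    """Tell whether an operation construct may be associated with a target type."""
    if operation not in ALLOWED_TARGETS:
        raise ValueError(f"unknown operation: {operation}")
    return target in ALLOWED_TARGETS[operation]
```

A check of this kind could run before code generation to reject, for example, a Move linked to a Global variable.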

5.4 Model organisation - views

Models built with DSL3S can become visually complex if a single diagram is used to represent all classes, properties and associations. To avoid such difficulties and provide a thorough structure for the development and presentation of models with the language, a multi-view approach is proposed. These views intend to display the model in such a way that each aspect of a simulation can be better presented in a specific diagram, namely the following: Simulation, Animat, Animat Interactions and Scenario views (see Fig. 2).

Fig. 2 DSL3S model views

The Simulation View contains the model settings and the participating variables. It includes the single Simulation construct plus the necessary Global and Spatial elements.

The Animat View provides a container in which to define the structure of an animat. In this view an Animat and the Attribute constructs belonging to it should be present, plus any associations to Spatial or Global elements. This includes any operations linked to these elements: Emerge, Move, Replicate or Perish. One view of this kind per animat is recommended, thus visually encapsulating its configuration.

The Animat Interaction View is used to describe operations between animats. It should contain all the Supply and Harvest constructs relating two (or more) animats, plus associated Attribute elements. Move operations relative to other animats may also be set in this view.

Lastly, the Scenario View is used to assign animats to a simulation. In this way, the designer may explore different animat configurations that can be used in different runs of the same simulation.

This multi-view structure is recommended, but does not have to be necessarily followed to develop a model with DSL3S. The user is free to use alternative organisations that may be considered more appropriate in specific cases.

6 MDD3S - prototype implementation

“Model Driven Development for Spatial Simulation Scenarios” (MDD3S) is the name of the prototype framework that supports the DSL3S language. MDD3S relies solely on open source tools (see Fig. 3):

  • Papyrus - an Eclipse add-on for UML modelling supporting the DSL3S UML profile;

  • Acceleo - another Eclipse add-on supporting the model-to-code transformation templates;

  • MASON - a Program-level spatial simulation framework used as a library by the generated code.

Fig. 3 The technologies used to implement MDD3S

This section reviews some relevant aspects of these technologies in the scope of the MDD3S framework.

6.1 Papyrus

Papyrus is an open source project started by the Commissariat à l’Énergie Atomique in France, with the aim of producing an advanced graphical editor for the UML language. It is based on the Eclipse Modelling Framework (EMF), allowing the editing and visualisation of structured models defined with the XMI standard. It also provides a set of Java classes to facilitate model manipulation. Presently Papyrus is close to fully supporting version 2 of UML, and it enables the development of ad hoc DSLs through the definition of UML profiles.

6.2 Acceleo

Acceleo is an open source code generator created by the French company Obeo. It is also built on EMF, facilitating interoperability with several other EMF based modelling tools. Acceleo interprets the MOF Model to Text Transformation language (MOFM2T), also an OMG standard. Though not yet fully implementing MOFM2T, the model-to-code transformations produced with Acceleo are today possibly the closest to the scheme proposed by the OMG.

The model-to-code transformation mechanism is based on special files called templates, which define the text output to produce from a graphical model. They are composed of regular text plus a series of annotations that are substituted by values and names of model elements at transformation time. Traditional control structures such as branches or loops can also be included through specific annotations, producing more complex transformations. Templates can be articulated through an inclusion mechanism, whereby a master template can make use of several other templates, creating a transformation chain. When fully developed, a transformation chain can be packaged as an independent plug-in for Eclipse, facilitating its portability and application.

Acceleo 3 fully supports transformations from models using UML profiles, identifying stereotypes applied on classes and providing access to their properties. The latter is not based on MOFM2T, but provided by a service, essentially a Java method that browses through the UML object model associated with each class (example in Table 2).

Table 2 The Java service hasLinkedStereotype used in MDD3S to determine if a model element is linked to elements of specific type

When a transformation chain is applied on a model, all its elements are run through the several templates declared in the master. Typically, each template filters the elements, generating code only for those to which a specific stereotype is applied. Such is the case with MDD3S: a template named Simulation, for example, generates the code for elements with the stereotype of the same name. In most cases a template produces a text file; in MDD3S these are the Java classes that compose the end model. Alternatively, a template may simply generate a segment of code to be included in another file; in MDD3S the Perish template is an example, producing a method to be included in the Java class generated for Animat type elements (Table 3). The Java assets generated for each stereotype are detailed in Appendix A.

Table 3 The MDD3S template for the Perish stereotype; it parses an Animat element and successively iterates through each of the associated Attribute elements and through the Perish elements associated with these. hasStereotype(), isNotNull() and getTaggedValue() are external services
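The template mechanism described in this section (regular text, placeholders filled from model elements, a filter on the elements of interest) can be mimicked with a few lines of Python. This is only a stand-in for the idea: it does not use Acceleo or MOFM2T syntax, and the model structure, the emitted Java fragment and the method name perishes() are all hypothetical.

```python
from string import Template

# Hypothetical template fragment: regular text plus placeholders ($name)
# that are substituted with values taken from the model.
PERISH_TEMPLATE = Template(
    "if (this.$attribute < $minimum || this.$attribute > $maximum) return true;"
)


def generate_perish_method(attributes):
    """Emit a Java-like method with one condition per Perish interval.

    `attributes` is a list of dicts with the key 'name' and, optionally,
    'perish' holding a (minimum, maximum) pair; this structure is an
    illustrative assumption, not the UML object model used by MDD3S.
    """
    lines = ["public boolean perishes() {"]
    for attribute in attributes:
        interval = attribute.get("perish")
        if interval is None:  # filter: skip attributes without a Perish
            continue
        lines.append("    " + PERISH_TEMPLATE.substitute(
            attribute=attribute["name"],
            minimum=interval[0],
            maximum=interval[1]))
    lines.append("    return false;")
    lines.append("}")
    return "\n".join(lines)


code = generate_perish_method([
    {"name": "preyEnergy", "perish": (1, 100)},
    {"name": "age"},  # no Perish attached, so nothing is generated for it
])
```

The result is a code segment rather than a whole file, in the spirit of a template whose output is meant to be included in another generated class.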

6.3 MASON

MASON (acronym for Multi-Agent Simulator Of Neighbourhoods) is a light-weight, highly-portable, multi-purpose agent-based modelling package [23]. In some respects MASON contrasts with earlier simulation packages like Swarm or RePast, which date back to the 1990s: it has followed a strict object oriented philosophy from its very beginning. Its objects are architected in such a way that simulation models are completely isolated from visualisation and input/output mechanisms. MASON is fully written in Java and open source, producing programs that are highly portable, not only running alike, but also presenting identical results across different platforms. Comparative results have shown that MASON is likely the fastest of the main Program-level tools for spatial simulation [31]. Supported by extensive documentation and a relevant community, MASON has slowly gained adoption.

GeoMason is an extension that provides objects to deal specifically with geo-referenced data. Input and output functionality is available for both raster and vector datasets, relying on third party packages: the Java Topology Suite for geometry manipulation, GeoTools for vector formats input/output and GDAL for raster formats.

Its light-weight infrastructure, extensive documentation, and ease of integration through Eclipse made MASON an obvious choice to support MDD3S.

7 Validation

The DSL3S UML profile and its accompanying MDD3S framework are publicly available on the code sharing platform GitHub. Some examples showcasing the usage of the language are also available. In this section three of these illustrative simulations are discussed.

7.1 Simulation model A—predator-prey

Predator-prey simulations are one of the oldest applications of spatial simulation techniques [9], used to study population dynamics in the field of Biology. They usually feature two animal species, where one feeds on the other; energy flows through the food chain in waves, whose period and amplitude are a function of the growth rates of the several species.

7.1.1 The DSL3S model

This example takes place in a synthetic plane of 100 by 100 abstract space units. There are three main elements to this simulation: a Spatial variable named Pasture and two animats: Predator and Prey (Fig. 4). Pasture covers the whole simulation space and is initiated from a sample raster file that represents energy available at each space unit. This energy at each location increases at each time step, at a fixed rate, up to a defined limit.

Fig. 4 Predator-prey model in DSL3S; simulation, scenario and prey views

Prey is a herbivore animat composed of a single Attribute: PreyEnergy. At simulation start a number of these animats are cast randomly across the simulation space, with their PreyEnergy attribute also randomly initialised. PreyEnergy declines steadily at each time step by a defined amount. A Perish operation attached to PreyEnergy sets a lower threshold below which the animat is discarded from the simulation. A Harvest operation parametrises the feeding act of Prey over Pasture; at each time step the animat can take all the Pasture energy available at the location it occupies into its own PreyEnergy attribute. Two Move operations relate Prey with both Pasture and PredEnergy, making it prefer locations with high Pasture energy and free of Predator instances. Finally, a Replicate operation sets a threshold above which the Prey can reproduce itself, as well as the amount of energy passed on to the offspring in the process (the DSL3S view for Prey is shown in Fig. 4).

Fig. 5 Predator-prey model in DSL3S; predator and interaction views

Predator is a carnivore animat that shares many similarities with Prey. It also possesses a single attribute (PredEnergy) and its instances are created from an input vector layer, using the layer attribute table to initialise Predator attributes. Its energy also declines with time and a Perish operation determines when it ceases to exist. Predator feeds on Prey, according to a Harvest operation linking to a Supply operation associated with PreyEnergy. When a Predator feeds on a Prey it takes up all of its energy, triggering the latter’s Perish operation. A single Move operation links Predator again to the Prey energy attribute, this way compelling it to move towards locations where well nourished Prey instances exist. A Replicate operation sets similar reproduction conditions to those for Prey (Fig. 5 presents the Predator view).
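The per-step behaviour described for Prey can be summarised in a short sketch. It is not the code generated by MDD3S: the order in which the operations are applied, all parameter values and all names are assumptions made for illustration, and movement is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Prey:
    energy: float


# Illustrative parameter values; none of them comes from the article.
DECLINE = 1.0           # energy lost at each time step
PERISH_BELOW = 0.0      # lower threshold of the Perish operation
REPLICATE_ABOVE = 10.0  # threshold of the Replicate operation
OFFSPRING_SHARE = 0.5   # fraction of energy passed on to the offspring


def step_prey(prey: Prey,
              pasture_here: float) -> Tuple[Optional[Prey], Optional[Prey], float]:
    """Advance one Prey by one time step.

    Returns (survivor or None, offspring or None, Pasture energy left here).
    """
    prey.energy -= DECLINE       # steady decline
    prey.energy += pasture_here  # Harvest: take all Pasture energy available
    pasture_left = 0.0
    if prey.energy < PERISH_BELOW:  # Perish: discard the animat
        return None, None, pasture_left
    offspring = None
    if prey.energy > REPLICATE_ABOVE:  # Replicate: split energy with offspring
        passed_on = prey.energy * OFFSPRING_SHARE
        prey.energy -= passed_on
        offspring = Prey(passed_on)
    return prey, offspring, pasture_left


# Hypothetical usage: a well-fed Prey reproduces, a starving one perishes.
survivor, child, left = step_prey(Prey(8.0), pasture_here=5.0)
starved, _, _ = step_prey(Prey(0.5), pasture_here=0.0)
```

Under these assumptions a Predator step would look much the same, with the Harvest drawing on the PreyEnergy of a nearby Prey instead of on Pasture.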

7.1.2 The resulting application

The simulation generated from this DSL3S model produces the typical population cycles seen in this type of model, as in the case of the historical WATOR model [9]. Figure 6 shows the simulation space during a sample run at time-steps 0, 30, and 90. Prey animats reproduce faster and thus dominate the space during the first time steps, producing an initial growth wave, reaping the fertile feedstock. In time, Predator animats feed on the excessive amount of Prey animats, creating a new wave; this Predator wave clears some areas, fostering growth of the Pasture spatial variable in certain patches.

Fig. 6 A sample run of the Predator Prey DSL3S simulation; Pasture is portrayed with a yellow to green choropleth, Prey is portrayed in blue and Predator in red

7.2 Simulation model B—forest fire

Forest fire has been a classical application field for spatial simulation [22], whereby the direction and intensity of fire at hypothetical locations is explored. The example here presented is rather simple, intended to illustrate other sorts of spatial dynamics possible to model with DSL3S.

7.2.1 The DSL3S model

There are only two elements to this simulation, a Spatial variable for Forest and an Animat for Fire. Forest occupies the entire simulation space (100 by 100 units) and is initialised from a sample raster layer; it is not set to evolve with time.

Fire is composed of a single Attribute, registering the Intensity of each instance. At simulation start a pre-defined number of Fire instances is randomly cast in space with Intensity at its minimum value. A Harvest operation associates Intensity with Forest, defining the burning process as a constant depletion rate of the biomass existing at the location.

The amount of biomass burnt is transferred to the Intensity attribute, but at the beginning of each time step this variable is brought back again to its minimum; a Perish operation attached to Intensity guarantees that the animat is discarded if no biomass is left at the location (Intensity remains at the minimum). Fire is an animat that does not move, but it can spread to adjacent locations. This is modelled with Emerge operations, which link Fire to Forest and Intensity. The larger the amount of biomass in a location, and the more intense the fires burning in its neighbourhood, the higher the probability of a new Fire emerging; the presence of a nearby Fire animat is indispensable.

If the probabilities set for these Emerge operations are high enough, eventually most of the biomass burns down in this simulation. Still, it can be used to observe uneven fire spread patterns, following spatial patches with denser biomass. Figure 7 presents the diagrams defining the Fire simulation.

Fig. 7 Forest fire model in DSL3S
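The qualitative rule for fire spread (more biomass and more intense neighbouring fires raise the probability of emergence, and no fire can emerge without a nearby Fire) can be written down as a small function. The linear form, the use of the strongest neighbouring fire, the weights and the normalisation to [0, 1] are assumptions of this sketch, not the formula used by MDD3S.

```python
from typing import Sequence


def emergence_probability(biomass: float,
                          neighbour_intensities: Sequence[float],
                          w_biomass: float = 0.5,
                          w_intensity: float = 0.5) -> float:
    """Probability that a new Fire emerges at a location (illustrative formula).

    `biomass` and the intensities are assumed to be normalised to [0, 1].
    """
    if not neighbour_intensities:  # no nearby Fire: emergence is impossible
        return 0.0
    strongest = max(neighbour_intensities)
    probability = w_biomass * biomass + w_intensity * strongest
    return max(0.0, min(1.0, probability))  # clamp to a valid probability
```

In a running simulation the returned value would be compared against a random draw at each time step, which is the source of the uneven spread patterns mentioned above.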

7.2.2 The resulting application

Figure 8 portrays a run of the simulation generated from the Fire model described above. At time-step 0 three fire spots are randomly cast in the simulation space. With a relatively high probability of spreading to adjacent cells, the fire slowly consumes the vegetation in every direction, especially towards locations with higher biomass density. The randomness of the emergence routine is apparent in the assorted locations left untouched after 100 time-steps.

Fig. 8 A sample run of the Fire DSL3S simulation. Fire is represented with a choropleth from yellow (low intensity) to red (high intensity)

7.3 Simulation model C—urban sprawl

Urban dynamics was another of the early application fields adopting spatial simulation techniques. The growth of cities is taken generally as an emergent process, bounded by spatial restrictions and enablers, by which the urban fabric sprawls. SLEUTH [5], a Model-level tool dating back to the 1990s, proved particularly successful in this domain and has been applied to varied geographic contexts.

7.3.1 The DSL3S model

In this example space is vacant at simulation start and is progressively occupied by urban elements. The simulation is built around four elements: (i) an Animat named Urbe that possesses a single Attribute, storing its Age; (ii) a Global variable termed Speed that declines with time; (iii) a Spatial variable to input a vector layer with Protected areas, where urban growth is not possible; (iv) another Spatial variable that inputs a Roads layer, an enabler of urban sprawl. Figure 9 presents the full model in three views.

Fig. 9 Urban sprawl model in DSL3S

At simulation start a single Urbe animat is cast at random in the simulation space; it is not mobile and all the dynamics operate through related Emerge operations. First, there is the SprawlAge operation, which sets the probability of a new Urbe animat emerging near an existing Urbe; this operation is linked to the Age attribute, rendering emergence less probable near older urban areas. Also constraining growth with time (mimicking diminishing capital investment) is the SprawlSpeed operation, linking to the Speed variable; as its internal value declines with time, it slows down growth. SprawlRoads relates Urbe with Roads, increasing the probability of emergence around spatial features of this layer. In similar fashion, SprawlProtect sets the probability of emergence to zero (with a large negative weight) over spatial features in the Protected areas layer. The Simulation space is set to a grid of 100 by 100 cells, this way determining the space between adjacent emerging urban elements.
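How the four Emerge operations might combine can be shown with a compact sketch, assuming that each contributes a weighted term to a sum that is then clamped to a probability. The weights, the clamping and the names are hypothetical; only the direction of each effect (age and protection lower the probability, speed and roads raise it) comes from the text.

```python
def sprawl_probability(neighbour_age: float, speed: float,
                       near_road: bool, protected: bool,
                       w_age: float = -0.01, w_speed: float = 0.5,
                       w_roads: float = 0.3,
                       w_protect: float = -1000.0) -> float:
    """Probability of a new Urbe emerging at a location (illustrative only)."""
    score = (w_age * neighbour_age                   # SprawlAge: older, less likely
             + w_speed * speed                       # SprawlSpeed: declines with time
             + w_roads * (1.0 if near_road else 0.0)        # SprawlRoads: enabler
             + w_protect * (1.0 if protected else 0.0))     # SprawlProtect: blocker
    return max(0.0, min(1.0, score))  # clamp to a valid probability
```

The large negative weight on protected areas drives the clamped result to zero whatever the other terms contribute, which matches the role described for SprawlProtect.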

7.3.2 The resulting application

Figure 10 shows the simulation space resulting from this model at time steps 0, 100 and 400. In the very beginning there is a single urban element, presented in red; Protected areas are shown in dark blue, while yellow lines portray Roads. Development is swift in the beginning, with new urban elements emerging along road features; as they age, the colour of Urbe elements slowly fades to a light cyan. With time, sprawl slows down and a larger number of steps is required for changes in the urban fabric to become apparent. With the areas surrounding Roads taken, sprawl then turns inwards, while avoiding Protected polygons.

Fig. 10 A sample run of the Urban Sprawl DSL3S simulation; Urbe is portrayed with a red to cyan ramp, Roads are portrayed in yellow and Protected areas in dark blue

8 Conclusion and future work

The application of spatial simulation techniques to the GIS domain remains today locked in a choice between versatile tools and easy to use pre-built models. The former require advanced programming skills, while the latter impose relevant compromises of transparency and scope. A need exists for a higher level of development abstraction that can also improve the documentation, readability and communication of models.

Several DSLs, and respective tools, such as NetLogo, Ocelet or GAML, were previously tried in this field, invariably producing declarative or functional languages, in some cases lacking a formalisation of their abstract syntax. Table 4 succinctly compares a selection of these with DSL3S. In essence, these efforts relying on textual languages fall into the pitfalls identified by Selic [36] regarding fourth generation languages: they struggle to raise the level of abstraction at which model development takes place. They also impose compromises with platform dependence and in some cases with weak spatial data support and interoperability.

Table 4 Comparison of several simulation DSLs with DSL3S

This work proposes a different approach to this subject, applying the MDD philosophy. The end result is the DSL3S UML profile, which forms a graphical language, and its companion MDD3S framework, which comprises a modelling and a model-to-code transformation infrastructure. These assets make it possible to translate a graphical and platform independent model produced with DSL3S into a coded simulation supported by a Program-level tool. In other fields this approach has proved capable of inducing faster development, reducing coding errors and improving model readability [4, 26, 30].

The application of DSL3S, and MDD tools in general, requires a certain degree of familiarity with graphical semantics (e.g. boxes and links) and modelling tools that may not be straightforward to all users. In spite of broad usage in computer science, languages such as UML do not yet feature in curricula of other technical disciplines. For this reason, DSL3S was conceived as a rather compact language, defining only eleven constructs, five structural and six operational. A structure of views is also proposed, which helps organise models developed with the language. This contrasts with AML, for instance, where the focus was mainly on defining a deeply detailed language, missing any modelling and model-to-code transformation infrastructure.

The relative simplicity of DSL3S can be restrictive to some extent in more demanding and complex scenarios. Nevertheless, functionality provided by the MDD3S framework can already transform abstract models into executable code, allowing at least for fast simulation prototyping. This article shows how DSL3S language elements can be combined to produce diverse simulations in different fields of application. To further ease the usage of DSL3S, a user manual has been made available on-line and is being expanded with examples and best practices. A series of tutorial videos will also be included.

Current development of the MDD3S framework relies on MASON, a modern Java library for spatial simulation. This option also guarantees interoperability with geographic data, namely through the GeoMason extension. This framework is being developed on the Eclipse IDE, using the MDD add-ons Papyrus (for UML modelling) and Acceleo (for code generation). The code generated with MDD3S is relatively extensive vis à vis the expected outcome from ad hoc development with a Program-level tool. At this stage, MDD3S prizes simplicity and understandability over performance, a consequence of its purpose as a demonstrator prototype. If performance ever becomes a requirement, model-to-code transformation templates can be optimised in that sense. Going further, transformations may even be developed to target a programming language closer to machine code, such as C. Having a single abstract model producing different implementations relying on different code libraries is another distinctive advantage of this MDD approach.

In the near future, DSL3S will be further assessed through its application to real world scenarios. This iterative research process will make it possible to understand how far the language can go in its current form and whether extensions are necessary.