
1 Introduction

The agent-based modelling (ABM) paradigm is developing into a powerful tool in many disciplines, as seen in Crooks and Heppenstall (2012), Johansson and Kretz (2012) and Harland and Heppenstall (2012), but also in other disciplines such as archaeology (Axtell et al. 2002), economics (Tesfatsion and Judd 2006), health (Epstein 2009), geography (Batty 2005) and computational social science more generally (see Cioffi-Revilla 2010 for a discussion). Such models allow researchers to explore how emergent phenomena arise through the interaction of many individuals. Moreover, they allow practitioners to build models of complex social phenomena by simulating the interactions of the many actors in such systems, thus gaining insights that will lead to greater understanding and, in some cases, better management of the behaviour of complex social systems. The intention of this chapter is to outline how one can develop geospatial agent-based models (i.e. models of spatially explicit geographic phenomena, where the nature of the features and movement that is represented varies over the Earth’s surface). Essentially, geospatial models depend on the location of the features or phenomena being modelled, such that if one or more of those locations change, the results of the model change (Wegener 2000). Geographical Information Systems (GIS) are a particularly useful medium for representing model input and output of a geospatial nature. However, GIS are not well suited to dynamic modelling (Goodchild 2005; Maguire 2005) as will be discussed in Sect. 12.2. Consequently, Sect. 12.2.2 explores the opportunity of linking (through coupling or integration/embedding) a GIS with a simulation/modelling system purposely built for the task at hand (Sect. 12.3), and therefore better suited to supporting the requirements of ABM.

2 Modelling Within GIS: Current Capabilities

It can be difficult to comprehend how GIS technology, built essentially for handling maps and “map-related ideas”, can be adapted to the needs of dynamic simulation modelling, especially when it is not even perceived as an optimal platform for modelling (Goodchild 2005). Particular criticisms of GIS with respect to modelling are their limited ability to handle time (Langran 1992; Peuquet 2005; see Sect. 12.2.1), their representation of continuous variation (Longley et al. 2005), and the fact that most have only rudimentary modelling capabilities (Maguire 2005). Nevertheless, there are several good reasons why the use of GIS, or the linkage of GIS with simulation/modelling systems (see Sect. 12.2.2), is an effective means of modelling when spatial and temporal analysis is necessary.

Current commercial and public domain GIS software systems all contain numerous tools for acquiring, pre-processing, and transforming data. Their use in modelling includes data management, format conversion, projection change, re-sampling, raster-vector conversion, etc. GIS also include excellent tools for visualisation/mapping, rendering, querying, and analysing model results, as well as assessing the accuracies and uncertainties associated with inputs and outputs.

Typically, all of the capabilities described above are accessible via end-user graphical and command line interfaces. However, these capabilities have recently become accessible through application programming interfaces (APIs), via software libraries. The exposure of APIs was a significant recent improvement in terms of GIS and spatial modelling, as external programmers now have access to the underlying software components upon which GIS software vendors base their end-user versions of systems. This is perhaps the most pertinent enhancement, as many of the techniques used in GIS analysis are potentially far more robust if they can be linked with an extensive toolkit of methods for simulation; an issue which is addressed at greater length later in Sect. 12.2.2. GIS vendors have invited this situation as it allows GIS to be extended and customised for use in new application areas, thus expanding the market potential of their systems.

Alternatively, a model can be expressed as a sequence of GIS commands executed by a script (Maguire 2005). Recently in GIS there has been a move to use industry-standard low-level programming languages (e.g. Java, C++, and Visual Basic), and scripting languages (e.g. Python, VBScript, and JScript), rather than proprietary, home-grown scripting languages (e.g. ESRI’s Arc Macro Language, AML, or Avenue). Interoperability standards such as the Microsoft .NET framework facilitate this process by allowing compliant packages to be called from the same script.

In addition to scripts, graphical flowcharts can be used to express sequences of operations that define a model. Longley et al. (2005) note that one of the first graphic platforms for conceptualising and implementing spatial models was probably the ERDAS IMAGINE software, which allows the user to build complex modelling sequences from primitive operations. ESRI is another GIS vendor that provides an environment that allows models to be authored and executed in a graphical environment: ModelBuilder within ArcGIS 9.x, which superseded Spatial Modeller within ArcView 3.

In principle, graphic-model building can be used for dynamic modelling via an iterative process, where the output of one time step becomes the input for the next. However, this method poses two problems: (1) the GIS will not have been designed for an iterative process, requiring the user to re-enter the data at the beginning of each time step; and (2) the time required to run a model could be considerable. The former of these problems can be overcome with scripting languages (e.g. Python in ArcGIS); both can potentially be overcome by integrating the GIS with a simulation/modelling system better equipped for the task at hand. Before exploring the possibilities of linking GIS and simulation/modelling systems (Sect. 12.2.2), the following section of this chapter evaluates the capability of GIS to handle space-time information; computer simulations generate such information in volume, and handling it has always been a limitation of GIS.
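The iterative process described above, in which the output of one time step becomes the input for the next, can be sketched in a few lines of a scripting language. The following is only an illustrative sketch and does not call any GIS library: the grid is a plain list of lists, and the transition rule `diffuse` and the driver `run_iterative` are hypothetical names introduced here for illustration.

```python
from typing import Callable, List

Grid = List[List[float]]


def diffuse(grid: Grid) -> Grid:
    """Hypothetical transition rule: each cell becomes the mean of itself
    and its four orthogonal neighbours (edge cells have fewer neighbours)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [grid[r][c]]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    values.append(grid[rr][cc])
            out[r][c] = sum(values) / len(values)
    return out


def run_iterative(initial: Grid, step: Callable[[Grid], Grid], n_steps: int) -> List[Grid]:
    """Feed the output of each time step back in as the input for the next,
    keeping every intermediate state so that results can be mapped later."""
    states = [initial]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


states = run_iterative([[0, 0, 0], [0, 9, 0], [0, 0, 0]], diffuse, n_steps=2)
```

Scripting the loop removes the need to re-enter data at each time step, but it does not by itself address the run-time problem noted above.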

2.1 Representing Time and Change Within GIS

The subject of time within GIS has received a considerable amount of attention. Heywood et al. (2006) comment that ideally, GIS would be able to represent temporal change using methods that explicitly represent spatial change, as well as different states through time. Furthermore, methods allowing direct manipulation and comparison of simulated or observational data in both temporal and spatial dimensions should be catered for. In reality, two main challenges for the integration of time within GIS exist: (1) continuous data over a period of time are rarely available for an entity or system of interest; (2) data models and structures able to record, store, and visualise information about an object in different temporal states are still in their infancy (Heywood et al. 2006). In the context of this chapter, the former challenge is less of a constraint since an agent-based computer simulation is capable of generating an abundance of data over a continuous period of time, while much progress has been made on the latter issue. The following discussion outlines issues related to the representation of time and change, as well as approaches for incorporating space-time information within GIS.

The basic objective of any temporal database is to record change over time, where change can be thought of as an event or collection of events. An event might be a change in state of one or more locations, entities, or both. Changes that might affect an event can be distinguished in terms of their temporal pattern; Peuquet (2005) has suggested four types: (1) continuous – events occurring throughout some period of time; (2) majorative – events occurring most of the time; (3) sporadic – events occurring some of the time, and; (4) unique – events that only occur once. The distribution of events within these temporal patterns can also be very complex (e.g. chaotic, cyclic, or steady state), complicated further as change, to some extent, is always occurring at various rates as well (e.g. from sudden to gradual). Hence, duration and frequency are important descriptive characteristics within this taxonomy of temporal patterns.

There are three approaches for capturing space-time information within a GIS: (1) location-based; (2) time-based; and (3) entity-based. The only method of viewing a data model within existing GIS, as a space-time representation, is as a temporal series of spatially-registered ‘snapshots’ (Peuquet 2005). Invariably this location-based approach employs a raster data model, although vector has also been used, with only a single information type stored (e.g. elevation, density, precipitation, etc.) for each cell at any one point in time. Information for the entire layer is stored for each time step, regardless of whether change has occurred since the previous step. There are several criticisms of this approach. Firstly, the data volume increases enormously, because redundant data are stored in consecutive snapshots. Secondly, the state of a spatial entity can only be retrieved by querying cells of adjacent snapshots, because information is stored implicitly between each time step. Finally, the exact point when change has occurred cannot be determined. Langran (1992) has proposed a modification of this approach. The temporal-raster (or grid) approach allows multiple values to be stored for each pixel. A new value, and the time at which change occurred for each pixel, is stored, which can result in a variable number of records for each cell. Recording the time at which change has occurred allows for values to be sorted by time. The most recent value for each cell can therefore be retrieved, which represents the present state of the system. The obvious advantage to this approach is the reduction of redundant data stored for each cell.
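To make the contrast with full snapshots concrete, the temporal-raster idea can be sketched as a per-cell list of (time, value) records that grows only when a value changes. This is a minimal illustration under stated assumptions, not Langran's data structure: the class `TemporalRaster` and its methods are hypothetical names, and cells are identified simply by (row, column) tuples.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


class TemporalRaster:
    """Stores, for each cell, only the times at which its value changed."""

    def __init__(self) -> None:
        self._history: Dict[Cell, List[Tuple[float, float]]] = {}

    def record(self, cell: Cell, time: float, value: float) -> None:
        """Append a (time, value) record, skipping it if nothing changed."""
        history = self._history.setdefault(cell, [])
        if history and time < history[-1][0]:
            raise ValueError("records must be added in time order")
        if history and history[-1][1] == value:
            return  # no change since the last record: store nothing
        history.append((time, value))

    def value_at(self, cell: Cell, time: float) -> Optional[float]:
        """Return the value in force at `time`, or None before the first record."""
        history = self._history.get(cell, [])
        index = bisect_right([t for t, _ in history], time)
        return history[index - 1][1] if index else None

    def current(self, cell: Cell) -> Optional[float]:
        """Return the most recent value, i.e. the present state of the cell."""
        history = self._history.get(cell, [])
        return history[-1][1] if history else None

    def record_count(self) -> int:
        """Total number of stored records across all cells."""
        return sum(len(h) for h in self._history.values())


raster = TemporalRaster()
raster.record((0, 0), time=0, value=10.0)
raster.record((0, 0), time=1, value=10.0)  # unchanged, so not stored
raster.record((0, 0), time=2, value=12.5)
```

In this sketch three time steps produce only two stored records for the cell, whereas a snapshot series would store three, and the time of change (here, time 2) is recorded explicitly.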

Peuquet and Duan (1995) have proposed a time-based approach to storing space-time information within a GIS, where change is stored as a sequence of events through time. Time is stored in increasing order from an initial point, with the temporal interval correlating to successive events. An event is recorded at the time when the amount of accumulated change is considered significant, or by another domain-specific rule. This type of representation has the advantage of facilitating time-based queries, and the addition of a new event is straightforward as it can simply be added to the end of the timeline. Furthermore, in terms of modelling, an important capacity of any model is the ability to represent alternative versions of the same reality. The concept of representing multiple realities over time is called branching. Branching allows various model simulation runs to be compared, or simulation results to be compared to observed data. The time-based approach facilitates the branching of time in order to represent alternative or parallel sequences of events resulting from specific scenarios, because it is strictly an ordinal timeline.
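The time-based approach and the notion of branching can be illustrated with an ordered event list in which a branch shares the events of its parent up to the branching time. The sketch below is an assumption-laden illustration rather than the structure proposed by Peuquet and Duan (1995); `Event`, `Timeline`, `branch` and `history` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    time: float
    description: str


@dataclass
class Timeline:
    """An ordinal timeline of events that can branch into alternative scenarios."""
    parent: Optional["Timeline"] = None
    branch_time: Optional[float] = None
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        """New events are simply added to the end of the timeline."""
        last = self.events[-1].time if self.events else self.branch_time
        if last is not None and event.time < last:
            raise ValueError("events must be appended in time order")
        self.events.append(event)

    def branch(self, at_time: float) -> "Timeline":
        """Start an alternative sequence of events that diverges at `at_time`."""
        return Timeline(parent=self, branch_time=at_time)

    def history(self) -> List[Event]:
        """Events inherited from the parent up to the branch point, then our own."""
        inherited: List[Event] = []
        if self.parent is not None:
            inherited = [e for e in self.parent.history() if e.time <= self.branch_time]
        return inherited + self.events


baseline = Timeline()
baseline.append(Event(0, "initial state"))
baseline.append(Event(5, "road opened"))
baseline.append(Event(10, "land converted"))

scenario = baseline.branch(at_time=5)
scenario.append(Event(7, "land protected"))
```

Here `scenario.history()` shares the first two events with the baseline and then diverges, so the two runs can be compared event by event.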

Finally, several entity-based space-time models have been proposed. Conceptually these models extend the topological vector approach (e.g. coverage model); tracking changes in the geometry of entities incrementally through time. The amendment vector model was the first of this type, and extended frameworks have been proposed subsequently. Besides maintaining the integrity of entities and their changing topology, these approaches are able to represent asynchronous changes to entity geometries. However, the space-time topology of these vectors becomes increasingly complex as amendments accumulate through time. In addition, aspatial entity attributes can change over time. To record aspatial changes, a separate relational database is often used. However, if change occurs at a different rate between the spatial and aspatial aspects of an entity, maintaining the identity of individual entities becomes difficult, especially when entities split or merge.

Object-oriented data models have transformed the entity-based storage of space-time information within GIS (Zeiler 1999), and have become mainstream within commercial GIS (e.g. the geodatabase structure within ArcGIS). They have grown increasingly sophisticated, catering for a powerful modelling environment. The object-oriented data model approach provides a cohesive representation that allows the identity of objects, as well as complex interrelationships, to be maintained through time. Specifically, temporal and location behaviour can be assigned as an attribute of features rather than the space itself, which has the distinct advantage of allowing objects to be updated asynchronously. Despite the advantages of the object-oriented data model, Reitsma and Albrecht (2006) observe that, to date, no data model or data structure allows the representation of processes (i.e. recording a process that has changed the state of an object within a model). Consequently, queries about where a process is occurring at an instant of time cannot be expressed with these current approaches. Notwithstanding, object-oriented data models are the canonical approach to the storage of space-time data generated by agent-based models, and their visualisation within GIS, given their complementarities. Nevertheless, the visualisation of agent-based models within GIS is still limited to a temporal series of snapshots.

2.2 Linkage – Coupling Versus Integration/Embedding

Models implemented as direct extensions of an underlying GIS, through either graphic model-building or scripts, generally make two assumptions: (1) all operations required by the model are available in the GIS (or in another system called by the model); and (2) the GIS provides sufficient performance to handle the execution of the model (Longley et al. 2005). In reality, a GIS will often fail to provide adequate performance, especially with very large datasets and a large number of iterations, because it has not been designed as a simulation/modelling engine. This one-size-fits-all approach inherent in GIS provides limited applicability, and attention has subsequently been devoted to linking, either through coupling or integration/embedding, GIS with simulation/modelling systems more directly suited to users’ needs. General classifications have been produced by numerous authors (e.g. Maguire 1995; Bernard and Krüger 2000; Westervelt 2002; Goodchild 2005; Longley et al. 2005; Maguire 2005). Several of their definitions now overlap as technological advance has blurred the boundaries of their classifications, whilst some definitions are convoluted because terminology has been used interchangeably or sometimes inappropriately (e.g. coupling, linkage or integration). Nevertheless, categorisation of these techniques is possible, and a brief description of each is developed below, in an attempt to clarify the situation. This is followed by a critique of these different approaches, with a view to identifying an appropriate method for developing geospatial agent-based models.

In situations where GIS and simulation/modelling systems already exist (e.g. as commercial products), or the cost of rebuilding the functionality of one system into another is too great, the systems can be coupled together (Maguire 2005). Coupling can therefore be broadly defined as the linkage of two stand-alone systems by data transfer. Three types of coupling are distinguishable, although these are only a subset of the much larger fields of enterprise application integration (Linthicum 2000) and software interoperability (Sondheim et al. 2005). The attributes of each approach cascade along the coupling continuum, from loose to tight/close (Table 12.1 summarises the competing objectives of the different coupling approaches; greyed boxes are considered more desirable characteristics; adapted from Westervelt 2002):

Table 12.1 Comparison of coupling approaches (Adapted from Westervelt 2002)
  1. Loose Coupling. A loose connection usually involves the asynchronous operation of functions within each system, with data exchanged between systems in the form of files. For example, the GIS might be used to prepare inputs, which are then passed to the simulation/modelling system, where after execution the results of the model are returned to the GIS for display and analysis. This approach requires the GIS and simulation/modelling system to understand the same data format; if no common format is available an additional piece of software will be required to convert formats in both directions. Occasionally, specific new programmes must be developed to perform format modifications;

  2. Moderate Coupling. Essentially this category encapsulates techniques between loose and tight/close coupling. For example, Westervelt (2002) advocates remote procedure calls and shared database access links between the GIS and simulation/modelling system, allowing indirect communication between the systems. Inevitably, this reduces the execution speed of the integrated system, and decreases the ability to simultaneously execute components belonging to the different software; and,

  3. Tight or Close Coupling. This type of linkage is characterised by the simultaneous operation of systems allowing direct inter-system communication during the programme execution. For example, standards such as Microsoft’s COM and .NET allow a single script to invoke commands from both systems (Ungerer and Goodchild 2002). A variant of this approach allows inter-system communication by different processes that may be run on one or more networked computers (i.e. distributed processing).
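As an illustration of the loose end of this continuum, the sketch below mimics file-based exchange between two stand-alone systems using only the Python standard library. It is a hedged illustration, not an account of any particular GIS or toolkit: the three functions stand in for a GIS export, a separate model run, and a GIS import, and the shared CSV layout (`cell_id`, `population`) and the growth rule are assumptions made purely for the example.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

FIELDS = ["cell_id", "population"]  # assumed common file format


def gis_export_inputs(path: Path, cells: List[Dict[str, object]]) -> None:
    """Stand-in for the GIS preparing model inputs as a file."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(cells)


def model_run(input_path: Path, output_path: Path, growth_rate: float) -> None:
    """Stand-in for the simulation system: read inputs, compute, write results."""
    with input_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    with output_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            grown = float(row["population"]) * (1 + growth_rate)
            writer.writerow({"cell_id": row["cell_id"], "population": grown})


def gis_import_results(path: Path) -> Dict[str, float]:
    """Stand-in for the GIS reading model results back for display."""
    with path.open(newline="") as f:
        return {row["cell_id"]: float(row["population"]) for row in csv.DictReader(f)}


def round_trip(cells: List[Dict[str, object]], growth_rate: float) -> Dict[str, float]:
    """Run the three asynchronous stages in sequence via temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        inputs, outputs = Path(tmp) / "inputs.csv", Path(tmp) / "outputs.csv"
        gis_export_inputs(inputs, cells)
        model_run(inputs, outputs, growth_rate)
        return gis_import_results(outputs)


result = round_trip(
    [{"cell_id": "a", "population": 100.0}, {"cell_id": "b", "population": 40.0}],
    growth_rate=0.5,
)
```

The only contract between the two sides is the file format, which is why a format converter is needed whenever the systems do not share one.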

Coupling has often been the preferred approach for linking GIS and simulation/modelling systems. However, this has tended to result in very specialised and isolated solutions, which have prevented the standardisation of general and generic linkage. An alternative to coupling is to embed or to integrate the required functionality of either the GIS or simulation/modelling system within the dominant system using its underlying programming language (Maguire 2005). The final system is either referred to as GIS-centric or modelling-centric depending on which system is dominant. In both instances, the GIS tools or modelling capabilities can be executed by calling functions from the dominant system, usually through a graphical user interface (GUI). Compared to coupling, an embedded or integrated system will appear seamless to a user (Maguire 1995). However, in the past integration has been based on existing closed and monolithic GIS and simulation systems, which poses a risk of designing systems that are also closed, monolithic, and therefore costly (Fedra 1996).

Interest in modelling-centric systems has increased considerably over recent years, predominately due to the development of simulation/modelling toolkits with scripting capabilities that do not require advanced computer programming skills (Gilbert and Bankes 2002). Often the simulation/modelling toolkit can access GIS functions, such as data management and visualisation capabilities, from a GIS software library. For example, the RepastJ (see Sect. 12.3.3.3) toolkit exploits functions from GeoTools (a Java GIS software library) for importing and exporting data, Java Topology Suite (JTS) for data manipulation, and OpenMap for visualisation. The toolkit itself maintains the agents and environment (i.e. their attributes), using identity relationships for communication between the different systems. Functions available from GIS software libraries reduce the development time of a model, and are likely to be more efficient because they have been developed over many years with attention to efficiency. Additionally, the use of standard GIS tools for spatial analysis improves functional transparency of a model, as it makes use of well known and understood algorithms. Alternatively, spatial data management and analysis functions can be developed within the modelling toolkit, although this strategy imposes huge costs, in terms of time to programme the model, and time required to frequently update spatial data or use spatial analysis functions within the model.

Conversely, the GIS-centric approach is an attractive alternative; not least because the large user-base of some GIS expands the potential user-base for the final model. Analogous to the modelling-centric approach, GIS-centric integration can be carried out using software libraries of simulation/modelling functions accessed through the GIS interface. There are many examples of simulation/modelling systems integrated within commercial GIS, including: the Consequences Assessment Tool Set (2011, CATS) system, designed for emergency response planning; the Hazard Prediction and Assessment Capability (2004, HPAC) system, for predicting the effect of hazardous material releases into the atmosphere; the NatureServe Vista (2011) system, for land use and conservation planners.

Brown et al. (2005) propose an alternative approach which straddles both the GIS-centric and modelling-centric frameworks. Rather than providing functionality within one system, the middleware-based approach manages connections between systems, allowing a model to make use of the functionality available within the GIS or the simulation/modelling toolkit most appropriate for a given task. Thus, the middleware approach allows the simulation/modelling toolkit to handle the identity of, and relationships between, agents and their environment. Conversely, the GIS would manage spatial features, as well as temporal and topological relationships of the model. Essentially, the simulation/modelling toolkit handles what it is designed for (i.e. implementing the model), while the GIS can be used to run the model, and visualise the output. An example of this approach is the ABM extension within ArcGIS (referred to as Agent Analyst), which allows users to create, edit, and run RepastPy models from within ArcGIS (Redlands Institute 2010). However, it is the opinion of the authors that only a dichotomy of integration classifications exists. A GIS is either integrated into a simulation/modelling toolkit, or vice versa. The definition of the middleware approach is essentially tight coupling (see above).

3 Developing Geospatial Simulations with Agent-Based Modelling Toolkits

The process of building an agent-based model begins with a conceptual model, where basic questions or goals, elements of the system (e.g. agent attributes, rules of agent interaction and behaviour, the model environment, etc.), and the measurable outcomes of interest are identified (Brown 2006). It is important to ‘ground’ a model during the conceptualisation process (i.e. establish that simplifications made during the design process do not seriously detract from the credibility of the model and the likelihood that it will provide important insights; Carley 1996). It is usual for a modeller to set forth a claim as to why the proposed model is reasonable. This claim will be enhanced if the applicability of the model is not overstated, and by defining the model’s limitations and scope. Grounding can be reinforced by demonstrating that other researchers have made similar or identical assumptions in their models, and by justifying how a proposed model will be of benefit in relation to pre-existing models.

Conceptualising the fundamental aspects of an agent-based model (i.e. one or more agents interacting within an environment), juxtaposed with the distinction between explanatory vs. predictive purposes of a model suggests a fourfold typology of agent and environment types (Table 12.2). Couclelis (2001) classifies agents and their environment as either being designed (i.e. explanatory) or analysed (i.e. predictive – empirically grounded). If designed, agents are endowed with attributes and behaviours that represent (often simplified) conditions for testing specific hypotheses about general cases. Analysed agents are intended to accurately mimic real-world entities, based on empirical data or ad hoc values that are realistic substitutes for observed processes. Similarly, the environment that agents are situated within can be designed (i.e. provided with characteristics that are simplified to focus on specific agent attributes), or analysed (i.e. represent a real-world location).

Table 12.2 Description, purpose/intent, verification and validation strategies, and appropriate development tools for agent-based models incorporating designed or analysed agents/environments (Adapted from Berger and Parker 2001)

The boundary between designed and analysed is not always distinct, especially when ad hoc data are employed. Subtle but profound differences, both practical and conceptual, exist between the design and analysis approaches to developing agents and their environment. A major difference in practical terms is that designing something provides direct (partial or total) control over the outcome, whereas there can only be hope that something has been analysed correctly (Couclelis 2001). Table 12.2 provides further details to consider when developing agents and their environment, including a brief description of the model, the purpose and intent of the model (see Parker et al. 2001), verification and validation strategies used to assess the model outputs (see Parker et al. 2001; Crooks et al. 2008), and appropriate software for the development of a model (see Sect. 12.3.2).

Once a model has been conceptualised, it must be formalised into a specification which can be developed into a computer programme, if the model is required to be run as a computer simulation (Grimm and Railsback 2012 and Abdou et al. 2012 offer constructive advice on this). The process of formalisation involves being precise about what an identified theory relating to a phenomenon of interest means, making sure that it is complete and coherent. There are several reasons why computer simulation is more appropriate for formalising social science theories than mathematics, which has often been used in the social sciences (Gilbert and Troitzsch 2005). First, programming languages are more expressive and less abstract than most mathematical techniques. Second, a computer simulation can deal more easily with parallel processes, and with processes without a well-defined order of actions, than systems of mathematical equations. Third, a computer model can include heterogeneous agents (e.g. pedestrians with varying degrees of knowledge about a building layout), while this is usually relatively difficult using mathematics. Finally, computer programmes are (or can easily be made to be) modular, so that major changes can be made to one part of the model without requiring large changes in other parts of the programme, an ability which mathematical systems often lack.

The object-oriented paradigm provides a very suitable medium for the development of agent-based models. In particular, it provides the aforementioned modularity useful for developing a computer simulation. It is not the intention of this chapter to outline the fundamental object-oriented concepts; this has been achieved by numerous others (refer to Booch (1994) for a seminal discussion and Armstrong (2006) for a useful evaluation and clarification of key object-oriented notions).
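By way of illustration only, the fit between the object-oriented paradigm and ABM can be seen in a very small sketch in which agents, their environment and the scheduler are separate classes. None of this is taken from a particular toolkit: `Environment`, `Pedestrian` and `Model` are hypothetical names, the movement rule is an arbitrary random walk, and agent heterogeneity is reduced to a single `speed` attribute.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Environment:
    """A bounded grid; agents cannot leave it."""
    width: int
    height: int

    def clamp(self, x: int, y: int) -> Tuple[int, int]:
        return (min(max(x, 0), self.width - 1), min(max(y, 0), self.height - 1))


@dataclass
class Pedestrian:
    """An agent whose `speed` differs between individuals (heterogeneity)."""
    ident: int
    x: int
    y: int
    speed: int

    def step(self, env: Environment, rng: random.Random) -> None:
        # Arbitrary behaviour rule: move `speed` cells in a random direction.
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        self.x, self.y = env.clamp(self.x + dx * self.speed, self.y + dy * self.speed)


class Model:
    """Schedules agents in a random order at each tick."""

    def __init__(self, env: Environment, agents: List[Pedestrian], seed: int = 0) -> None:
        self.env = env
        self.agents = agents
        self.rng = random.Random(seed)
        self.ticks = 0

    def step(self) -> None:
        order = self.agents[:]
        self.rng.shuffle(order)  # avoid a fixed activation order
        for agent in order:
            agent.step(self.env, self.rng)
        self.ticks += 1


model = Model(
    Environment(width=5, height=5),
    [Pedestrian(0, 2, 2, speed=1), Pedestrian(1, 0, 0, speed=2), Pedestrian(2, 4, 4, speed=0)],
    seed=42,
)
for _ in range(50):
    model.step()
```

Because behaviour lives in `Pedestrian.step`, the movement rule can be replaced without touching the environment or the scheduler, which is the modularity referred to above.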

At the time of writing, there are many simulation/modelling systems available to assist the development stage of ABM. The majority of these simulation/modelling systems are programmed in, and/or require the user to develop their model in, an object-oriented language. The subsequent section of this chapter identifies some of the simulation/modelling systems available for ABM, highlighting key questions that should be considered for a user to determine an appropriate system for their needs.

3.1 Types of Simulation/Modelling Systems for Agent-Based Modelling

In general, two types of simulation/modelling systems are available to develop agent-based models: toolkits or software. Based on this dichotomy, toolkits are simulation/modelling systems that provide a conceptual framework for organising and designing agent-based models. They provide appropriate libraries of software functionality that include pre-defined routines/functions specifically designed for ABM. However, the object-oriented paradigm allows the integration of additional functionality from libraries not provided by the simulation/modelling toolkit, extending the capabilities of these toolkits. Of particular interest to this chapter is the integration of functionality from GIS software libraries (e.g. OpenMap, GeoTools, ESRI’s ArcGIS, etc.), which provide ABM toolkits with greater data management and spatial analytical capabilities required for geospatial modelling (see Sect. 12.2).

The development of agent-based models can be greatly facilitated by the utilisation of simulation/modelling toolkits. They provide reliable templates for the design, implementation and visualisation of agent-based models, allowing modellers to focus on research (i.e. building models), rather than building fundamental tools necessary to run a computer simulation (see Tobias and Hofmann 2004; Railsback et al. 2006). In particular, the use of toolkits can reduce the burden modellers face programming parts of a simulation that are not content-specific (e.g. GUI, data import-export, visualisation/display of the model). It also increases the reliability and efficiency of the model, because complex parts have been created and optimised by professional developers, as standardised simulation/modelling functions. Unsurprisingly, there are limitations of using simulation/modelling systems to develop agent-based models, for example: a substantial amount of effort is required to understand how to design and implement a model in some toolkits; the programming code of demonstration models or models produced by other researchers can be difficult to understand or apply to another purpose; a modeller will have to learn or already have an understanding of the programming language required to use the toolkit; and finally the desired/required functionality may not be present, although additional tools might be available from the user community or from other software libraries. Benenson et al. (2005) also note that toolkit users are accompanied by the fear of discovering that a particular function cannot be used, will conflict, or is incompatible with another part of the model late in the development process.

Probably the earliest and most prominent toolkit was SWARM, although many other toolkits now exist. At the time of writing there are more than 100 toolkits available for ABM (see AgentLink 2007; SwarmWiki 2010; Nikolai and Madey 2009; Tesfatsion 2010; Wikipedia 2010 for comprehensive listings). However, variation between toolkits can be considerable. For example, their purpose (some toolkits have different design objectives, e.g. an Artificial Intelligence (AI) rather than social science focus, or network as opposed to raster or vector model environments), level of development (e.g. some toolkits are no longer supported or have ceased development), and modelling capabilities (e.g. the number of agents that can be modelled, degree of interaction between agents) can vary. A review of all toolkits currently available is beyond the scope of this chapter. However, we identify a selection of noteworthy simulation/modelling toolkits (e.g. Swarm, MASON, Repast, AnyLogic), highlighting their purpose and capabilities, as well as resources providing further information.

In addition to toolkits, software is available for developing agent-based models, which can simplify the implementation process. For example, simulation/modelling software often negates the need to develop an agent-based model via a low-level programming language (e.g. Java, C++, Visual Basic, etc.). In particular, software for ABM is useful for the rapid development of basic or prototype models. However, modellers using software are restricted to the design framework advocated by the software. For instance, some ABM software will only have limited environments (e.g. raster only) in which to model, or agent neighbourhoods might be restricted in size (e.g. von Neumann or Moore). Furthermore, a modeller will be constrained to the functionality provided by the software (unlike with ABM toolkits, modellers will be unable to extend or integrate additional tools), especially if the software is written in its own programming language (e.g. NetLogo). Section 12.3.3 identifies a selection of noteworthy software for the development of agent-based models: StarLogo, its derivative NetLogo, and AgentSheets.

3.2 Guidelines for Choosing a Simulation/Modelling System

Ideally, a modeller would have comprehensive practical experience in a range of modelling/simulation systems before choosing which system to use for a modelling endeavour. Unfortunately, this is not usually feasible. For this reason several authors (Najlis et al. 2001; Gilbert and Bankes 2002; Serenko and Detlor 2002; Tobias and Hofmann 2004; Dugdale 2004; Rixon et al. 2005; Robertson 2005; Andrade et al. 2008; Berryman 2008; Liebert et al. 2008; Nikolai and Madey 2009) have gained practical experience and/or have surveyed several systems, identifying key criteria that should be considered before making a decision. General criteria include, but are not limited to: ease of developing the model/using the system; size of the community using the system; availability of help or support (most probably from the user community); size of the community familiar with the programming language in which the system is implemented (if a programming language is necessary to implement the model); whether the system is still maintained and/or updated; availability of demonstration or template models; technical and how-to documentation, etc. Criteria relating specifically to a system’s modelling functionality include: number of agents that can be modelled; degree of interaction between agents; ability to represent multiple organisational/hierarchical levels of agents; variety of model environments available (network, raster, and vector); possible topological relationships between agents; management of spatial relationships between agents, and between agents and their environment; mechanisms for scheduling and sequencing events, etc. These criteria will be weighted differently depending on a modeller’s personal preferences and abilities (e.g. the specification of the model to be developed, programming experience/knowledge, etc.).
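One simple way to make such a weighting explicit is a weighted-average score per candidate system. The Python sketch below is purely illustrative: the criterion names, the weights, the 1 to 5 scores, and the placeholder names "System A" and "System B" are all assumptions invented for this example, and are not ratings of any system reviewed in this chapter.

```python
def weighted_score(scores, weights):
    """Weighted average of criterion scores (hypothetical helper)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Assumed weights reflecting one modeller's priorities (higher = more important).
weights = {"ease_of_use": 3, "community_support": 2, "gis_support": 5}

# Assumed scores on a 1-5 scale for two placeholder systems.
candidates = {
    "System A": {"ease_of_use": 4, "community_support": 3, "gis_support": 2},
    "System B": {"ease_of_use": 2, "community_support": 4, "gis_support": 5},
}

# Rank the candidates from highest to lowest weighted score.
ranking = sorted(candidates,
                 key=lambda name: weighted_score(candidates[name], weights),
                 reverse=True)
```

A modeller who values ease of use over GIS support would choose different weights and could well obtain the opposite ranking, which is the sense in which the choice is personal.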

Another important distinction separating simulation/modelling systems is their licensing policy: open source, shareware/freeware, or proprietary. Open source simulation/modelling systems constitute toolkits or software whose source code is published and made available to the public, enabling anyone to copy, modify and redistribute the system without paying royalties or fees. A key advantage of open source simulation/modelling systems relates to the transparency of their inner workings. The user can explore the source code, permitting the modification, extension and correction of the system if necessary. This is particularly useful for verifying a model (see Crooks et al. 2008). The predominant open source simulation/modelling systems are toolkits (e.g. MASON, Repast, Swarm, etc.). The distinction between an open source simulation/modelling system and a shareware/freeware system is subtle. There is no one accepted definition of the term shareware/freeware, but the expression is commonly used to describe a system that can be redistributed but not modified, primarily because the source code is unavailable. Consequently, shareware/freeware systems (e.g. StarLogo, NetLogo, etc.) do not have the same flexibility, extendibility or potential for verification (in relation to access to their source code) as open source systems. Similarly, shareware/freeware systems tend to be toolkits, rather than software. Finally, proprietary simulation/modelling systems are available for developing agent-based models. Proprietary systems are mainly software, developed by an organisation that exercises control over its distribution and use; most require a licence at a financial cost to the user. These systems have the advantage of being professionally designed and built for a specific use, and are often relatively simple to use. However, they often lack the community support found with open source or shareware/freeware systems. Moreover, since access to their source code is prohibited, a model developed with proprietary software is essentially a black box. A modeller will therefore, to some extent, be left unsure about the inner validity of a model constructed with a proprietary system. This situation is compounded when the output of a model is emergent or unexpected.

Striking a balance between the aforementioned criteria is difficult. Unfortunately, while identifying a suitable system for the development of an agent-based model, too much time can often be expended trying to find this balance. This balance can be perceived as a trade-off between the difficulty of developing a model (e.g. in terms of the time required to programme the model, to understand how to develop a model with a specific system, or to acquire experience and knowledge of a programming language if required, etc.) and the modelling power provided by the simulation/modelling system (e.g. modelling capabilities and functionality, Fig. 12.1). The key is striking a ‘personal’ balance between these criteria. For example, those more accustomed to programming may prefer the functionality and flexibility of a simulation/modelling toolkit. However, modellers who only wish to develop a basic or prototype model quickly and easily, possibly with little or no programming skills, may prefer to use simulation/modelling software (see Railsback et al. 2006).

Fig. 12.1 Balance between power versus difficulty of developing a model with a simulation/modelling system

3.3 Simulation/Modelling Systems for Agent-Based Modelling

This section provides key criteria pertaining to a selection of simulation/modelling systems available for the development of agent-based models (the rationale for each criterion was described in Sect. 12.3.2). Although there are many systems available for developing agent-based models, this chapter reviews seven, separated into three categories of licensing policy: (1) open source (Swarm, MASON and Repast); (2) shareware/freeware (StarLogo and NetLogo); and (3) proprietary systems (AgentSheets and AnyLogic). These systems were chosen because they fulfilled the (majority of the) following criteria: they are maintained and still being developed; widely used and supported by a strong user community; accompanied by a variety of demonstration models, in some instances with the model’s programming script or source code available; and capable of developing spatially explicit models, possibly via the integration of GIS functionality. Tables 12.3–12.5 tabulate information on each system for comparison purposes, categorised by licensing policy (adapted from Najlis et al. 2001 and Parker 2001). The remainder of this section provides further information about each system, identifying examples of geospatial models that have been developed with the system. A caveat must be noted at this point: the information provided within this section is accurate at the time of publication. However, the systems reviewed are constantly being updated, thus modellers are advised to check each system’s website to obtain up-to-date information.

Table 12.3 Comparison of open source simulation/modelling systems (Adapted from Najlis et al. 2001 and Parker 2001)

3.3.1 Swarm

Swarm (Table 12.3) is an open source simulation/modelling system designed specifically for the development of multi-agent simulations of complex adaptive systems (Swarm 2010), although agent-based models can easily be developed using Swarm as well. Inspired by artificial life, Swarm was designed to study biological systems, attempting to infer mechanisms observable in biological phenomena (Minar et al. 1996). In addition to modelling biological systems (e.g. Railsback and Harvey 2002), Swarm has been used to develop models for anthropological, computer science, ecological, economic, geographical, and political science purposes. Useful examples of spatially explicit models include the simulation of pedestrians in urban centres (Schelhorn et al. 1999; Haklay et al. 2001) and the examination of crowd congestion at London’s Notting Hill carnival (Batty et al. 2003). Najlis et al. (2001) identify the steep learning curve of Swarm as a significant factor to consider before choosing this system to develop an agent-based model, although this should be less of a problem for a modeller with strong programming skills.

3.3.2 MASON

MASON (Multi-Agent Simulator Of Neighborhoods – Table 12.3) is developed by the Evolutionary Computation Laboratory (ECLab) and the Centre for Social Complexity at George Mason University (see Luke et al. 2005). Currently MASON provides much of the same functionality as Repast, for example, dynamic charting (e.g. histograms, line graphs, etc.) of model output during a simulation. A recent addition to MASON is GeoMASON (2010), which allows GIS vector data to be imported/exported. In addition, MASON supports the use of raster data in the creation of geospatial agent-based models (e.g. Kennedy et al. 2010), as shown in Fig. 12.2.

Fig. 12.2 Examples of raster and vector agent-based models in MASON. (a) Agents are red points which move around the footpaths (blue lines). (b) A rainfall model where agents are blue and flow down the terrain (built from a Digital Elevation Model)

MASON has a growing set of technical documents, well-commented Javadocs, and a user group which actively supports its e-mail list. MASON’s how-to documentation, demonstration models (e.g. the seminal heat bugs example, network models, etc.), and several publications detailing the implementation and/or application of MASON are available for a prospective modeller to evaluate the system further (MASON 2010). Examples of spatially explicit models utilizing MASON’s GIS functionality include exploring conflict between herdsmen and farmers in East Africa (Kennedy et al. 2010), pastoralists in Inner Asia (Cioffi-Revilla et al. 2010), residential dynamics in Arlington County, Virginia (Hailegiorgis 2010) and understanding the Afghan drug industry (Łatek et al. 2010).

3.3.3 Repast

Repast (Recursive Porous Agent Simulation Toolkit – Table 12.3) was originally developed at the University of Chicago, and is currently maintained by Argonne National Laboratory and managed by the Repast Organisation for Architecture and Development (ROAD). Earlier incarnations of Repast catered for the implementation of models in three programming languages: Python (RepastPy); Java (RepastJ and Repast Simphony); and Microsoft .NET (Repast.Net). RepastPy allows basic models to be developed by modellers with limited programming experience via a ‘point-and-click’ GUI (Collier and North 2005). RepastPy models can subsequently be exported/converted into Java for further development in RepastJ. Repast.Net and RepastJ allow more advanced models to be developed (Vos 2005), because more complex functionality can be programmed into a model. Agent Analyst is an ABM extension that allows users to create, edit, and run Repast models from within ArcGIS (Redlands Institute 2010). For further information on earlier versions of Repast, readers are referred to Crooks (2007). Repast has a relatively large user group and an actively supported e-mail list, as well as extensive how-to documentation and demonstration models available from the system website.

Whilst still being maintained, RepastJ, Repast.Net and RepastPy have now reached maturity and are no longer being developed. They have been superseded by Repast Simphony (RepastS), which provides all the core functionality of RepastJ or Repast.Net, although limited to implementation in Java. For a comparison of RepastS and previous versions readers are referred to North and Macal (2009). RepastS was initially released in late 2006 and now provides the same GIS functionality as previous versions. There are two main improvements in RepastS over Repast 3.0. First, a new optional point-and-click GUI environment for model development generates Java classes, although models can still be coded manually. Second, an improved runtime GUI can now be used to build displays (in both 2D and 3D) or charts, output data, interrogate agents, and interface with other programs (such as R for statistics) via a point-and-click interface at run time. This means that these tasks are done more quickly after the model has been built and compiled, and do not feature in the underlying code at all, unlike previous Repast implementations.

The Repast development team have provided a series of articles regarding RepastS. The architecture and core functionality are introduced by North et al. (2005a), and the development environment is discussed by Howe et al. (2006). The storage, display and behaviour/interaction of agents, as well as features for data analysis (i.e. via the integration of the R statistics package) and presentation of models within RepastS are outlined by North et al. (2005b). Tatara et al. (2006) provide a detailed discussion outlining how to develop a “simple wolf-sheep predation” model, illustrating the modelling capabilities of RepastS. In relation to the integration of GIS functionality, the reader is referred to the tutorials by Malleson (2008), which demonstrate how to create a virtual city via the importation of shapefiles, create agents and then move the agents around a road network (this tutorial was used for the creation of Fig. 12.3a). Furthermore, within RepastS it is possible to embed spatially explicit agent-based models directly into a 3D GIS display. For this, RepastS provides methods to visualise agent-based models directly in NASA’s (2010) virtual globe, World Wind. This new interactive 3D GIS display allows one to visualise agents with satellite imagery, elevated terrain and other scientific datasets, as shown in Fig. 12.3b. RepastS also supports the importation of NetLogo (see Sect. 12.3.3.5) models into the Repast framework via ReLogo (Ozik 2010). Such functionality aims to allow for rapid prototyping of agent-based models by first building simple agent-based models in NetLogo and, once satisfied, migrating and extending them in RepastS. Not only does RepastS provide tools for the conversion of simple models from NetLogo, it also supports high performance distributed computing, via Repast for High Performance Computing (Repast HPC, see Collier 2010).

Fig. 12.3 Examples of vector agent-based models in RepastS. (a) Agents (red dots) moving about on footpaths (grey lines). (b) An agent-based model overlaid on NASA World Wind (Source: Repast 2011)

Useful examples of spatially explicit models created using Repast include the study of segregation and of residential and firm location (Crooks 2006, 2010), residential dynamics (Jackson et al. 2008), crime (Malleson et al. 2010) and the evacuation of pedestrians from within an underground station (Castle 2007).

3.3.4 StarLogo

StarLogo (Table 12.4) is a shareware/freeware modelling system developed at the Media Laboratory, Massachusetts Institute of Technology (MIT). It has undergone some change: the original StarLogo modelling system has been released as an open source project (see OpenStarLogo 2010); however, it is still included in this section because the new version, StarLogo TNG (The Next Generation), is still shareware/freeware. StarLogo TNG moves StarLogo from the 2D to the 3D realm through the use of the OpenGL graphics API, and aims to lower the barrier for programming agent-based models through the use of a drag-and-drop graphical programming interface. Modellers can drag commands from a set of model building blocks (a block-based graphical language) rather than creating models using the StarLogo syntax, thus allowing for rapid model development. StarLogo TNG uses OpenGL for displaying the models at run time, providing a 3D display termed ‘SpaceLand’. The terrain within such models is editable and can be manually shaped. Agents can also be programmed to move in x, y and z directions.

Table 12.4 Comparison of shareware/freeware simulation/modelling systems (Adapted from Najlis et al. 2001 and Parker 2001)

StarLogo lacks the flexibility offered by open source systems, since modellers are constrained to the functionality provided by the system. Despite this limitation, StarLogo is very easy to use, notably for people with very little programming experience. Dynamic charting of model output during a simulation is provided. In addition, a number of demonstration models and detailed how-to documentation relating to these models are supplied with StarLogo, and many more are available to download from the World Wide Web (WWW). While StarLogo does not support GIS per se, it does allow one to import GIFs, thereby allowing pixels to be converted into patches. Batty et al. (1998) used this approach to examine visitor movement within London’s British Tate Gallery, specifically how changes in room configuration can affect movement between exhibits.

3.3.5 NetLogo

NetLogo (originally named StarLogoT – Table 12.4) is a variant of StarLogo, originally developed at the Centre for Connected Learning and Computer-Based Modelling at Northwestern University, to allow StarLogo models to be developed on computers using the Macintosh operating system. It is now possible to create StarLogo models on a computer using a Macintosh operating system, thus the critical distinction between the two simulation/modelling systems is that NetLogo is specifically designed for the deployment of models over the internet (NetLogo 2010). Initially both NetLogo and StarLogo only provided functionality to import image files, which can be used to define the environments within which agents are located, thus facilitating the development of spatial models (Fig. 12.4). However, within NetLogo it is now possible to import both raster (in the form of .asc files) and vector data (shapefiles). This new ability opens up a range of possibilities for the easy creation of spatial agent-based models, for example, for the study of surface erosion (Wilensky 2006), as shown in Fig. 12.4b.
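To illustrate what importing a raster .asc file involves, the Python sketch below parses a small grid in the widely used ESRI ASCII grid layout (a short header of key-value pairs followed by rows of cell values). It assumes that layout, it is not NetLogo code and does not reproduce NetLogo’s import routines, and the function name `parse_ascii_grid` and the sample data are invented for this example.

```python
def parse_ascii_grid(text):
    """Parse ESRI ASCII grid text into (header, rows of cell values).

    Illustrative sketch only: assumes a header of 'key value' lines
    (ncols, nrows, ...) followed by whitespace-separated numbers.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = {}
    i = 0
    # Header lines start with an alphabetic keyword; data lines do not.
    while i < len(lines) and lines[i][0][0].isalpha():
        header[lines[i][0].lower()] = float(lines[i][1])
        i += 1
    n_rows, n_cols = int(header["nrows"]), int(header["ncols"])
    values = [float(token) for line in lines[i:] for token in line]
    if len(values) != n_rows * n_cols:
        raise ValueError("cell count does not match nrows * ncols")
    nodata = header.get("nodata_value")  # optional header entry
    # Cells equal to the no-data marker become None.
    grid = [[None if v == nodata else v
             for v in values[r * n_cols:(r + 1) * n_cols]]
            for r in range(n_rows)]
    return header, grid


# Invented sample: a 2-row by 3-column elevation grid with one no-data cell.
sample = """ncols 3
nrows 2
xllcorner 0.0
yllcorner 0.0
cellsize 10.0
NODATA_value -9999
5 4 3
4 -9999 2
"""
header, grid = parse_ascii_grid(sample)
```

Each parsed cell could then be assigned to the corresponding patch or grid cell of the model environment, with the cell size and corner coordinates from the header giving its geographic position.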

Fig. 12.4 Example of GIS integration in NetLogo. (a) Demonstration model using point, line and polygon shapefiles for creating a landscape. (b) NetLogo’s gradient example and (c) the cruising model where cars move along the roads (red lines) (Source: NetLogo 2010)

The NetLogo installation comes with two demonstration models highlighting this functionality. For vector data, three different GIS datasets (a point file of world cities, a polyline file of world rivers, and a polygon file of countries) are imported into a NetLogo model and converted into patches as shown in Fig. 12.4a, although there is no way to distinguish whether a polygon has holes in it. For the raster example, a raster file of surface elevation is loaded into a NetLogo model to demonstrate the possibilities of working with spatial data, as shown in Fig. 12.4b. In this example, agents follow the surface to lower elevations. Such functionality potentially lowers the barrier to coupling agent-based models and GIS for non-expert programmers. For example, the gradient example presented above could be used to model processes that rely on cost surfaces, such as the emergency evacuation of buildings (see Crooks et al. 2008, for an example). As with StarLogo TNG (Sect. 12.3.3.4), models within NetLogo can be viewed in a 3D environment; however, unlike StarLogo TNG, only the agents appear in 3D while the surface remains a 2D plane.
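The behaviour of agents following a surface to lower elevations can be sketched as a steepest-descent walk over a raster. The Python example below is a minimal illustration under stated assumptions: it is not the NetLogo demonstration model, the function name `downhill_path` and the small elevation grid are invented, and the agent is assumed to consider its eight surrounding cells at each step.

```python
def downhill_path(elevation, start):
    """Follow the steepest descent from `start` until no neighbour is lower.

    `elevation` is a list of rows of numbers; `start` is a (row, col) tuple.
    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = len(elevation), len(elevation[0])
    row, col = start
    path = [start]
    while True:
        # The eight surrounding cells that lie inside the grid.
        candidates = [(row + dr, col + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)
                      and 0 <= row + dr < n_rows and 0 <= col + dc < n_cols]
        if not candidates:
            return path
        best = min(candidates, key=lambda rc: elevation[rc[0]][rc[1]])
        # Stop at a local minimum; strict descent guarantees termination.
        if elevation[best[0]][best[1]] >= elevation[row][col]:
            return path
        row, col = best
        path.append(best)


# Invented elevation surface sloping down towards the bottom-right corner.
elevation = [
    [9, 8, 7],
    [8, 6, 5],
    [7, 5, 1],
]
path = downhill_path(elevation, (0, 0))
```

Replacing the elevation values with travel costs to the nearest exit would turn the same walk into a simple cost-surface movement rule of the kind mentioned above for evacuation models.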

NetLogo has been used to develop applications in disciplines varying from biology and physics to the social sciences. Extensive how-to documentation/tutorials and demonstration models are available from the system website, and functionality can be extended through APIs, although the source code for the system is currently unavailable. Useful examples of spatially explicit models created using NetLogo include the study of gentrification (Torrens and Nara 2007), residential housing demand (Fontaine and Rounsevell 2009), the emergence of settlement patterns (Graham and Steiner 2006), and the reimplementation of Axtell et al.’s (2002) artificial Anasazi model by Janssen (2009).

3.3.6 AgentSheets

AgentSheets (Table 12.5) is a proprietary simulation/modelling system that allows modellers with limited programming experience to develop an agent-based model, because models are developed through a GUI (Repenning et al. 2000). A number of demonstration models are available from the system website. For example, Sustainopolis is a simulation analogous to the computer game SimCity, exploring pollution dispersal within a city (Fig. 12.5). Furthermore, AgentSheets can be linked to real-time information over the internet (Repenning and Ioannidou 2004). For example, AgentSheets has been used in conjunction with real-time weather feeds to make mountain biking recommendations in Boulder County. Within the model, agents represent locations that are possible candidates for biking and that feature real-time, web-accessible weather sensors. This information is then used by the biker to reach a decision on where to go biking. Carvalho (2000) has used AgentSheets extensively to teach undergraduate students. He comments that it is easy to use the system to develop models quickly and that it provides students with hands-on experience of ABM without the need to learn a programming language. However, he also found that models created with AgentSheets were limited in their sophistication (notably in terms of the complexity of representation of agent behaviour and interaction). Furthermore, agents are limited to movement within a two-dimensional cell-based environment.

Table 12.5 Comparison of proprietary simulation/modelling systems (Adapted from Najlis et al. 2001 and Parker 2001)

Fig. 12.5 The Sustainopolis model developed in AgentSheets (2010)

3.3.7 AnyLogic

AnyLogic (Table 12.5) incorporates a range of functionality for the development of agent-based models. For example, models can dynamically read and write data to spreadsheets or databases during a simulation run, as well as dynamically chart model output. Furthermore, external programmes can be initiated from within an AnyLogic model for dynamic communication of information, and vice versa. However, AnyLogic models can only be created on Microsoft operating systems, although a simulation can be run on any Java-enabled operating system once compiled (e.g. a Macintosh operating system). The AnyLogic website notes that models have been developed for a diverse range of applications including: the study of social, urban (Fig. 12.6) and ecosystem dynamics (e.g. a predator-prey system); planning of healthcare schemes (e.g. the impact of safe syringe usage on HIV diffusion); computer and telecommunication networks (e.g. the placement of cellular phone base stations); and the location of emergency services and call centres. Further information pertaining to AnyLogic modelling applications can be found in Parinov (2007); these include imitating the functioning of an emergency department in a large hospital. However, the source code and/or documentation of these models is unavailable. Example applications utilizing AnyLogic for spatial agent-based modelling include Makarov et al. (2008), who studied traffic jams in Moscow and explored different scenarios for reducing such events through either road pricing or new road building; Johnson and Sieber (2009), who explored tourism in Nova Scotia; and Pint et al. (2010), who explored organised crime in Rio’s favelas.

Fig. 12.6 An urban and transport dynamics model developed in AnyLogic (2010)

4 Summary

This chapter has reviewed the current capabilities of modelling within a GIS and suggested that agent-based modellers interested in developing geospatial models involving many (possibly tens of thousands) interacting agents, with complex behaviours and interactions between themselves and their environment, should consider either GIS-centric or modelling-centric integration. Moreover, we have discussed the considerations one should take into account when thinking about utilizing an agent-based simulation/modelling system. Furthermore, we have outlined a selection of simulation/modelling systems which can be used for the creation of geospatial agent-based models, along with examples of applications.

Each of the simulation/modelling systems discussed within this chapter can be positioned within the continuum illustrated in Fig. 12.1 (power versus difficulty of developing a model with a simulation/modelling system). However, the exact location of each system is very subjective (i.e. dependent upon a modeller’s knowledge and experience of ABM in general, and of each simulation/modelling system in particular). The information presented within this chapter is aimed at providing the reader with a selection of useful criteria to assess the seven simulation/modelling systems presented, allowing each system to be (approximately) located within this continuum based on the reader’s own knowledge and experience. That is not to say that the selection criteria cannot be utilized for other simulation/modelling systems; once a candidate system (or systems) has been identified, the reader will need to investigate its potential suitability further.

However, it needs to be noted that while such tools exist, integrating GIS data for ABM is still a difficult process (Gilbert 2007), and many considerations are needed, such as what data is needed, how the data should be utilised, how agents should interact with the data, etc. Nevertheless, such systems lower the entry level needed to create geospatial agent-based models, thus allowing a greater number of social scientists to create them. One note of caution is needed, however: there is still a computational challenge when it comes to the creation of geospatial agent-based models with thousands of agents operating and interacting with raster or vector features (see Kennedy et al. 2009 for a discussion), but over time this should be reduced with increased computational power.