
1 Introduction

With the “Energiewende”, the German electricity system is currently undergoing a fundamental transition. The transition is characterized by increased feed-in of renewable energies, the integration of new smart technologies, and the continuing (digital) integration of infrastructures that were almost completely separated in the past. This trend fosters innovations that are not limited to the design of daily life procedures, the operation of networks, and the creation of groundbreaking market structures. The transition also has great potential to enhance urban resilience through better management of service disruptions of so-called Critical Infrastructures (CIs). CIs provide vital services and therefore have a strong influence on the well-being of a population. It is of great public interest to keep CI services running continuously or to recover them quickly after a disruption, in particular during and directly after disasters.

The organizational and technical transformation may enable novel coping strategies, tactical response advantages, and measures to manage emergency situations more effectively. However, it is still difficult to put these potentials into practice. Therefore, we are developing a multi-agent-based model in which CI entities are described as autonomous agents that interact with each other and represent the CI services of a city or county under consideration. The multi-agent-based modelling approach should provide grid providers, CI utilities, and disaster management authorities with decision support on how to use smart meter technologies beneficially for building urban resilience. At the same time, the approach should enable an enhanced understanding of the functional capabilities of interlaced CI services and of the onset of cascading effects. In addition, simulating different kinds of disruptive events should allow a comparison of the benefits of measures and strategies that aim at a robust preservation of CI services.

To the best of our knowledge, the literature still lacks a sustainable design of how such agents interact and solve the problems caused by CI service shortages.

In this chapter, we introduce and discuss the foundation of an agent-based system for building urban resilience through a decentralized and agent-autonomous coordination of CI services in a city during an emergency situation. The chapter consists of two main sections. The first section introduces the basis for specifying the decision-making problem. It comprises the definition of (urban) local CIs and urbanity and an accompanying discussion of how to respond to CI disruptions. The second section addresses the development of the multi-agent-based simulation. This includes a discussion of how the agent-based simulation can be embedded in management procedures and how agents are defined in the context of CI protection and urban resilience. Furthermore, agent-based distributed decision-making is discussed. The decision-making is based on the internal state of a CI entity, which makes it possible to determine the resources necessary to sustain a certain performance or to realize a sensible distribution of limited resources. To provide insights into the functionality and implementation of the simulation, we discuss the modelling of the internal states of CI entities, the determination of necessary CI services, and the sensible distribution of the remaining CI services. The section concludes with a discussion of the advantages of the multi-agent-based approach. The chapter closes with a summary.

2 Urban Resilience and Critical Infrastructure Protection

In this section, we provide the underlying understanding of urban resilience and local CI entities, which is the fundamental basis for developing a multi-agent-based simulation for decision support. In the following, we briefly discuss the definition of CIs and urban resilience and the responses to CI disruptions.

2.1 Definition of (Urban) Local Critical Infrastructures

CI functions such as the supply of electricity, drinking water, and health care are essential basic structures and provide vital services to the population. Disruptions or failures of these services are hazardous and can lead to injuries or even loss of life, property damage, social and economic disruption, or environmental degradation (United Nations International Strategy for Disaster Risk Reduction 2015).

In accordance with the EU Council Directive 2008/114/EC on a common approach for the identification and designation of European Critical Infrastructures (ECI), there is a commonly used but not legally defined list of CI sectors and branches in Germany (Bundesministerium des Innern 2011b). It is reflected in many CI protection policies and mainly represents large-scale/wide-area CIs and supra-regional networks of national relevance (e.g. electricity transmission grids, cargo traffic, medical supply). However, this list is too coarse and not suited for application in urban resilience. The urban resilience perspective requires an understanding of CIs in terms of typical and concrete facilities that can be found in the majority of cities. Only a few large-scale/wide-area CIs, or some of their components, can be found in some cities. They do not represent the typical local CIs of cities such as hospitals, dialysis clinics, or pharmacies.

Usually, local CIs can be found in varying number and size in every city. However, a comprehensive and generally accepted list of concrete local CI facilities is still missing in Germany. Many local disaster management authorities have compiled so-called local CI Cadasters or Land Registers (“KRITIS-Kataster”) of the local CI entities situated in their area of responsibility, containing relevant information (e.g. contact person, contact details, location, size, hazardous properties, emergency backup power capability, storage capacity (e.g. for food, consumables, drinking water, drugs), fuel tank capacity, emergency power infeed capability). These are often the only available documents about the individual characteristics of a city’s local CI entities.

CI Cadasters are regularly kept as “living” documents which require periodic, cooperative, and interactive information exchange between the local disaster management authorities and the local CI entities. In Germany, this is often ensured by the establishment of CI protection partnerships which are promoted and led by the local authorities. The purpose of these kinds of cooperation is inter alia the creation of a common understanding and comprehensive treatment of risks and hence an improvement of urban resilience.

2.2 Urbanity and Local Critical Infrastructures

In the literature, many definitions can be found for the term urban resilience, e.g. (Meerow et al. 2016; Leichenko 2011; Chelleri 2012; Bhamra et al. 2011). An inclusive and flexible definition is provided, for example, by Meerow et al. (2016), who define urban resilience as “the ability of an urban system […] to maintain or rapidly return to desired functions in the face of a disturbance, to adapt, to change, and to quickly transform systems that limit current or future adaptive capacity”. In the context of this chapter, we focus in particular on the ability of a system of local CIs to cope with the consequences of basic service disruptions. This initially addresses short-term resilience, including a continued supply or fast restoration of disrupted services.

The “urban” understanding in this chapter refers to the public administrative responsibilities for crisis management according to the German crisis management system. In Germany, the independent cities and counties [the third administrative divisions according to the Classification of Territorial Units for Statistics (NUTS 3 level)] are operationally responsible for coping with any kind of crisis and disaster situation. For this purpose, they have appointed so-called crisis management teams, which are in charge of coordinating all preparedness and response activities. Although some counties may also have an urban character, we primarily understand independent cities as “urban”. This is also in accordance with the German regional planning policy, which follows the central place theory. In this context, administrative divisions are distinguished by their infrastructural range of basic services as centres of lower order, middle order, and higher order. Historically, independent cities had prominent regional roles, which mostly fostered their organic growth into hubs of local CIs. Hence, independent cities are almost exclusively designated as centres of higher order, which accurately captures our understanding of “urban”.

2.3 Responding to Critical Infrastructure Disruptions

The response to a CI disruption poses multiple challenges for disaster management authorities and CI providers. First, there is the nature of a city, which is often understood as a network of complex systems (Cruz et al. 2013; da Silva et al. 2012; Lhomme et al. 2012; Desouza and Flanery 2013). The functions of an individual local CI are themselves complex and can only be vaguely estimated by disaster management authorities. Furthermore, providing a CI service requires other basic structures to function. These interdependencies between different types of CI services are difficult to evaluate. Due to the interdependencies, CI disruptions can propagate through a CI system and escalate as cascading effects. This corresponds to the toppling domino theory, which assumes that an initiating CI disruption starts a sequence of additional disruptions in other CIs (Luiijf et al. 2009; van Eeten et al. 2011; Kadri et al. 2014; Pescaroli and Alexander 2016). From an empirical perspective, the overwhelming majority of cascading effects are caused by disruptions in the energy and telecom sectors (Luiijf et al. 2009; van Eeten et al. 2011). This is not surprising, as many analyses have shown that most CIs depend on electricity as well as on the information and communication infrastructures (Laugé et al. 2015; Stergiopoulos et al. 2016; Setola et al. 2009; Buldyrev et al. 2010; Kunz et al. 2013; Luiijf et al. 2009; van Eeten et al. 2011; Blake et al. 2013).
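The toppling-domino view of cascading effects can be illustrated with a small graph traversal. The sketch below works under strong simplifying assumptions: the dependency map, the entity names, and the `BUFFERED` set are hypothetical, and an entity is assumed to fail as soon as any one of its inputs fails, unless it has coping capacity (e.g. a backup generator) for the time horizon considered.

```python
from collections import deque

# Hypothetical dependency map: entity -> set of entities it depends on.
# Illustrative only; not taken from the chapter's data.
DEPENDS_ON = {
    "power_grid": set(),
    "telecom": {"power_grid"},
    "water_supply": {"power_grid"},
    "hospital": {"power_grid", "water_supply", "telecom"},
    "dialysis_clinic": {"power_grid", "water_supply"},
    "pharmacy": {"power_grid", "telecom"},
}

# Hypothetical entities whose coping capacity (e.g. a backup generator)
# keeps them running despite failed inputs over the horizon considered.
BUFFERED = {"hospital"}


def cascade(initial_failures, depends_on, buffered=frozenset()):
    """Return all failed entities after domino-style propagation.

    Simplification: one failed input is enough to bring down a dependent
    entity unless that entity is in `buffered`.
    """
    failed = set(initial_failures)
    queue = deque(initial_failures)
    while queue:
        down = queue.popleft()
        for entity, inputs in depends_on.items():
            if entity in failed or entity in buffered:
                continue
            if down in inputs:
                failed.add(entity)
                queue.append(entity)  # its failure may topple further entities
    return failed


print(sorted(cascade({"power_grid"}, DEPENDS_ON, BUFFERED)))
```

With the power grid down, every unbuffered entity fails, while the buffered hospital survives for the horizon considered. A real interdependency model would of course need partial failures and time dynamics.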

To obtain a clearer understanding of the role of interdependencies in network (large-scale) CI systems, dozens of interdependency models and simulations have been developed in recent decades, e.g. (Ouyang 2014; Pederson et al. 2006; Yusta et al. 2011; Theoharidou et al. 2011; Eusgeld et al. 2008; Giannopoulos et al. 2012). In general, these tools provide useful insights into the functionality of large-scale CIs, but many of them do not adequately consider the decision support needs of local disaster management authorities. Usually, the prevailing interdependency modelling methods (such as system dynamics, Bayesian, and input–output modelling) require large amounts of detailed information about the considered networks. In practice, this information, if available at all, is distributed across the CI entities and not held by a central or single party. Collecting it is time-consuming, often accompanied by compliance reservations, and therefore frequently out of proportion to the added value. As mentioned before, the only available documents about the local CI entities of a particular city are in general the CI Cadasters or Land Registers, which are supervised by the disaster management authorities.

Finally, the transition of the energy supply system provides new and powerful possibilities to manage crisis situations. Such crisis situations can be caused by grid instabilities or infrastructural destructions. There are multiple reasons for grid instabilities such as imbalanced load and generation or malicious cyber attacks. Destructions of infrastructures can be caused by sudden component failures, terror attacks, human error, or natural disasters.

The new mechanisms enabled by the Energiewende transition are not yet sufficiently considered by interdependency models. There is still a need for appropriate simulation and analysis tools for the purposes of local disaster management (Pescaroli and Alexander 2016).

Another important challenge in coping with CI disruptions is the question of how to respond appropriately to such events. The well-ordered service reduction at night or on bank holidays clearly shows that it is possible to close some facilities for a limited time without causing additional risks. Likewise, not every magnitude of service disruption automatically leads to a life-threatening situation. One reason is that mechanisms such as the designation of on-call medical units and emergency pharmacies make it possible to maintain a minimum CI service at certain times. Another reason is redundancy, in particular where CI entities are to a certain degree in competition. During a disruption of some CI entities, other still operational CI entities of the same type can step in and, to a certain extent, replace the missing service by providing additional supply. Examples of such local CI entities are general practitioners and pharmacies. For others, such as hospitals and dialysis clinics, this applies only to a limited extent.
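The redundancy argument can be made concrete with a simple redistribution rule: the demand of closed entities is shifted to operational entities of the same type, up to their spare capacity. The sketch below assumes a proportional-to-spare-capacity sharing rule and hypothetical entity names and units; it is one possible illustration, not the chapter's method.

```python
def redistribute(demand, capacity, operational):
    """Reassign demand of closed entities to same-type entities still open.

    demand, capacity: dict entity -> units per day (hypothetical units)
    operational: set of entities of the same CI type still in service
    Returns (load, unmet): load per operational entity after reassignment,
    and the demand that no entity can absorb.
    """
    load = {e: min(demand[e], capacity[e]) for e in operational}
    # Demand of closed entities plus any overflow of open ones
    orphaned = sum(d for e, d in demand.items() if e not in operational)
    orphaned += sum(demand[e] - load[e] for e in operational)
    spare = {e: capacity[e] - load[e] for e in operational}
    total_spare = sum(spare.values())
    absorbed = min(orphaned, total_spare)
    if total_spare > 0:
        for e in operational:
            # Share orphaned demand in proportion to each entity's spare capacity
            load[e] += absorbed * spare[e] / total_spare
    return load, orphaned - absorbed


demand = {"pharmacy_A": 100, "pharmacy_B": 100, "pharmacy_C": 100}
capacity = {"pharmacy_A": 150, "pharmacy_B": 200, "pharmacy_C": 120}
load, unmet = redistribute(demand, capacity, {"pharmacy_A", "pharmacy_B"})
```

Here the closed pharmacy C is fully covered by A and B. If only A remained open, 150 units per day would stay unmet, which is where the limits of redundancy for a given CI type become visible.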

Furthermore, CI entities have coping capabilities to manage the effects of a service disruption for a limited amount of demand and time. Some processes are flexible, and it is possible to reschedule, extend, or delay them while keeping the core business of the CI service running. Some CI entities have also implemented coping capacities such as enhanced storage, larger than necessary tanks, and emergency backup generators that enable them to continue business without external supplies for a limited time. Today, such coping capabilities are only used in an emergency. However, the system transition may also motivate reconsidering their use to reduce demand and thereby keep a city's system of CIs stable.
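The bridging time that such coping capacities provide follows from simple arithmetic: on-site stock divided by the (possibly reduced) consumption rate. The sketch assumes a constant consumption rate and a hypothetical `reduction` parameter standing for load shed by rescheduling or delaying flexible processes.

```python
def autonomy_hours(stock, hourly_use, reduction=0.0):
    """Hours a CI entity can bridge a supply cut from on-site stock.

    stock: on-site stock, e.g. litres of diesel in a backup generator tank
    hourly_use: normal consumption of the same good per hour (assumed constant)
    reduction: hypothetical fraction of consumption shed by rescheduling or
               delaying flexible processes (0.0 means no reduction)
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return stock / (hourly_use * (1.0 - reduction))


print(autonomy_hours(3000, 50))        # 60.0 hours at normal consumption
print(autonomy_hours(3000, 50, 0.25))  # 80.0 hours after shedding 25 %
```

The example numbers (a 3000-litre tank, 50 litres per hour) are invented; they only show how shedding flexible load extends the time an entity stays independent of external supply.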

The reflections about admissible reductions of CI services imply that a system of local CI entities in a city can remain, even if only for a limited duration, in multiple states of stable equilibrium in which no (additional) risks occur. From an engineering point of view, however, there is only one desired equilibrium state: the normal or initial state before the CI disruption occurred, to which the system has to be returned. For a detailed discussion of equilibrium and system resilience, see, e.g., Holling (1996); for a review in the context of urban resilience, see Meerow et al. (2016).

For years, determining the levels of CI supply that reasonably have to be ensured during a disruption of basic services to avoid (further) risks has been of great interest in science, CI protection policy-making, and regional planning in Germany (Bundesministerium des Innern 2009, 2011a, 2012; Fekete 2012; Münzberg et al. 2014). The discussions revolve mainly around the so-called “protection target level” and the “minimum level of supply”. Protection target levels (“Schutzziele”) define desired objectives for the implementation of coping measures. They specify the lowest reasonably acceptable service level of a system that should be allowed or ensured as long as a CI disruption lasts. In this way, protection target levels also mark the tipping point between a safe and an unsafe system state. Hence, these levels can be understood as a maximum risk acceptance criterion that should be determined through political and societal debate. Depending on the type of CI, the protection target levels relate to concrete facilities or to a set of CIs of the same type. Furthermore, the target levels could also vary according to the type of regional centre considered or the spatial and temporal character of the CI disruption. The concept of the “minimum societally accepted level of necessary CI service supply” (“Mindestversorgung” or “Mindestversorgungskonzept”) is related to the discussion about protection target levels. It can be understood as the conceptual summary of all measures to achieve protection target levels in order to ensure the lowest reasonable level and a safe, stable equilibrium state of CI supply in which no (further) risks occur. In this sense, one seeks to obtain a safe shutdown state, following a reactive fail-safe principle, by maintaining an emergency supply.
As the debate about these levels is still ongoing, there are as yet no common, standardized definitions, no viable concepts for determining the levels, and no solutions for the large number of legal reservations.
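Since protection target levels act as a tipping point, the check an agent or a crisis management team would perform can be sketched as a three-way classification. Because target levels may refer to a set of same-type CIs, a capacity-weighted group level is included. The target value, the weighting, and all names here are hypothetical; as noted above, no standardized target levels exist yet.

```python
from enum import Enum


class SupplyState(Enum):
    NORMAL = "normal"
    SAFE_REDUCED = "safe but reduced"
    UNSAFE = "unsafe"


def classify(service_level, target_level, normal_level=1.0):
    """Compare a service level with its (hypothetical) protection target level."""
    if service_level >= normal_level:
        return SupplyState.NORMAL
    if service_level >= target_level:
        return SupplyState.SAFE_REDUCED  # one of several safe equilibria
    return SupplyState.UNSAFE            # below the tipping point


def group_level(levels, capacities):
    """Capacity-weighted service level of a set of same-type facilities."""
    total = sum(capacities)
    return sum(l * c for l, c in zip(levels, capacities)) / total


# Three pharmacies: one fully open, one closed, one at half service
level = group_level([1.0, 0.0, 0.5], [2, 1, 2])
print(level, classify(level, target_level=0.5))
```

A weighted average is only one way to aggregate facilities; a target level could equally be defined per facility or per district.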

Besides the minimum level of supply, it can be assumed that there are other safe equilibrium states in which a sufficient supply of basic services is ensured. Reaching these desired states requires coordination between CI providers, aimed at optimal distribution solutions that satisfy CI service demands. Therefore, the behaviour of the CI system has to be understood and forecast. In addition, a fast and adequate response to system changes in a cooperative and altruistic way has to be ensured. However, the CI protection partnerships established today do not fulfil these requirements. Each CI provider has only an isolated perspective on its own facilities, their facility-specific demands, and, if known at all, their safe state(s). A holistic, systemic view of a sufficient supply of CI services in the city is available only to a limited extent. Which CI facilities to assist, and how to keep business running during a CI disruption, remain difficult questions for CI providers and disaster management authorities.
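To illustrate what a distribution rule between a minimum level and full supply might look like, the sketch below uses a simple two-stage policy: first cover every consumer's minimum (scaling all minima down equally if even that is impossible), then share the remainder in proportion to each consumer's outstanding demand. This policy, the consumer names, and the numbers are assumptions for illustration; it is not an optimal distribution in any formal sense.

```python
def allocate(supply, requests):
    """Share a scarce good among consumers with a two-stage rule.

    supply: amount of the good available (hypothetical units per day)
    requests: dict consumer -> (minimum, full) demand, with minimum <= full
    Returns dict consumer -> allocated amount.
    """
    minimum_total = sum(m for m, _ in requests.values())
    if supply <= minimum_total:
        # Not even all minimum levels can be met: scale every minimum
        # down by the same factor.
        share = supply / minimum_total if minimum_total > 0 else 0.0
        return {c: m * share for c, (m, _) in requests.items()}
    # Stage 1: cover every minimum level in full.
    alloc = {c: m for c, (m, _) in requests.items()}
    # Stage 2: share the rest in proportion to each consumer's remaining gap.
    rest = supply - minimum_total
    gaps = {c: f - m for c, (m, f) in requests.items()}
    gap_total = sum(gaps.values())
    if gap_total == 0:
        return alloc
    fill = min(1.0, rest / gap_total)
    return {c: alloc[c] + gaps[c] * fill for c in alloc}


requests = {"hospital": (40, 80), "dialysis_clinic": (20, 40), "pharmacy": (0, 30)}
print(allocate(30, requests))  # only half of each minimum can be covered
```

In a distributed setting, the same rule would have to emerge from negotiation between agents rather than being computed by a single party.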

3 Development of a Multi-agent-based Simulation

In this section, we discuss the development of a multi-agent-based simulation in accordance with the framework outlined above. First, we discuss how the multi-agent-based simulation could be embedded in current and future management procedures. Second, we introduce the general approach of modelling agents, which consists of the definition of agents, the modelling of the internal state of a CI entity, and a discussion of the advantages of this concept.

3.1 Embedding the Agent-Based Simulation in Management Procedures

At present, agent-based decision-making support has to be considered from two different perspectives, as the agent-based model of a city’s local CI entities and the real-world city infrastructures are still strictly separated. Because of this separation, the model can currently only be applied to simulate scenarios and evaluate the outcome of measures.

Today, a crisis management group would be activated in a city in the event of a severe CI disruption such as a long-term power outage or a cut-off of the water supply. As mentioned before, the purpose of this group is to organize measures according to the nature of the event. This procedure requires gathering all relevant information, especially from the CI entities of the city, to define the specific needs of the structures and to determine and commit appropriate measures, such as distributing the remaining resources to achieve the highest benefit for the city.

This centralized approach to decision-making in an emergency is the natural one, as the decision makers are human beings. However, centralized decision-making inherently has some disadvantages, such as the time delay between gathering the required information from the CIs, processing it, and committing certain measures. Additionally, not all required information might be available in time, or recently changed or added information that may be useful is available locally but cannot be considered globally, as it is not yet part of the process plan of the centralized decision makers.

The strict separation exists for historical reasons, as the required smart technologies were not available for integration at the time. Today, a crisis management group may use the results of the agent-based simulation to improve its decisions. With the introduction of smart grid technologies, it is not unlikely that in the near future the software agents of the discussed model will become part of the CIs as intelligent embedded systems and actively contribute to a distributed decision process in the physical world. In this case, the CIs should communicate and negotiate directly with each other, avoiding the error-prone communication detour via the human crisis managers. Once the agents have concluded appropriate measures, these can be communicated to and used by the human decision makers for the final decision.

The information exchange between the agents requires functional communication. In crisis situations, this can be limited or infeasible (for instance during cyber attacks or when communication lines are destroyed). However, agents may also identify available and missing communication links and integrate such findings into their decision-making process. Clearly, the manipulation of smart meters and agents may have severe impacts. Therefore, mechanisms have to be developed to prevent the manipulation of smart CI components and to orchestrate an appropriate reaction to such adverse states or malicious intrusions.

3.2 Agents in the Context of CI Protection and Urban Resilience

The use of software agents is a generic programming paradigm that can be applied to solve various problems. Many different definitions of the term software agent exist. According to Weiss (2000), a general and widely accepted definition of an agent is the following:

“An agent is a computer system that is situated in some environment and that is capable of autonomous action in this environment in order to achieve its delegated objectives”.

The key term in this definition is the autonomous action of agents, which provides a rather local and distributed modelling approach in contrast to a monolithic and centralized view. The use of agent-based modelling is motivated by the direct mapping of city components onto the software agent paradigm. The local CIs of a city are basically autonomous entities within the city environment. Different types of CIs result in different types of agents. The objectives of these entities are to provide services to the city (e.g. hospitals) or to sustain themselves (e.g. households). With the introduction of smart grid technology, these entities are beginning to sense their environment and to communicate with each other to improve the distribution of resources or to establish the exchange of different services. Therefore, transferring the city structures and their interdependencies into an agent-based model is reasonable and straightforward. Generally speaking, all agents provide one or more services of specified quality at a specific price to the community.
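The statement that every agent offers services of a specified quality at a specific price can be captured in a small data structure. The class and field names below are hypothetical, and quality and price are reduced to single numbers purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceOffer:
    service: str   # e.g. "drinking_water"
    quality: float  # hypothetical normalized quality
    price: float    # hypothetical cost, e.g. in required input units


@dataclass
class CIAgent:
    name: str
    ci_type: str  # different CI types result in different agent types
    offers: list[ServiceOffer] = field(default_factory=list)

    def best_offer(self, service, budget):
        """Highest-quality offer for `service` whose price fits `budget`."""
        feasible = [o for o in self.offers
                    if o.service == service and o.price <= budget]
        return max(feasible, key=lambda o: o.quality, default=None)


water_works = CIAgent("water_works_1", "drinking_water_supply", [
    ServiceOffer("drinking_water", quality=0.5, price=10),
    ServiceOffer("drinking_water", quality=1.0, price=20),
])
print(water_works.best_offer("drinking_water", budget=15))
```

Requires Python 3.9 or later for the `list[ServiceOffer]` annotation. In a full model, "price" would be expressed in terms of the input goods an agent needs, which links this structure to the internal state models discussed later.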

In the following, we use the terms agent and CI synonymously, as we assume that agents will be fully integrated into most of the CIs, or the consumer and producer entities, of a city in the future, even though this is not yet the case. Fortunately, the required methods can already be implemented and evaluated independently on existing computer systems.

3.3 Agent-Based Distributed Decision Making

In contrast to centralized decision-making, an agent-based model of a network of intelligent entities encourages a distributed decision-making process. In principle, and in real-life processes, decisions are made by human beings in crisis management groups, who make the final decision. Although distributed decision-making does not come naturally to humans and is at first difficult to understand, it provides some notable advantages: the agents representing the CIs possess all the necessary information on the state and requirements of the CI they model. No information transfer from the structure to a human being is required. Information transfer between agents is much faster and less error-prone. Agents can consider every bit of currently available information, even if it was recently added or is specific to their instance. Last but not least, the upcoming smart grid technologies will install such intelligent entities and thus implicitly encourage a distributed decision-making process in the physical world. Still, it is important to stress that even if intelligent agents autonomously suggest high-quality measures, the final decision has to be made by human beings.

The very nature of agents lies in their autonomy. In a distributed decision-making process, agents should react autonomously to changes in their environment, i.e. the state of the city, and plan their future actions accordingly. In addition to knowing their individual demands (e.g. a pharmacy agent knows the average amount of refrigerated medicines it has to provide under certain circumstances), the following general requirements must be fulfilled:

  • The needs of an agent are not automatically identical to the needs of the city. Therefore, the current needs of the city must be available to the agents in a form they can understand. This information is basically the request the crisis management group would face. Although the desired decision-making process is distributed, the global needs of a city are defined centrally (beforehand) and have to be accessible. For simplicity and without loss of generality, it is possible to introduce a “city agent” that keeps the necessary information, e.g. “city district X should never be shut down”, in a data store. For convenience, other global information, e.g. “42 people require permanent dialysis treatment”, can also be stored, even if most of this information originates in the agents themselves and could be generated on demand. This kind of knowledge must be available to all agents. Its technical realization, e.g. whether this information is held only centrally or is replicated to some or even all agents, has no influence on the decision-making process.

  • Agents should behave rationally and altruistically. In practice, this can differ because, for instance, some CI providers may be in competition with other providers and may try to misuse an emergency situation for their individual benefit. At the current stage, we assume that CI entities of the same type behave similarly, without considering a wide range of variations. In future, the agents' behaviour settings may be customized individually. This should also include the provider’s preferences, which are possibly characterized by non-altruistic interests. In decentralized decision-making, the individual CI providers retain sovereignty over their information, in particular about their internal state and its development during an outage. A non-altruistic agent (or provider) may misuse this information asymmetry. This corresponds to the principal-agent problem, in which CI providers use their information advantage to reach a better market outcome without considering the overall risk potential for the population.

  • Agents react to their environment and interact with other agents, as they continuously negotiate about services and plan their joint actions. Even though it may be technically challenging, we assume that measurement and communication are not interrupted, even during a general power blackout. Incidentally, a disruption of communication would also hinder a centralized decision approach: if the direct, automated, or digital communication is broken, no up-to-date information is available to a centralized crisis management group either. In this case, centralized decision-making by the crisis management group is always the valid fallback solution, but information then has to be transferred physically, e.g. by motorbike messengers. An additional ad hoc deployable interface between cut-off analogue infrastructures and the digital smart grid would also be worthwhile in such cases.

  • Most importantly, agents require a frame of options in order to negotiate with other agents. Some agents show binary behaviour, i.e. if not all input requirements are fulfilled, they cannot provide any service. For other agents, the quality of the provided service is negotiable, due to multiple internal equilibrium states that can be reached without causing additional risks. In addition, an entity may also increase its performance or quality when it receives higher inputs, or, loosely speaking, an agent provides better services for a higher price. “Quality” in this context may be defined as a combination of attributes such as the duration of a service or the number of people affected. The more options agents have for negotiation, the higher their degree of freedom and the greater the dynamics and flexibility they have to determine better solutions.
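The difference between binary and graded agents can be expressed as a list of (input, quality) options that an agent brings into a negotiation. The sketch assumes options sorted by input, each standing for a safe equilibrium state; the function names and numbers are hypothetical, and the option above quality 1.0 illustrates "better service for a higher price".

```python
from bisect import bisect_right


def binary_options(required_input):
    """Binary agent: no service unless the full input requirement is met."""
    return [(0.0, 0.0), (float(required_input), 1.0)]


def graded_options(states):
    """Graded agent: (input, quality) pairs of its safe equilibrium states."""
    return sorted([(0.0, 0.0)] + [(float(i), float(q)) for i, q in states])


def achievable_quality(options, allocated_input):
    """Best quality an agent can offer for a given input allocation.

    `options` must be sorted by input and start at (0.0, 0.0).
    """
    if allocated_input < 0:
        raise ValueError("allocated_input must be non-negative")
    cut = bisect_right([inp for inp, _ in options], allocated_input)
    return max(q for _, q in options[:cut])


binary = binary_options(100)
graded = graded_options([(40, 0.5), (70, 0.8), (100, 1.0), (130, 1.2)])
print(achievable_quality(binary, 80), achievable_quality(graded, 80))  # 0.0 0.8
```

With 80 input units, the binary agent delivers nothing while the graded agent still reaches its 0.8 state. The graded agent's larger option set is exactly the extra degree of freedom described above.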

More generally, an agent proactively pursues a specific task, which is to keep itself alive and acting in the given environment. As a result, it aims to achieve a desired internal equilibrium between the resources offered and the realized production and quality of service. It does so by tracking its internal state, measuring changes in the environment, communicating and interacting with other agents, and adapting its behaviour more or less intelligently based on its state and environment. To provide an agent with this capability, a model of the internal state is required. Such a model is discussed in the next section.
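The cycle of tracking the internal state, measuring the environment, adapting, and communicating can be sketched as a single `step` method. The adaptation rule (run at full service only while one full step of demand remains in reserve, otherwise drop to a minimum safe level) and the message format are hypothetical simplifications with a single input good.

```python
class AutonomousAgent:
    """Minimal sense-adapt-communicate cycle for one CI agent (illustrative)."""

    def __init__(self, name, stock, demand, min_level):
        self.name = name
        self.stock = stock          # on-site buffer of one input good
        self.demand = demand        # consumption per step at full service
        self.min_level = min_level  # lowest safe service level (hypothetical)
        self.level = 1.0            # current service level, 1.0 = normal
        self.outbox = []            # messages addressed to other agents

    def step(self, supply):
        # Sense: input available this step (buffer plus external supply)
        available = self.stock + supply
        # Adapt: run at full service only if a full step remains in reserve,
        # otherwise fall back to the minimum safe level to stretch the buffer
        if available >= 2 * self.demand:
            wanted = self.demand
        else:
            wanted = self.demand * self.min_level
        used = min(wanted, available)
        # Track internal state
        self.level = used / self.demand
        self.stock = available - used
        # Communicate: request the shortfall once the safe level is lost
        if self.level < self.min_level:
            shortfall = self.demand * self.min_level - used
            self.outbox.append(("request", self.name, shortfall))
        return self.level


agent = AutonomousAgent("dialysis_clinic", stock=30, demand=10, min_level=0.5)
levels = [agent.step(supply=0) for _ in range(5)]
print(levels)  # [1.0, 1.0, 0.5, 0.5, 0.0]
```

Without adaptation, the buffer of 30 units would last three steps at full service; with it, the agent stays at or above its safe level for four steps and only then asks other agents for help.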

3.4 Modelling the Internal State of Critical Infrastructure Entities

The goal of modelling the internal state of CI entities is to enable agents to simulate and track their performance under an insufficient supply of input services. A CI entity relies on a sufficient supply. In case of limited resources, the entity is usually not able to fulfil its contract, i.e. to provide the negotiated quality of service. This may lead to reduced or missing availability of CI services, which can cause further severe impacts on the population.

The model of the internal state of a CI entity should enable an agent to autonomously simulate the consequences of a reduced or missing supply in order to evaluate its resulting overall performance. Usually, an entity can keep up its business only for a limited time when supply is missing. The model should make it possible to calculate the internal equilibrium state in which a CI entity can provide a specific amount of service under restricted conditions. The results enable agents to draw conclusions about the development of their performance in the near future, which can be communicated to other CI entities (i.e. other agents) and used for distributed decision-making.
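One simple way to compute such a restricted equilibrium is to assume fixed input proportions (a Leontief-type assumption that we introduce here for illustration): each input good is needed in proportion to output, so the scarcest good limits the service level. The goods and quantities below are hypothetical.

```python
def equilibrium_level(required, available):
    """Sustainable service fraction under restricted inputs.

    Assumes (hypothetically) fixed input proportions, so the scarcest
    input good limits the achievable service level.
    required: dict good -> input per step at full service (must be > 0)
    available: dict good -> input actually available per step
    Returns (level in [0, 1], bottleneck good).
    """
    ratios = {good: available.get(good, 0.0) / need
              for good, need in required.items()}
    bottleneck = min(ratios, key=ratios.get)
    return min(1.0, ratios[bottleneck]), bottleneck


need = {"electricity": 200, "drinking_water": 50}
print(equilibrium_level(need, {"electricity": 150, "drinking_water": 50}))
# (0.75, 'electricity')
```

Reporting the bottleneck alongside the level tells other agents which input good would raise the entity's performance, which is the kind of information that can feed into distributed decision-making. Real entities often have substitutable or buffered inputs, so this is only a lower-bound style sketch.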

The consumption and production of a CI vary depending on the CI type, season, weekday, and time of day. It can be assumed that CI entities of the same type behave similarly. This permits the simplification of using sub-models for specific CI types: the same underlying sub-model is used for all considered CIs of the same type. Although CI entities of the same type behave similarly, their consumption and production vary with their size, service capacity, or utilization potential. In addition, the core processes of CI entities have differently implemented coping capacities, which allow a shorter or longer continuation of business when supply is lacking. The variations in size and coping capacity are considered by developing scalable sub-models for each CI type.
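A scalable sub-model can be read as a shared type-level profile multiplied by an entity-specific size. The sketch assumes a hypothetical hourly profile for hospitals, a per-bed peak demand, and a weekend factor; all numbers are invented, and coping capacity scaling is left out for brevity.

```python
# Hypothetical normalized hourly electricity profile shared by all hospitals
# (fraction of peak demand per hour). Values are invented for illustration.
HOSPITAL_PROFILE = [0.6] * 6 + [0.9] * 12 + [0.7] * 6  # hours 0..23


def scaled_consumption(profile, peak_per_unit, size, hour,
                       weekend=False, weekend_factor=0.85):
    """Consumption of one entity in a given hour.

    profile: type-level shape shared by all entities of one CI type
    peak_per_unit: hypothetical peak demand per unit of size (e.g. kW per bed)
    size: entity-specific scale, e.g. number of beds from the CI Cadaster
    weekend_factor: hypothetical weekday/weekend adjustment
    """
    day_factor = weekend_factor if weekend else 1.0
    return profile[hour % 24] * peak_per_unit * size * day_factor


small = scaled_consumption(HOSPITAL_PROFILE, 2.0, size=300, hour=10)
large = scaled_consumption(HOSPITAL_PROFILE, 2.0, size=600, hour=10)
```

Both hospitals share one sub-model and differ only in `size`, which is the kind of parameter the CI Cadasters already contain.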

The use of scalable sub-models ensures a flexible integration of CI entities into a multi-agent-based simulation, taking into account the concrete CI system environment of the considered city. This allows fast adjustment to the city’s CI entities, their types, and sizes. The required information about the CI entities is usually contained in the CI Cadasters. This reduces the effort for data collection and simplifies the implementation process.

The first maturity stage of the development of a multi-agent-based simulation includes the supply of electricity, drinking water, and medical products for households and for selected CI entities that provide health services in a city. The CI entities considered in this stage are drinking water supply, hospitals, dialysis clinics, general practitioners (GPs), and pharmacies. This is just a small subset of the local CIs that could be considered; nevertheless, these CI types provide a first test bed for the development and demonstration of the agent-based simulation. The physical interdependencies between the considered CI types are visualized in Fig. 16.1.

Fig. 16.1

Visualization of physical interdependencies between selected CIs

Figure 16.1 displays the interconnections of the CIs. It is also used to determine the input and output goods of each CI entity. Input goods are the services on which a CI entity relies (consumption perspective). Output goods describe the service offered by a CI entity (production perspective). Input and output goods play a key role in the sub-models, whose development is described in detail in the following.
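As a minimal sketch of how input and output goods could be represented for the agents, the following Python snippet stores them as a small dependency graph and propagates a lost good breadth-first to find the directly and indirectly affected CI types. The edges are illustrative assumptions loosely based on the CI types named above, not a transcription of Fig. 16.1, and the propagation deliberately ignores coping capacities (worst case).

```python
from collections import deque

# Illustrative assumption, not a transcription of Fig. 16.1: input goods per CI type
INPUTS = {
    "drinking_water_supply": {"electricity"},
    "pharmacy": {"electricity"},
    "gp": {"electricity", "medical_products"},
    "hospital": {"electricity", "drinking_water", "medical_products"},
    "dialysis_clinic": {"electricity", "drinking_water", "medical_products"},
    "household": {"electricity", "drinking_water", "medical_products"},
}
# Output good (service) offered by each CI type; households only consume
OUTPUTS = {
    "drinking_water_supply": "drinking_water",
    "pharmacy": "medical_products",
    "gp": "primary_care",
    "hospital": "hospital_care",
    "dialysis_clinic": "dialysis",
}


def affected_by(lost_good: str) -> set[str]:
    """Breadth-first propagation of a lost good through the input/output graph.

    Worst-case assumption: an entity missing any input good stops producing
    its output good; coping capacities are deliberately ignored here.
    """
    affected: set[str] = set()
    lost = {lost_good}
    queue = deque([lost_good])
    while queue:
        good = queue.popleft()
        for ci_type, inputs in INPUTS.items():
            if good in inputs and ci_type not in affected:
                affected.add(ci_type)
                output = OUTPUTS.get(ci_type)
                if output is not None and output not in lost:
                    lost.add(output)      # the output is now missing as well
                    queue.append(output)  # and may cascade further
    return affected


print(sorted(affected_by("drinking_water")))  # ['dialysis_clinic', 'hospital', 'household']
```

A loss of electricity reaches every CI type in this toy graph, because the drinking water supply and the pharmacies depend on it and pass the shortage on.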

A sub-model is developed for each CI type. Each sub-model can be understood as a micro-simulation of a CI entity that consists of all key process elements to conduct the core service of the entity. A sub-model usually consists of multiple key process elements which rely on the input goods. In some cases, the key process elements themselves are also interconnected and provide input goods for other key process elements.

To ensure a structured and consistent procedure, a comprehensive approach is defined for the development of CI type specific sub-models. It is based on the following steps:

  • Step one: identifying the internal core services of an entity and defining the corresponding key process elements,

  • Step two: identifying the input goods for each key process element,

  • Step three: mathematically determining the general functionality of each key process element by at least a day-time specific consumption function, a criticality function, and a continuity function for each of its input goods,

  • Step four: aggregating a key process element’s performance,

  • Step five: aggregating the overall CI entity performance,

  • Step six: checking plausibility.

The approach is generic and has to be adapted to the individual structure and functionality of a CI type. Steps one to three are conducted alongside literature reviews, extensive data collection, and interviews with CI providers, operators, and experts. As a first result, a visualization is obtained for each CI type specific sub-model, displaying the considered key process elements and their interconnections. An example of such a visualization for the CI type of dialysis clinics is displayed in Fig. 16.2.

Fig. 16.2

Visualization of considered key process elements of a dialysis clinic (such as Reverse Osmosis System, Dialysis Machine, Acid Concentrate Production, Office) and their interconnections in the corresponding sub-model

Figure 16.2 shows the key process elements identified in steps one and two and their interdependencies for the sub-model of a dialysis clinic. Step three determines mathematical functions that describe how the key process elements work. In this step, at least a day-time specific consumption function, a criticality function, and a continuity function are determined for each input good of a key process element. The day-time specific consumption function represents the variations depending on season, time of day, and type of day. It takes into account different capacities or potentials of a CI entity, thereby enabling scalability. Although the function represents the consumption of input goods under normal conditions, it can also be used to calculate the consumption for an emergency-triggered increase or decrease in demand. This also includes the capability to reschedule, extend, or delay some processes where possible. The criticality function describes the consequences for the performance of a key process element if an input good is missing; it determines the behaviour of the key process element during a lack of input goods. Furthermore, some key process elements are able to keep up a continuous business even when supply is missing. This is due to coping capacities and is reflected in the continuity function, which is scalable according to the amount of coping capacity implemented in the key process elements. In step four, the functions are aggregated to calculate the performance of the key process elements. In step five, this allows a simulation of the consequences for the whole CI entity and a forecast of its internal state during an event of missing CI services.
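The interplay of the three functions can be sketched as follows. This is a hypothetical illustration, not the authors' sub-model: the load profile, the threshold-linear criticality function, the buffer-based continuity function, and the min-aggregation in steps four and five are all assumptions chosen for readability, and the numbers are invented.

```python
from dataclasses import dataclass, field


def consumption(hour: int, capacity: float) -> float:
    """Day-time specific demand for one input good (hypothetical profile):
    full load 07:00-19:00, 20 % otherwise, scaled by the entity's capacity."""
    return capacity * (1.0 if 7 <= hour < 19 else 0.2)


def criticality(supply_ratio: float, threshold: float = 0.5) -> float:
    """Performance given the supplied fraction of demand (assumed shape):
    the element stops below `threshold` and degrades linearly above it."""
    return 0.0 if supply_ratio < threshold else min(1.0, supply_ratio)


def continuity(hours_in_shortage: float, buffer_hours: float) -> float:
    """Coping capacity (e.g. a UPS or a tank) bridges a shortage for `buffer_hours`."""
    return 1.0 if hours_in_shortage < buffer_hours else 0.0


@dataclass
class KeyProcessElement:
    name: str
    capacity: dict[str, float]  # nominal hourly demand per input good
    buffer_hours: dict[str, float] = field(default_factory=dict)

    def performance(self, hour: int, supplied: dict[str, float],
                    shortage_hours: dict[str, float]) -> float:
        """Step four (assumption): the worst input good bounds the element."""
        perf = 1.0
        for good, cap in self.capacity.items():
            demand = consumption(hour, cap)
            ratio = supplied.get(good, 0.0) / demand if demand > 0 else 1.0
            crit = criticality(ratio)
            cont = continuity(shortage_hours.get(good, 0.0),
                              self.buffer_hours.get(good, 0.0))
            perf = min(perf, max(crit, cont))  # coping capacity compensates while it lasts
        return perf


def entity_performance(elements, hour, supplied, shortage_hours) -> float:
    """Step five (assumption): the elements form a chain, so the weakest
    element bounds the performance of the whole CI entity."""
    return min(e.performance(hour, supplied.get(e.name, {}),
                             shortage_hours.get(e.name, {}))
               for e in elements)


# Hypothetical dialysis clinic with two of the elements shown in Fig. 16.2
ro = KeyProcessElement("reverse_osmosis", {"electricity": 2.0, "drinking_water": 20.0})
dm = KeyProcessElement("dialysis_machine", {"electricity": 10.0}, {"electricity": 2.0})
full = {"reverse_osmosis": {"electricity": 2.0, "drinking_water": 20.0},
        "dialysis_machine": {"electricity": 10.0}}
print(entity_performance([ro, dm], 10, full, {}))  # 1.0 under normal conditions
```

In this toy setting, the dialysis machine survives a power cut of up to two hours thanks to its buffer, while the reverse osmosis element, which has no buffer, immediately brings the entity performance to zero; this is the kind of weak spot the sub-models are meant to reveal.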

For a concrete dialysis clinic, the sub-model can be applied according to the size of the clinic and the coping capacity of its key process elements.

Models established this way can be tested with a fictional emergency condition that simulates changes of the supply for a couple of hours (see Fig. 16.3). The simulation results of this example show the development of the quality of service during a day and hence the internal state of the whole CI entity for the considered outage scenario. The varying supply of the input goods electricity and drinking water has a severe impact on the performance of the clinic. The adjusted sub-model makes it possible to forecast the internal state of the clinic, taking into account the reduced amount of input goods. The comparison between normal and emergency conditions demonstrates that the dialysis clinic can ensure continuous business only for a limited number of patients. A permanent supply is no longer possible under the emergency conditions.
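To make the comparison between normal and emergency conditions concrete, the following sketch counts the patients a dialysis clinic could treat per hour when electricity and drinking water are reduced for four hours. All capacities and per-patient consumptions are invented placeholders, and the simple "scarcest input" rule stands in for the full sub-model.

```python
PLACES = 10              # treatment places of the clinic (hypothetical)
KWH_PER_PATIENT = 3      # electricity per patient and hour (hypothetical)
LITRES_PER_PATIENT = 50  # drinking water per patient and hour (hypothetical)


def patients_served(kwh: float, litres: float) -> int:
    """Patients treatable in one hour: bounded by places and the scarcest input."""
    return min(PLACES, int(kwh // KWH_PER_PATIENT), int(litres // LITRES_PER_PATIENT))


def simulate(hours, outage_hours, normal=(40, 600), reduced=(15, 200)):
    """Hourly patients served; supply drops to `reduced` during `outage_hours`."""
    served = {}
    for hour in hours:
        kwh, litres = reduced if hour in outage_hours else normal
        served[hour] = patients_served(kwh, litres)
    return served


normal_day = simulate(range(6, 20), outage_hours=set())
emergency_day = simulate(range(6, 20), outage_hours=set(range(10, 14)))
print(sum(normal_day.values()), sum(emergency_day.values()))  # 140 116
```

With these placeholder numbers, water rather than electricity becomes the binding input during the outage, limiting treatment to 4 patients per hour.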

Fig. 16.3

Results of fictitious simulation runs calculating the performance of a dialysis clinic under normal and emergency conditions, in which the supply of electricity and drinking water is reduced

This finding provides detailed insights into the still available CI services of the analysed entity. The consequences of an outage scenario can then be analysed by taking into account the performance results of all CI entities in the considered area. The results of different scenarios support decisions on how to distribute available resources (such as electricity, emergency power units, or fuels). Ideally, a smart distribution achieves the highest possible beneficial equilibrium in a city and, at best, prevents outages of CI services.

In practice, CI providers and disaster management authorities have to negotiate the distribution. In the future, this negotiation may be realized automatically by means of smart metres and their communication links in a smart (grid) environment. However, it is not yet known how the negotiation process has to be implemented to reach the most beneficial outcome. It can be assumed that different negotiation approaches lead to different results; hence, the selection of a specific method significantly influences the resilience of a region. To shed light on this issue, we aim to simulate and compare different agent negotiation methods in our upcoming research.

In the following sections, we discuss the aspects to be considered when determining the resources required to sustain a certain performance and when distributing limited resources sensibly to achieve the highest possible beneficial equilibrium in a city.

3.5 Determining Resources to Sustain a Certain Performance

During a power blackout or a training exercise, it is of interest which resources the CIs of a city require to provide a specific given performance, especially a minimum supply of services. In general, this question is not triggered by changes in the environment but is explicitly raised by the crisis management group; thus, the agents have to be explicitly instructed to solve this request.

As presented in the previous section, all agents define state transfer functions to predict their internal state depending on their environment, mainly the available supply of power and water. The internal state is then reduced to a single performance value. However, determining the input parameters required to achieve a specified performance value requires the inverse of the state functions. In general, these functions cannot easily be inverted, and sometimes inversion may not be possible at all. Nevertheless, a straightforward method to estimate the parameters for a certain performance is to vary the input parameters iteratively until the estimated performance is sufficiently close to the desired one. This brute-force approach can become computationally intensive depending on the parameter space and the state transfer functions, yet it is simple and yields the desired results for a single, local agent. The overall performance of a group of agents can be defined as the sum or, more generally, an aggregation of the performances of the individual agents.
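A minimal sketch of this brute-force inversion, assuming a single input (electricity) and a hypothetical forward state function `clinic_performance`: `required_input` scans a grid of input values and returns the first one whose predicted performance reaches the target. "Sufficiently close" is simplified here to "at least the target".

```python
from typing import Callable, Optional


def clinic_performance(kwh: float) -> float:
    """Hypothetical forward state function: no service below a 5 kWh base load,
    then linear up to full performance at 35 kWh."""
    return min(1.0, max(0.0, (kwh - 5.0) / 30.0))


def required_input(perf: Callable[[float], float], target: float,
                   upper: float, step: float) -> Optional[float]:
    """Brute-force inversion: scan 0, step, 2*step, ... up to `upper` and return
    the first input whose predicted performance reaches `target`."""
    for i in range(int(round(upper / step)) + 1):
        x = i * step
        if perf(x) >= target:
            return x
    return None  # target not reachable within the searched range


print(required_input(clinic_performance, target=0.5, upper=50.0, step=0.5))  # 20.0
```

For k inputs with n grid points each, the same scan needs up to n**k evaluations of the state function, which is why the approach becomes expensive as the parameter space grows. If the state function is known to be monotone, a bisection search would reduce the one-dimensional scan to logarithmically many evaluations.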

To determine the requirements for all agents of a city, each agent has to be queried with a specific local goal. While this is possible, it requires human coordination to split the global request into local ones. Moreover, on a global scale, better solutions requiring fewer resources to achieve the same performance may exist, especially if different qualities of service are available and the agents are able to provide them according to the input they are granted. Voluntarily providing a service of lower quality for fewer resources presupposes altruistic behaviour, in which agents may relinquish resources for the good of society. To benefit from this behaviour, it is therefore preferable to define a city-wide performance, like a performance value for dialysis clinics (or, in absolute terms, a number of patients to be treated), and let the agents determine the required resources for this global request.

Striving for the globally optimal solution of this request is not trivial because of the large number of combinatorial possibilities. How to improve the approach while keeping the computation time within an acceptable limit is the subject of ongoing research.
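The combinatorial difficulty of a city-wide request can be illustrated with a tiny exhaustive search. The sketch assumes that each dialysis clinic offers a few discrete service levels (patients treated for a given amount of electricity, including the altruistic option of relinquishing all resources) and looks for the cheapest combination that meets a city-wide target; the clinic names and numbers are hypothetical.

```python
import itertools

# Hypothetical offers: (patients treated, kWh required) per dialysis clinic;
# (0, 0) means the clinic relinquishes its resources altogether
OFFERS = {
    "clinic_A": [(0, 0), (4, 10), (10, 20)],
    "clinic_B": [(0, 0), (5, 15), (8, 18)],
    "clinic_C": [(0, 0), (6, 12)],
}


def cheapest_plan(offers, target_patients):
    """Enumerate every combination of offers and return the one that meets the
    city-wide target with the least electricity (None if the target is unreachable)."""
    names = list(offers)
    best_plan, best_kwh = None, float("inf")
    for choice in itertools.product(*(offers[n] for n in names)):
        patients = sum(p for p, _ in choice)
        kwh = sum(k for _, k in choice)
        if patients >= target_patients and kwh < best_kwh:
            best_plan, best_kwh = dict(zip(names, choice)), kwh
    return best_plan, best_kwh


plan, kwh = cheapest_plan(OFFERS, target_patients=14)
print(plan, kwh)  # clinic_A relinquishes its share; 30 kWh in total
```

The number of combinations is the product of the option counts per agent (here 3 * 3 * 2 = 18), so it grows exponentially with the number of agents, which is why exhaustive search quickly becomes infeasible at city scale.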

3.6 Sensible Distribution of Limited Resources

In the case of a power supply disruption, there may still be some resources left that have to be distributed among the agents so as to maximize the benefit for the city. Optimisation problems in which packages (service for resource) are packed to maximize the outcome are generally known as knapsack problems, as described, e.g., in Zäpfel et al. (2010). Sophisticated algorithms such as that of Polyakovsky and M’Hallah (2007) are known to address these combinatorial problems in multi-behaviour agent environments. In the following, we address the potential framework of input parameters that can be considered for a sensible distribution or negotiation. Several stages of complexity can be distinguished:

  • The least complex request assumes agents of the same type with no flexibility. They require certain resources as input and provide their service in return. The quality of service of such agents, i.e. their performance, is either 1 (available) or 0 (not available). The distribution request can be solved straightforwardly by knapsack algorithms.

  • A more complex request assumes agents of the same type with flexibility in quality of service. They achieve different performances depending on the provided resources. If the performance of the agents depends discretely on a small number of input options, the distribution request may still be addressed by knapsack algorithms by varying the discrete input options. A non-discrete dependency of performance on resources can be addressed by introducing discrete classes. However, depending on the number of input options, the computing complexity may increase quickly.

  • The most complex request assumes different types of agents with flexibility in quality of service. The different types of agents follow global conditional restrictions, such as the rule that hospitals are more important, i.e. more critical, than households and therefore have to be supplied first [for more insights into the criticality of CI types, see (Münzberg et al. 2017)]. Basically, this request adds another degree of freedom to the problem, further increasing its complexity. Determining the optimal distribution for a city in a reasonable amount of time is currently impossible, and one has to settle for a “satisfying” distribution in an acceptable amount of time, e.g. by limiting the computation time and using the best solution found so far.
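The first two stages can be written as a multiple-choice knapsack problem: each agent chooses exactly one of its discrete service levels, and the total resource use must stay within the budget. The following dynamic-programming sketch assumes integer resource units (e.g. whole kWh) and encodes the third stage only approximately, through a criticality weight per agent type; a strict rule such as "hospitals first" would need a lexicographic objective instead. Agent names, weights, and numbers are hypothetical.

```python
def distribute(agents, budget):
    """Multiple-choice knapsack solved by dynamic programming over integer units.

    agents: list of (name, weight, options); options are (performance, units)
    pairs and must contain (0.0, 0) for "not supplied". `weight` encodes the
    criticality of the agent type. Returns (best weighted performance, choices).
    """
    # best[b]: best value and choices for the agents so far using at most b units
    # (sharing the empty dict is safe because dicts are never mutated, only copied)
    best = [(0.0, {})] * (budget + 1)
    for name, weight, options in agents:
        new = [(-1.0, {})] * (budget + 1)
        for b in range(budget + 1):
            for perf, units in options:
                if units <= b:
                    value = best[b - units][0] + weight * perf
                    if value > new[b][0]:
                        new[b] = (value, {**best[b - units][1], name: (perf, units)})
        best = new
    return best[budget]


AGENTS = [
    # on/off agent (first stage): the district is either supplied or not
    ("household_district", 1.0, [(0.0, 0), (1.0, 12)]),
    # flexible quality of service in discrete classes (second stage)
    ("dialysis", 5.0, [(0.0, 0), (0.5, 8), (1.0, 15)]),
    # a more critical agent type, approximated by a larger weight (third stage)
    ("hospital", 10.0, [(0.0, 0), (0.4, 10), (1.0, 20)]),
]
value, plan = distribute(AGENTS, budget=30)
print(value, plan)
```

With a budget of 30 units, the hospital is fully supplied, the dialysis clinic runs at half capacity, and the household district is left unsupplied. The first stage is simply the special case in which every agent has the options `(0.0, 0)` and `(1.0, demand)`, i.e. a classic 0/1 knapsack.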

In case of a CI service disruption, the agents will sense it through the sensors in their environment and autonomously begin to investigate coping methods and the distribution of the remaining resources. In the following, we present some considerations on how agents could react in that case.

  • Initially, the agents should organize themselves in groups with a distinguished leader according to their types as this will help to coordinate and to communicate with external partners like the human beings from the crisis management group.

  • The group leaders then will investigate the available resources. Such resources can be provided externally by, e.g. emergency power generators or from within the city by, e.g. solar panels of local households.

  • Next, the leaders agree on the tasks to perform, in descending order of importance. The highest-priority, mandatory tasks concern directly endangered human lives. Secondary tasks may concern highly valuable facilities or institutions, tertiary tasks the prevention of minor injuries, and so on. This requires global knowledge (which may already be implemented in the agents’ behaviour) and has to be fixed beforehand by the accident management group. The agents will try to address the tasks in order of importance.

  • For each task, the agents know whether they have to contribute to fulfilling it. For example, a hospital may have patients in need of intensive care. As a consequence, the hospital will join the negotiations on resources in the first round, as human lives have the highest priority.

  • The distribution of resources itself can be determined in many different ways. Besides straightforward approaches like the knapsack algorithm, agent systems offer other methods of coordination and interaction, such as negotiation and bargaining, auctions, goal programming, multi-objective optimization, argumentation, game theory, and many more (Weiss 2000). As an example of negotiation, the hospitals of the city may offer their service in return for resources. To do so, they need a protocol, i.e. a common understanding of the expressions and statements used in the negotiation process. Expressed in human language, a hospital may offer in terms like “I can provide intensive care for 4 people for 10 kWh, or for 10 people for 20 kWh”. Another option could address the targeted achievement of a desired set of objectives, as given by protection target levels or equilibrium states such as the minimum level of supply (“Mindestversorgung”) or other safe states in which a sufficient supply of basic services is ensured. In all cases, it is up to the leader to organize the negotiations. Many options and parameters can be taken into account, such as performance indicators, vulnerability values, stakeholder preferences, time dependencies, absolute and relative attribute values, location, and available coping capabilities, to name only a few.

  • As the agent behaviour and the negotiation process are organized in time steps, one has to be aware of looping logic problems. Contradictions may occur between different time steps because the negotiation in one time step is based on a forecast of the internal state, and this state may be changed by negotiations made in earlier time steps. The problem can be addressed by using longer intervals for the negotiations, such as concrete block bids that cover the supply of a CI service in a specific amount for a specific time period of, e.g., multiple hours. In this way, the agents may negotiate periodically at different points in time during the whole outage.

  • Finally, the leaders distribute the fixed agreements to their group members and to external partners like the crisis management group. If there is no contact with external partners, the agents could either perform the measures autonomously or do nothing, depending on the prior configuration set by the crisis management group.
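The steps above can be sketched roughly as follows: agents group themselves by CI type with a leader, submit block bids in the style of the hospital offer quoted above, and the leader serves the tasks in descending order of importance. Everything here is a hypothetical simplification: the leader election rule (smallest id), the priority encoding, and the greedy first-come, first-served allocation within a round are assumptions, and a real negotiation protocol could use any of the mechanisms listed above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlockBid:
    """Offer of one agent for a block of `hours`, e.g. a hospital stating
    'intensive care for 4 people for 10 kWh, or for 10 people for 20 kWh'."""
    agent: str
    priority: int                    # 1 = endangered lives, 2 = valuable facilities, ...
    levels: list[tuple[int, float]]  # (service units, kWh for the whole block)
    hours: int


def form_groups(agents):
    """Group (agent_id, ci_type) pairs by type; hypothetical leader rule: smallest id."""
    groups = defaultdict(list)
    for agent_id, ci_type in agents:
        groups[ci_type].append(agent_id)
    return {t: (min(m), sorted(m)) for t, m in groups.items()}


def allocate(bids, available_kwh):
    """Leader-side greedy sketch: rounds in descending importance (ascending priority
    number); within a round, bids are served in arrival order with the largest
    affordable service level. Neither optimal nor guaranteed to be fair."""
    remaining = available_kwh
    agreements = {}
    for priority in sorted({b.priority for b in bids}):
        for bid in (b for b in bids if b.priority == priority):
            affordable = [lv for lv in bid.levels if lv[1] <= remaining]
            if affordable:
                # most service units; on ties, the cheaper level
                units, kwh = max(affordable, key=lambda lv: (lv[0], -lv[1]))
                agreements[bid.agent] = (units, kwh, bid.hours)
                remaining -= kwh
    return agreements, remaining


groups = form_groups([("h2", "hospital"), ("h1", "hospital"), ("d1", "dialysis_clinic")])
bids = [
    BlockBid("pharmacy_1", 3, [(1, 2.0)], 4),
    BlockBid("hospital_1", 1, [(4, 10.0), (10, 20.0)], 4),
    BlockBid("hospital_2", 1, [(4, 10.0), (10, 20.0)], 4),
    BlockBid("dialysis_1", 2, [(3, 6.0), (8, 15.0)], 4),
]
agreements, remaining = allocate(bids, available_kwh=36.0)
print(agreements, remaining)
```

In the example, both hospitals are served in the first round, the dialysis clinic receives its lower level in the second round, and the pharmacy gets nothing because the 36 kWh are exhausted. The greedy rule favours the first hospital, which illustrates why the choice of negotiation method matters.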

The presented process is highly complex and a field of ongoing research as we are constantly refining and improving the suggested procedures.

3.7 Advantages of Multi-agent-based Approach

In comparison to other decision support approaches, like optimization algorithms carried out by a single instance, the distributed agent-based approach provides inherent advantages, most notably scalability, extensibility, a focus on local resilience, up-to-dateness, and transparency.

Scalability: Agent-based modelling provides great flexibility in scaling, as agents are in general service providers. For example, to model the power consumption of households, a coarse agent model aggregating each city district may initially be sufficient. Such an agent can simulate the consumption depending on the number of inhabitants in a district and a characteristic function. However, at some point, a more accurate simulation of household consumption per street may be required. In this case, the district agent will no longer use its characteristic function but will delegate the request to sub-service providers, e.g. newly introduced street agents in its district. Thus, a more accurate simulation of household consumption is possible, yet the general structure of the overall model does not have to be changed.
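The delegation described above resembles the composite design pattern. In this minimal sketch (class names, the load profile, and all numbers are hypothetical), a district agent uses its characteristic function until street agents are attached, after which it delegates; callers keep asking the district agent in exactly the same way.

```python
from abc import ABC, abstractmethod


def profile(hour: int) -> float:
    """Hypothetical characteristic load profile: full load by day, half at night."""
    return 1.0 if 6 <= hour < 22 else 0.5


class ConsumptionAgent(ABC):
    """Common interface: every agent is asked for its consumption the same way."""

    @abstractmethod
    def consumption_wh(self, hour: int) -> float: ...


class StreetAgent(ConsumptionAgent):
    def __init__(self, name: str, households: int, wh_per_household: float):
        self.name, self.households, self.wh_per_household = name, households, wh_per_household

    def consumption_wh(self, hour: int) -> float:
        return self.households * self.wh_per_household * profile(hour)


class DistrictAgent(ConsumptionAgent):
    def __init__(self, name: str, inhabitants: int, wh_per_inhabitant: float):
        self.name, self.inhabitants, self.wh_per_inhabitant = name, inhabitants, wh_per_inhabitant
        self.sub_agents: list[ConsumptionAgent] = []

    def add_sub_agent(self, agent: ConsumptionAgent) -> None:
        self.sub_agents.append(agent)

    def consumption_wh(self, hour: int) -> float:
        if self.sub_agents:  # finer model available: delegate to sub-service providers
            return sum(a.consumption_wh(hour) for a in self.sub_agents)
        # coarse model: characteristic function of the number of inhabitants
        return self.inhabitants * self.wh_per_inhabitant * profile(hour)


district = DistrictAgent("district_1", inhabitants=1000, wh_per_inhabitant=50)
coarse = district.consumption_wh(12)  # 50000.0 from the characteristic function
district.add_sub_agent(StreetAgent("street_a", households=200, wh_per_household=120))
district.add_sub_agent(StreetAgent("street_b", households=300, wh_per_household=120))
fine = district.consumption_wh(12)    # 60000.0, now summed over the street agents
```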

Extensibility: At some point in time, new types of CIs or, more generally, additional types of consumers and producers of resources are added to the city model. Centralized decision-making would require a reconfiguration of the whole decision process, a revision of the information flow between CIs and decision makers, and further technical actions like re-compiling and deployment. Distributed decision-making shifts the “knowledge of how to decide” into the software agents. This of course requires rather intelligent agents that are able to organize themselves, to acquire the necessary data, and to react appropriately to the changing states of a city. As a benefit of this approach, new light-weight agent types, like solar panel agents as power producers, can simply be added to the network, as they contribute to the decision process in a generic way by their very design. Nevertheless, the introduction of new heavy-weight CIs that interact with and affect all existing agents in a specific way, like an infrastructure comparable to the water supply, will require a modification of all concerned agents.

Local resilience: With increasing smartness of local components (smart grids), more and more small, local providers of resources become available, like households feeding solar panel power into the power grid. While in default operation mode these small providers mainly help by contributing green, decentrally generated power to the network, in case of a serious network disruption they can significantly decrease the vulnerability of their neighbourhood. Assuming the producers and consumers of this neighbourhood are smart agents, they are capable of detecting the cut-off from their main supply. They will start to distribute the locally available resources in isolated (island) operation without the need for centralized management, thus reacting quickly to the situation and forming resilient islands.
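A minimal sketch of this island behaviour, assuming the agents of a neighbourhood already know their demands and the locally generated power: while the main supply is available, all demand is met; after a detected cut-off, the local generation is shared in proportion to demand. The proportional rule is only one hypothetical choice; priority-based rules as discussed in Sect. 3.6 would be equally possible. Since the rule needs only the total demand and total generation of the island, each agent could compute its own share locally once these two figures are broadcast, without a central coordinator.

```python
def island_dispatch(main_supply_ok: bool, local_generation_kw: float,
                    demands_kw: dict[str, float]) -> dict[str, float]:
    """Supply per consumer of a neighbourhood. With the main grid available, every
    demand is met; after a detected cut-off, the neighbourhood runs as an island and
    shares its local generation proportionally to demand (hypothetical rule)."""
    if main_supply_ok:
        return dict(demands_kw)
    total = sum(demands_kw.values())
    share = 1.0 if total == 0 else min(1.0, local_generation_kw / total)
    return {name: demand * share for name, demand in demands_kw.items()}


demands = {"household_1": 4.0, "household_2": 6.0, "pharmacy": 10.0}
print(island_dispatch(False, local_generation_kw=5.0, demands_kw=demands))
# {'household_1': 1.0, 'household_2': 1.5, 'pharmacy': 2.5}
```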

Up-to-dateness: A very important problem in a decision situation is the correctness or up-to-dateness of the underlying data like the number of actual patients in a hospital, currently closed streets, or local maintenance downtimes. Depending on the type of CI, the relevant data describing the state of the structure can be more or less outdated at the time when the decision is to be made. This can be caused by lazy updates of the state, especially if updates are done manually by human beings, or by delays in the transport from the structure to the deciders. In both cases, smart agents can reduce this inherent problem. On the one hand, agents can measure their environment state and, depending on the available sensors, are therefore always up-to-date. On the other hand, in case of a distributed decision-making, there will be no delays caused by requesting and transferring the necessary data to a centralized location.

Transparency: As smart grids are introduced, the concept of more or less autonomous and intelligent entities maps directly to an agent-based software approach. Therefore, the modelling and comprehension of agents are much more straightforward and come naturally to human beings, in contrast to a centralized model, even if the applied methods are basically the same. Such transparency can be very helpful for better understanding the processes of an urban area.

4 Summary

The introduction of smart grids into urban areas opens up a wide range of possibilities to better understand the critical infrastructure processes of a city and, as a consequence, provides insights to improve the resilience of a city against disruptions of basic service supply. Realizing the entities of a smart grid as software agents in a multi-behaviour agent system not only allows supply disruptions to be simulated and analysed today but also motivates a direct embedding of software agents in real devices in the near future.

The presented approach was preliminarily implemented using the Repast Simphony framework (Argonne National Laboratory 2015) to evaluate the concept. This framework is a Java-based, cross-platform development environment for agent modelling. The models for these CIs were derived using the procedures described above and implemented in Repast Simphony as Java classes with Java annotations (Collier and North 2016). Annotations are metadata attached to program elements, which in the case of Repast Simphony are used to mark the interface points between the framework and the model implementation. Additional information on agents is derived by the framework through Java introspection, as the agents are implemented as JavaBeans, which simplifies the implementation of agents. Figure 16.4 shows the visualization of some CI entity agents (water supply, hospitals, pharmacies, and dialysis clinics) of the city of Karlsruhe. To achieve this, the implemented CI types were instantiated with real-life CI entities using parts of the original information of the CI Cadaster of Karlsruhe and publicly available information from, e.g., OpenStreetMap.

Fig. 16.4

An agent-based realization of some critical infrastructures of the city of Karlsruhe. The different icons indicate water supply, hospitals, pharmacies, and dialysis clinics. The green and blue bars indicate the state of the power and water supply, respectively, of the corresponding structure at the considered point in time of the simulation

On the one hand, these agents can autonomously determine appropriate measures to cope with supply disruptions if they are cut off from their supervisors. On the other hand, their localized and comprehensive analysis is a valuable contribution to the decision support process of the human deciders of a crisis management group. The findings can be used in the disaster preparation phase to provide valuable information on how to implement an agent-linked smart grid transition process and to enhance resilience by using simulations to identify weaknesses and strengths in the capabilities of a city’s CI services. In the near future, other CI types will also be modelled and implemented in order to consider as many relevant local CI services in an urban area as possible.

Taking into account future developments of city systems over decades, agent-based simulation may make it possible to identify pathways for enhancing urban resilience. We therefore shed light on the basic modelling approach and, in particular, on the strategic negotiation of agents to prevent undesired states of vital service provision. This is an essential foundation of our research activities on simulation-based decision support for Critical Infrastructure Protection (for more details, see also Ottenburger and Münzberg 2017; Raskob et al. 2015a, b).