1 Introduction

EU policy focuses on reducing the vulnerabilities of critical infrastructure (CI) through enhanced resilience. According to the definition given by the European Commission (EC) (European Commission 2018), Critical Infrastructure is “(…) an asset or system which is essential for the maintenance of vital societal functions. The damage to a critical infrastructure, its destruction or disruption by natural disasters, terrorism, criminal activity or malicious behaviour, may have significant negative impacts for the security of the EU and the well-being of its citizens”. This definition shows the importance the EC attaches to CI for the safety and security of EU citizens. The protection, undisrupted functioning and risk prevention of CI are, thus, set as a priority. This is reflected in the intense legislative activity of the last decade aimed at specifying the norms for the effective management of EU CI safety and security, with particular emphasis on the EU Directive on European Critical Infrastructures (European Council 2008a, b) and its thorough review undertaken in 2012 (European Commission 2012). This culminated in the adoption of the 2013 Staff Working Document on a new approach to the European Programme for Critical Infrastructure Protection (European Commission 2013). In the latter, CI protection and resilience rest on three main axes (prevention, preparedness and response) within a common approach to implementation activities throughout the EU that better accounts for interdependencies.

Thus, demands for CI protection have increased significantly. More importantly, increasing emphasis is placed on limiting the consequences of disruptions for people and society. As a result, priority is progressively given to the continuity of the social function or service, and not necessarily to the integrity of the infrastructure itself, as remarked in the Commission Implementing Decisions/2012 document. This evolution would not be feasible without an overarching and comprehensive European legal framework. This is particularly true as infrastructure managers and their various stakeholders face growing operational pressures arising from factors such as the scarcity, uncertainty and variability associated with access to and allocation of resources.

The protection of CI has become a rather complex issue, mainly due to a long list of threats, vulnerabilities and consequences that have evolved over the last decade, e.g. the combination of physical and cyber risks. This results from the increasing penetration of information and communications technologies into critical infrastructure operations and the inevitable openness and interdependence that they generate.

Knowledge about risks is currently quite extensive. As the OECD document on Resilience Systems Analysis (OECD 2014) points out, numerous risk analysis tools exist for identifying possible problematic areas and the possible impacts of different kinds of disruption (economic, weather, etc.). The same document, however, notes a lack of concepts and techniques for increasing resilience at different levels (personal, societal, etc.), which would in turn lead to relevant investment.

The importance of Critical Infrastructure Resilience management is also highlighted by the fact that it is a global priority, not only an EU one. In many countries, such as the USA, Australia and New Zealand, initiatives are ongoing to set out the framework for protecting and enhancing the resilience of their national CIs. In the USA, for example, the relevant plan issued in 2015 (NCIS R&D 2015) defines five National Priority Areas for NCIS R&D. These include the description of CI systems and their dynamics, the development of risk assessment and management routines, methods to ensure resilience in CI, the incorporation of data science into these procedures, and cooperation within and between interconnected systems.

As previously outlined, extensive knowledge is currently available, and important applications of resilience have been developed on many different fronts. Despite these achievements, recent serious natural and man-made events suggest that the operationalisation of resilience still falls short of expectations. An insufficient understanding of context-specific conditions hinders the ability to adapt resilience solutions effectively to real local dynamics. It also limits the ability to assess accurately how effective solutions are in specific scenarios and social contexts, because this requires integrating multiple factors about which knowledge often remains insufficient.

The RESOLUTE project aimed to address the diversity of threats and vulnerabilities, with particular focus on urban transport systems (UTS). It took into account the challenges of making the implemented actions and measures effective and operational. The focus on UTS stemmed from the increasingly critical role of mobility in every social and economic activity, both within communities and among multiple communities. This paper starts by outlining the challenges to enhanced resilience in UTS. It then presents the achievements of RESOLUTE in terms of effective implementation of resilience-oriented measures, mainly through the development of generic European Resilience Management Guidelines (ERMG) applicable to any Critical Infrastructure. Next, it describes how the ERMG were adapted to UTS by specifying and “translating” the recommendations in terms of the characteristics, needs and criticalities of UTS. The adapted guidelines were tested in real-life environments at the two RESOLUTE test sites (City of Florence and Athens Subway) and, finally, the test results were assessed in order to optimise them.

2 The concept of resilience

The concept of resilience covers many different matters (Westrum 2006) and is used across many scientific fields, such as engineering, biology, psychiatry, cyber security and computer networks (Jackson 2010; Holling et al. 2002; Boin et al. 2010; Vugrin et al. 2010; Holling 2010; Bellini et al. 2019a, b; Bagnoli et al. 2018). Within the field of resilience engineering, this notion is defined as “the intrinsic ability of a system to adjust its functioning prior to, during or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions” (Hollnagel 2011a, b). Because it refers to adjustments made prior to, during or after events, this concept necessarily encompasses a timescale, which underlines its dynamic nature. Resilience is thus a process through which successful performance is continually pursued, rather than a stable outcome obtained through a static characteristic of the system (e.g. topology). The key challenge in the pursuit of resilience is coping with the high complexity, continuously changing conditions and inherently underspecified operations of today's sociotechnical systems (Wilson 2009). This leads to treating uncertainty as an intrinsic part of the system.

Grote (2004) proposes two different approaches for managing uncertainty in organisations: minimising it or coping with it. Under well-known operating scenarios and stable conditions, safety management relied on the ability to prevent known threats. Efforts therefore focused on thorough planning and monitoring of operations as a way to minimise uncertainty. Rules and procedures aimed at minimising degrees of freedom. Any deviation from planned and prescribed processes was seen as calling for further planning and monitoring and for the reinforcement of rules.

On the other hand, as pointed out by Grote (2009), there is a growing awareness that over-specified procedures and incrementally developed rules will never account for all the conditions that might emerge during operations, because operations in complex systems are inherently underspecified. It is therefore necessary to cope with uncertainty. Decision makers thus require a tool or method capable of assessing the adaptive capacity of a system at a given moment. Such a tool allows the necessary adjustments to unwanted variability, instead of measuring adherence to standards and rules in order to reduce deviations (Bellini et al. 2019a, b).

Several methods have been proposed in the literature to assess resilience. They can be classified according to their outcome [e.g. scorecard, index, models and toolkit (Cutter 2015; Ayyoob 2016)]. Alternatively, they can be classified according to their complexity, using the cost, time and data required for implementation as parameters (Linkov et al. 2018). In particular, Linkov et al. (2018) identify three tiers.

Tier 1 is the screening level, used to identify and prioritise critical components, capacities or functions of the system at minimal cost or debate. In this tier, adopted by some authors (Williams et al. 2014; Sands 2015), expert judgement, rather than empirical evidence, is used to select the characteristics that define system resilience. Hence, the ability to cope with uncertainty is simply defined as the sum of the scores of all the different elements (indicators) present in the system. Although the approach is cost-effective and fast, it is not suitable for in-depth analysis aimed at identifying criticalities or supporting decision-making under dynamic conditions.

Tier 2 introduces the definition of the system's components/functions and their interdependencies. The models developed should increase fidelity in terms of system representation and understanding. At this stage, metrics and indicators can be defined at a finer-grained level. This tier also includes solutions based on indirect quantification of system resilience, which combine modelling with calculations from observations (Aven 2011).

Tier 3 offers the highest fidelity in modelling a real-world system and aims at observing or simulating the conditions under which the functional performance of a system drops (Bellini et al. 2019a, b). Here, system resilience is assessed through Big Data and/or by quantifying the functionality of critical system performance components across the temporal stages of an event (Aven 2011; Como et al. 2013; Baroud et al. 2014; Henry and Ramirez-Marquez 2012; Jovanović et al. 2018). Even if the use of Big Data is considered promising, current approaches are strongly event-dependent. As a result, little information can be obtained about the system's potential capacity to cope with unexpected conditions. These methods are therefore better suited to risk management than to resilience management in complex sociotechnical systems.
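To make the limitation of Tier 1 concrete, the additive scorecard logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from Linkov et al. (2018). The indicator names and the 0 to 5 expert rating scale are hypothetical.

```python
from typing import Mapping

SCALE_MIN, SCALE_MAX = 0, 5  # hypothetical expert rating scale


def tier1_resilience_score(scores: Mapping[str, float]) -> float:
    """Tier 1 screening: the ability to cope with uncertainty is taken as the
    plain sum of expert-assigned indicator scores (no interdependencies)."""
    for name, value in scores.items():
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"score for {name!r} outside {SCALE_MIN}-{SCALE_MAX}: {value}")
    return sum(scores.values())


# Hypothetical expert ratings for two metro lines (illustrative only).
line_a = {"backup_power": 4, "staff_training": 3, "redundant_routes": 2, "incident_reporting": 5}
line_b = {"backup_power": 5, "staff_training": 5, "redundant_routes": 0, "incident_reporting": 4}

print(tier1_resilience_score(line_a))  # 14
print(tier1_resilience_score(line_b))  # 14
```

Both hypothetical profiles obtain the same total, even though the second has no redundant routes at all. A plain sum ignores how components depend on one another, and capturing such interdependencies is precisely what Tier 2 and Tier 3 approaches add.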

To this end, the present work adopts Q-FRAM to assess the potential for resilience of complex systems. Q-FRAM is a method developed within the RESOLUTE project, introduced in Bellini et al. (2017) and formalised in Bellini et al. (2019a, b). The method exploits some Tier 3 technologies, such as a data-driven approach, but implements a fast resilience assessment methodology typical of Tier 2 approaches. The result is a data-driven but cost-effective quantification of the capacity of the system to cope with unwanted performance variability under unexpected or emerging changing conditions in UTS.

3 Urban transport system resilience

Transport has been considered one of the first-priority sectors for action in the European Programme for Critical Infrastructure Protection (EPCIP). Today, UTS must cope with significant vulnerabilities: ageing infrastructure, continuously increasing urban populations, interdependencies between physical and cyber infrastructure, and the proximity of transportation systems to hazardous production facilities (also considered CI). They must also cope with the threats of climate change and terrorism. Together, these factors create significant challenges for UTS as a critical infrastructure system.

The concept of resilience has been broadly used to characterise a system that recovers rapidly from a disruption and resumes normal operation. However, resilience involves more than recovery. UTS resilience is an overarching concept describing a complex transportation system that can better withstand disruptions. A UTS includes cyber, physical, social and organisational elements that are all critical to building resilience (Volpe 2013).

A resilience framework should not only aim at maintaining the operating status and returning the system to its pre-disaster condition. It should be treated as the framework within which it becomes possible, by making the most of existing capabilities and resources, to create a transportation system (or convert an existing one) that is more effective and much less vulnerable to disruption. This reflects the “bounce forward” concept (Coaffee and Clarke 2017). Thus, a UTS can be considered resilient if: (a) it is capable of withstanding critical events, (b) it is capable of adapting to changing conditions and responding appropriately to threats, and (c) it is capable of mitigating consequences by speeding up the recovery of operations. According to Volpe (2013), these three attributes (robustness, adaptiveness and consequence mitigation) form the foundations of a resilient system. In our domain, however, a fourth aspect, the learning and adaptation phase, should not be neglected: a system has to learn from the past, continuously develop its adaptive capacity and enhance its coping ability accordingly.

From the above, it is clear that Critical Infrastructure owners and managers lack the guidance they need to organise and strengthen their facilities, personnel and other assets in an effective and standardised manner. Such guidance is needed to meet continuously rising resilience needs against any kind of risk.

This is the gap that RESOLUTE (Gaitanidou et al. 2018) strives to fill by proposing guidelines for resilience management, the European Resilience Management Guidelines (ERMG). These focus on the actual functions necessary for the effective operation of a critical infrastructure. They are formulated generically, so as to be applicable to, and adaptable by, any kind of critical infrastructure.

4 System complexity

The UTS is a relevant case of an interconnected system involving critical infrastructure and multiple decision makers (people, civil protection, public administration, etc.). Its numerous links and interdependencies make the UTS highly critical in terms of critical infrastructure resilience. In an emergency, whatever system the emergency concerns, the transport system is always affected, either by the emergency itself or by the actions and measures taken to prevent or confront it. Consider these interdependencies, together with the complexity of the system as such and its high exposure to a variety of threats (from system malfunctions to climate disasters and terrorist attacks, see Sect. 5). Taken together, they make managing resilience in the UTS a multi-parametric task of primary importance. Several factors define, and thus underline, this complexity:

  • Large-scale networks: Especially in big cities or metropolitan areas, a UTS is composed of networks of significant extent involving different modes (road, rail, etc.). Each type of network is managed differently, and the larger the scale, the more complex and difficult it is to ensure proper operation. Moreover, when one network component is disrupted, demand migrates to other components, which makes this task even more challenging.

  • Multiple modes: A UTS potentially comprises most of the available transportation modes, including road (private and public transport, cars and two-wheelers), rail (surface and underground), pedestrians and waterways. This characteristic alone significantly raises the system’s complexity, as the needs of each mode (and of each of its components) differ, both in normal operation and during an emergency.

  • Multiple industries: A direct consequence of the variety of modes and actors in UTS is the involvement of a wide range of industries. The vehicle and construction industries are the most evident, together with electronics, cyber technologies, fuels, commerce, etc. The list is long, and the mutual influence and impact between UTS and each of these industries is significant and multi-parametric.

  • Multiple operators and scattered responsibility: This factor is directly linked to the co-operation of different modes in the same UTS (see the second point above). The presence of multiple operators in the system (usually a different one per transportation mode, plus private transport) adds complexity in terms of how responsibilities are distributed. This may become more crucial in an emergency, as managing the system actors requires coordination at multiple levels of different nature.

  • Public–private mix: A UTS is composed of both public components (public transport vehicles, infrastructure, etc.) and private components (own vehicles, pedestrian movement). This adds to its complexity, as different rules, and different means of enforcing them, apply in each case, while managing the whole system demands an overall control function.

  • Multiple recipients (people, freight): The main recipients of the services offered by UTS are people and goods. In both cases, the demanded service can be highly differentiated and unpredictable in terms of its nature, frequency, duration, location, etc. Moreover, a disruption in system operation (e.g. closure of part of the network) could result in major social and economic consequences (such as the inability to reach businesses or health facilities).

  • Critical to economy: The criticality of UTS for the local and, consequently, the national and international economy is obvious for two reasons. The first is all the factors listed above. The second is that the urban environment is the core of the economy and UTS is the primary means for any kind of economic activity to flourish and achieve its goals, e.g. movement of goods and transportation of people to work, leisure, shopping centres, etc.

From all the above, it is evident that dealing with UTS means dealing with a highly complex, multi-actor and multi-parametric system, whose management, especially in terms of resilience, is a critical and challenging task. This complexity prevents absolute control of the system, which in fact is not the aim of resilience management. Resilience management does, however, require the establishment of concrete structures and strong synergies aimed at sustainable and resilient operation.

5 Threats for UTS

Transportation systems are prone to various kinds and levels of disruption. These often lower the level of service provided and put at risk the security of the infrastructure and equipment, as well as of the people who use them. Such disruptions may take different forms: accidents (due either to misuse of infrastructure/equipment or to malfunction of the infrastructure/equipment itself), weather-related threats (especially extreme weather phenomena), physical or cyber attacks (including terrorism), and all possible combinations of these. Transportation security involves the detection, identification and mitigation of, and protection against, both physical and cyber threats targeting either the users or the infrastructure. Here, a threat means any external event that may disrupt normal system operation. In particular:

  • Cyber security threats emanate from the emergence of I2I (Infrastructure to Infrastructure) communications. These may occur either within a system, i.e. between assets of the same transportation system, or between systems, i.e. between different transportation systems and their components.

  • Physical security threats in UTS include, among others, terrorist attacks, natural disasters, crimes and the transportation of dangerous goods (e.g. flammable materials).

Owing to their complexity, the exposure of transportation systems to threats is characterised by high uncertainty and unpredictability. Some prominent threat types are discussed below:

  (a) Natural hazards-induced threats: Perhaps the most prominent physical threat, and the one with the greatest consequences for transport infrastructure, is the effect of climate change, which leads to extreme weather events. Significant changes have already occurred in the climate system, many of them irreversible, with inevitable consequences. This situation is expected to significantly affect transportation systems as well, especially their infrastructure components. Many severe weather events impact road transportation: heat waves, cold waves, heavy precipitation, fog, snowfall, large-scale storms, windstorms, thunderstorms and blizzards. Other physical events, such as earthquakes, also constitute significant threats to transportation systems (Vajda et al. 2011). Finally, the increase in extreme climatic events, among other factors, accelerates the erosion and ageing of infrastructures, whose dynamics might not have been properly considered at design time (see the dramatic collapse of the Morandi Bridge in Genoa, Italy, in 2018).

  (b) Cyber threats: Today’s transportation networks depend heavily on information systems and ever more complex networks. These enlarge the attack surface, making it difficult to know which assets are exposed and how, so cyber attacks have become a real threat. The ransomware attack on the Colorado Department of Transportation in February 2018 (CODOT), which disrupted operations for several weeks, is one example among several. In today’s trend towards “Smart” cities, of which transportation systems are considered a cornerstone, digital infrastructure is dominant, and interconnections between different services and actors may become targets of cyber attacks. The core of information and communication systems is data. Such data may include the location, status and condition of physical assets and associated infrastructure, and thus provide the capability to control different assets. A cyber attack may therefore pose a series of minor or major threats to the safety and security of transportation systems. The key risks arising from cyber attacks can be summarised as follows:

    • Physical asset damage and the associated loss of use (e.g. traffic lights and electronic traffic signals);

    • Unavailability of IT systems and networks (e.g. interruption of ticketing services, traffic management systems);

    • Loss or deletion of data including data corruption or loss of data integrity;

    • Data breach leading to the compromise of third-party confidential information, including personal data;

    • Cyber espionage resulting in the compromise of trade secrets and other sensitive information;

    • Extortion demands to cease a cyber attack along with direct financial loss and damage of reputation.

  (c) Terrorist attacks: Transportation systems remain key targets for terrorist and extremist groups because of the potential for mass casualties and for economic, psychological and political disruption. Terrorist groups generally display patience and determination in their operational planning. Attempted attacks around the world over the past five years illustrate continued efforts to defeat or bypass transportation security regimes. Buses, trains and airports in London (2005), Madrid (2004), Paris, Brussels (2016) and many other cities have been targets of terrorist attacks in recent years. Acts of terrorism relate to transportation systems mainly in three ways:

    • When transportation is the means through which a terrorist attack is executed;

    • When transportation is the end, or target, of a terrorist attack; or

    • When the crowds that many transportation modes generate are the focus of a terrorist attack.

Terrorist attacks are particularly significant because they may be either physical or cyber and, most notably, both at the same time. This combination implies multiple consequences and should be considered with special care. One example would be a cyber attack on a fleet management centre that handles the transportation of dangerous goods (e.g. fuel). Such an attack could lead to the explosion of one or more vehicles, with all the consequences of a physical attack on the transportation infrastructure as well as on human lives and property.

6 ERMG for UTS: methodology

6.1 The RESOLUTE approach

The RESOLUTE project (RESOLUTE project 2018), co-funded by Horizon 2020 (DRS-07-2014), was a Research and Innovation Action that lasted three years (2015–2018). In recent years, the EU has funded several projects focused on developing ERMG, each based on its own theoretical background (Save et al. 2018).

Within RESOLUTE, resilience has been considered a useful management paradigm in which adaptive capacities are paramount. RESOLUTE focused on achieving more sustainable operations in European UTS. It recognised the evolving safety and security requirements of modern cities, the increasing and differentiated needs for mobility, and the need for an optimal allocation of resources to achieve sustainable and efficient operations. To this end, RESOLUTE conceptually decomposed resilience into two interrelated sub-aspects, Adaptive Capacity and Coping Ability (see Fig. 1). Adaptive Capacity refers to making available and allocating the resources and assets needed to secure the system’s potential for adaptation. Coping Ability refers to the actual coping performance of the system that exploits such resources.

Fig. 1 Resilience concept

Observation of resilience should relate directly to how a system performs and how capable it is of sustaining this performance over a given period. In this sense, only the ‘potential’ for resilience (Adaptive Capacity) can actually be measured with a certain degree of certainty, not its actuation (Coping Ability). For instance, a system that experiences a failure can still exhibit resilient behaviour, in the form of graceful degradation of functionality and bare survival of service. Conversely, if a system continues to succeed, it is not possible to assert that it will keep on doing so. The approach adopted therefore focuses on sustaining the capacity of a system to adapt in the presence of continuous change. Since RESOLUTE adopted the Resilience Engineering perspective described by Hollnagel (2017), the four resilience potentials were used to drive the ERMG definition:

  (a) Knowing what to do (Respond), i.e. dealing with existing circumstances and responding to disruptions by adapting operations to actual conditions.

  (b) Knowing what to look for (Monitor), i.e. addressing foreseeable problematic circumstances by inspecting the system and its surroundings and identifying potential imminent hazards.

  (c) Knowing what to expect (Anticipate), i.e. addressing potential risks by preparing adequate modifications in system operation to confront them.

  (d) Knowing what has happened (Learn), i.e. reporting and learning from past events, acquiring knowledge from both successful and unsuccessful performance.

Hence, adaptive capacity is related to the level of resources that a system can allocate and to its ability to manage these resources when executing specific adaptive cycles. These cycles are described through the four stages of the event management cycle that a system needs to maintain in order to be resilient (National Academies 2018):

  • Plan/Prepare: Ensure availability of services and resources during a disruptive event;

  • Absorb: Ensure operation while dealing with the disruption;

  • Recover: Return to the pre-disruption level of operation;

  • Adapt: Improve resilience by benefitting from the lessons learned from the event, adequately modifying and improving functions and operations.

In this sense, relevant indicators and monitoring tools should prove that these four cornerstones are adequately incorporated in all stages of operational management.
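One possible way to operationalise this requirement is a simple coverage check that flags stages of the event management cycle left without any indicator. The sketch assumes that each indicator is tagged with the stage(s) it informs. The indicator names are hypothetical and serve only as an illustration.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the event management cycle (National Academies 2018)."""
    PLAN_PREPARE = "Plan/Prepare"
    ABSORB = "Absorb"
    RECOVER = "Recover"
    ADAPT = "Adapt"


def uncovered_stages(indicators: dict[str, set[Stage]]) -> list[Stage]:
    """Return the stages, in cycle order, that no indicator informs."""
    covered = set().union(*indicators.values())
    return [stage for stage in Stage if stage not in covered]


# Hypothetical indicator set for an urban transport operator.
indicators = {
    "spare_rolling_stock_ratio": {Stage.PLAN_PREPARE},
    "service_level_during_incident": {Stage.ABSORB},
    "time_to_restore_full_service": {Stage.RECOVER},
}

print([stage.value for stage in uncovered_stages(indicators)])  # ['Adapt']
```

In this hypothetical set, no indicator tracks whether lessons learned are turned into modified functions and operations, so the Adapt stage is reported as uncovered.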

7 The functional resonance analysis method

The development of the ERMG adopted a systems perspective. To this end, the Functional Resonance Analysis Method (FRAM) (Hollnagel 2009a, b, 2012; Bellini et al. 2016a, b) was used to model a generic CI. The model identifies the desired functions and related interdependencies that a CI should implement in order to be resilient (Ferreira and Bellini 2018). The four resilience cornerstones outlined above were the starting point for developing the FRAM model. Various documents, together with consultation with subject matter experts (namely actors from municipal and transport services), were used to calibrate the model and to achieve the necessary level of detail in describing CI functions.

FRAM (Hollnagel 2004) describes system failures (adverse events) as the outcome of functional resonance arising from the variability of normal performance.

FRAM is originally based on four basic principles:

  • First principle: The equivalence of successes and failures

  • Second principle: The inevitability of approximate adjustments

  • Third principle: Consequences are emergent

  • Fourth principle: Functional resonance

In its implementation, the method comprises the following five steps:

  (a) The definition of the purpose of the analysis.

  (b) The identification and description of system functions. In FRAM, a function is an activity with important or necessary consequences (Hollnagel 2004) for the state or properties of another action.

  (c) The assessment and evaluation of the potential variability of each function. This involves assessing a set of Common Conditions (CCs) that influence the function’s performance variability, as described by Hollnagel (1998).

  (d) The identification of functional resonance. This involves determining how the variability of one function interacts with other functions and with the system as a whole.

  (e) The identification of effective countermeasures to be introduced in the system. These include measures that would preserve and foster the system’s safety and sustainability.
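Steps (c) and (d) are qualitative in FRAM. Purely as an illustration, they can be approximated with a deliberately simplified scoring rule:

- each Common Condition is rated as reducing, not significantly affecting, or improving performance;
- the net count of degrading conditions gives a variability level;
- couplings in which neither end has low variability are flagged as candidates for functional resonance.

The three-level rating, the thresholds, the flagging rule and the function names are all hypothetical assumptions. This is neither Hollnagel's procedure nor Q-FRAM.

```python
from dataclasses import dataclass

# Hypothetical three-level rating of how a Common Condition affects performance.
REDUCED, NOT_SIGNIFICANT, IMPROVED = -1, 0, 1


@dataclass
class RatedFunction:
    name: str
    cc_effects: dict[str, int]  # Common Condition -> REDUCED / NOT_SIGNIFICANT / IMPROVED

    def variability(self) -> str:
        """Step (c) sketch: net number of degrading conditions -> variability level."""
        effects = list(self.cc_effects.values())
        net = effects.count(REDUCED) - effects.count(IMPROVED)
        if net <= 0:
            return "low"
        return "medium" if net == 1 else "high"


def resonance_candidates(functions, couplings):
    """Step (d) sketch: flag upstream -> downstream couplings where neither end has
    low variability, i.e. where variability may combine instead of being dampened."""
    by_name = {f.name: f for f in functions}
    flagged = []
    for upstream, downstream in couplings:
        levels = (by_name[upstream].variability(), by_name[downstream].variability())
        if "low" not in levels:
            flagged.append((upstream, downstream, levels))
    return flagged


# Hypothetical UTS functions and couplings (illustrative only).
monitor = RatedFunction("Monitor weather alerts",
                        {"training": NOT_SIGNIFICANT, "available_time": REDUCED, "communication": REDUCED})
reroute = RatedFunction("Reroute bus lines",
                        {"training": IMPROVED, "available_time": REDUCED, "working_conditions": REDUCED})
inform = RatedFunction("Inform passengers",
                       {"communication": IMPROVED, "available_time": NOT_SIGNIFICANT})
couplings = [("Monitor weather alerts", "Reroute bus lines"),
             ("Reroute bus lines", "Inform passengers")]

print(resonance_candidates([monitor, reroute, inform], couplings))
# [('Monitor weather alerts', 'Reroute bus lines', ('high', 'medium'))]
```

In this toy example, the well-supported passenger information function has low variability and dampens what it receives, so only the first coupling is flagged. Step (e) would then target countermeasures at that coupling.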

The definition of functions is one of the most important aspects of FRAM modelling. A FRAM function can describe three main kinds of activity:

  (1) what people have to do in order to achieve a specific objective;

  (2) what an organisation does: for example, the function of an emergency room is to treat incoming patients;

  (3) what a technological system does, either through an automated function or by interacting with humans.

A function may refer to all three at the same time, even if only one should be identified as prevalent (Bellini et al. 2019a, b).

The aim is to describe the normal activities performed by the system in focus. The next step is to assess each function’s safety by using six parameters (Input, Output, Preconditions, Control, Time and Resources) (see Fig. 2) as defined below:

  1. (a)

    Input (I): what initiates the function;

  2. (b)

    Output (O): the result of the function;

  3. (c)

    Preconditions (P): the conditions required for the function to operate;

  4. (d)

    Resources (R): what the function needs in order to produce the output;

  5. (e)

    Time (T): the temporal constraints affecting the function;

  6. (f)

    Control (C): the monitoring or control procedures for the performance of the function.
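The six parameters map naturally onto a record type. In FRAM, the Output of one function can act as the Input, Precondition, Resource, Time or Control of another, and these couplings are what the model captures. The Python sketch below illustrates this under stated assumptions. The class `FramFunction`, the helper `couplings` and the example function names are hypothetical. Matching outputs to aspects by identical text labels is a simplification made for the sketch and is not a rule of the method.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Aspects that an upstream Output can feed in a downstream function.
RECEIVING_ASPECTS = ("inputs", "preconditions", "resources", "time", "control")


@dataclass
class FramFunction:
    """Hypothetical record of one FRAM function described by its six aspects."""
    name: str
    inputs: list[str] = field(default_factory=list)         # I: what initiates the function
    outputs: list[str] = field(default_factory=list)        # O: what the function produces
    preconditions: list[str] = field(default_factory=list)  # P: conditions required to operate
    resources: list[str] = field(default_factory=list)      # R: what is needed to produce the output
    time: list[str] = field(default_factory=list)           # T: temporal constraints
    control: list[str] = field(default_factory=list)        # C: monitoring/control of performance


def couplings(functions: list[FramFunction]) -> list[tuple[str, str, str, str]]:
    """List (upstream, output, downstream, aspect) for every label match.

    Assumption of this sketch: an Output couples to another function's aspect
    when the two labels are identical strings.
    """
    found = []
    for up in functions:
        for out in up.outputs:
            for down in functions:
                if down is up:
                    continue
                for aspect in RECEIVING_ASPECTS:
                    if out in getattr(down, aspect):
                        found.append((up.name, out, down.name, aspect))
    return found


# Illustrative functions (names are hypothetical, not taken from Table 1).
monitor = FramFunction("Monitor river level", outputs=["River level alert"])
close = FramFunction(
    "Close road sections",
    inputs=["River level alert"],
    resources=["Traffic operators"],
)
print(couplings([monitor, close]))
# [('Monitor river level', 'River level alert', 'Close road sections', 'inputs')]
```

A plain dataclass keeps each function's table row readable; a full FRAM tool would also store the variability assessment of each aspect.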

Each function is described in a table format, and the set of these descriptions constitutes the FRAM model. For each function in the model, recommendations were then produced based on the resilience and transport literature, as well as on empirical evidence gathered during the initial project stages. These recommendations provide guidance on how to dampen the performance variability of each function, taking into account different operating scenarios, with the aim of ensuring the overall continuity of service delivery under any unexpected condition or event. The objective is to sustain both the adaptive capacity of the system in the face of continuously changing operational conditions (flexibility) and the continued, coherent pursuit of goals on their own timescales (rigidity/robustness).

Fig. 2

FRAM function representation (Hollnagel 2004)

8 UTS ERMG definition methodology

The ERMG definition is based on a novel vision that goes beyond the “Work-As-Done” versus “Work-As-Imagined” dichotomy found in FRAM-based system analysis. This perspective introduces a new point of view, named “Work-As-Desired”, which is used to build a Reference Model (instead of a representation of reality) as a baseline against which different kinds of CI can be assessed. In particular, the approach is organised in three successive steps:

  • 1st step: definition of the desired functions, i.e. the basic functions that a CI should comprise (see Table 1);

  • 2nd step: definition of the interdependencies that are considered desirable or recommendable to enhance CI resilience, rather than analysis of those already existing in the system;

  • 3rd step: definition of the recommendations for damping performance variability in the system.

Table 1 List of UTS functions per section

This process requires a clear definition of the system boundaries, as well as of the right level of granularity for the functions.

In the present case, the description has been kept at a generic level in order to provide a better overview of the sector-independent system while keeping the analysis of functional variability meaningful. In the 1st step, the following were identified:

  1. (a)

    “core” functions (deliver service and Use of the service) representing the mission of the system;

  2. (b)

    “desired” functions that should be present to consider the system resilient (see Table 1);

  3. (c)

    “background” functions that represent the boundaries of the system.

The same was done in the 2nd step for the “desired” interdependencies among the functions. Finally, in the 3rd step, the guidelines that compose the ERMG were defined for each function, drawing on the literature and relevant good practices. The description of each guideline was structured as follows:

  • Section (Anticipate, Respond, Monitor, Learn): these are the four resilience cornerstones. The functions are grouped under the cornerstone to which they mainly contribute.

  • Name of the Function: the name of the system function as identified during the FRAM-based system analysis (Bellini et al. 2017) (see Table 1).

  • Background facts: the main rationale behind the guidelines, the current issues and the roles associated with the function.

  • General recommendations: recommendations on what the function “should do” in terms of activities that sustain the system's adaptive capacity.

  • Common Conditions recommendations: recommendations on “how to dampen function performance variability” so that the function continues to deliver the desired outcome under unexpected conditions or events. This part represents the real added value of the guidelines.

  • Interdependencies recommendations: recommendations that help a function manage the input variability that upstream functions in the system may generate.

Additionally, the following supportive elements have been included for each function:

  • Abstract: a distilled summary of the guidelines, provided to quickly orient the reader.

  • Questions: these aim to support readers in assessing their own function. They are the questions that CI managers or decision makers should ask themselves to verify the level of implementation of the guidelines in their own organisation.

  • Examples: several best practices related to each function are reported to improve the understandability of the guidelines.

  • Resources: relevant articles, standards, directives, etc., which have been used to justify the recommendations and which support further technical and scientific investigation by the reader.

The adaptation of the ERMG to UTS required a specific method to keep the translation consistent and valuable for the end users. In particular, the adaptation process involved experts in UTS and was structured like a scientific review. A list of candidates was created through an endorsement mechanism: each RESOLUTE partner identified three names on the basis of recognised expertise, reputation and involvement in EU projects and initiatives. The candidates were contacted and a subset was selected on the basis of their availability. The final list was drawn up so as to secure the representativeness of all stakeholders (operators, first responders, industries, public institutions, research organisations, etc.) and to avoid overlaps. The revision and adaptation started after the experts had been trained on the ERMG scope and background. Each expert then received the document by email, together with a simple guideline for providing comments and feedback on the description of the FRAM functions. Once all feedback had been collected, a fast-forward consensus meeting among the experts was organised to agree on the contributions before accepting them. For each feedback item analysed, each expert expressed a judgement. All the judgements were processed through the fuzzy bag approach (Gabriella and Yager 2003). The idea is to interpret the representative majority as a fuzzy set instead of a single value. This fuzzy set includes all the possible subsets representative of a majority within the collection of values expressing all the valuations. This requires defining both the strength of a majority and the synthesised value expressed by that majority. Once the multiple judgements from the experts had been gathered in a bag of valuations, a characteristic function for identifying similar values was defined (Bellini et al. 2017) and the final evaluation was computed.
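The majority-based aggregation can be illustrated with a much simpler, crisp-centre stand-in. The sketch below is not the fuzzy bag formulation of Gabriella and Yager (2003). The triangular characteristic function `similarity`, its `tolerance` of 2 points and the helper `majority_judgement` are all hypothetical. It only shows the two quantities the text mentions: the strength of a majority and the value that this majority synthesises.

```python
from __future__ import annotations


def similarity(a: float, b: float, tolerance: float = 2.0) -> float:
    """Hypothetical triangular characteristic function: 1 when a == b, 0 at |a - b| >= tolerance."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)


def majority_judgement(judgements: list[float], tolerance: float = 2.0) -> tuple[float, float]:
    """Return (synthesised value, majority strength) for a bag of expert judgements.

    Simplified stand-in: every judgement is tried as the centre of a candidate
    majority, and the centre with the largest total similarity wins.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")

    def support(centre: float) -> float:
        return sum(similarity(centre, v, tolerance) for v in judgements)

    centre = max(judgements, key=support)
    memberships = [similarity(centre, v, tolerance) for v in judgements]
    # Strength: average membership of the bag in the winning majority, in (0, 1].
    strength = sum(memberships) / len(judgements)
    # Synthesised value: membership-weighted mean; distant outliers get zero weight.
    value = sum(m * v for m, v in zip(memberships, judgements)) / sum(memberships)
    return value, strength


# Four experts score the same feedback item on a 0-10 scale (illustrative data).
print(majority_judgement([7, 8, 8, 3]))  # roughly (7.8, 0.625): the outlier 3 is ignored
```

Unlike a plain mean (6.5 for these scores), the majority-based value is not pulled down by the single dissenting judgement.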

9 UTS ERMG operationalisation

9.1 ERMG-driven self-evaluated multilevel gap analysis

The UTS ERMG (Gaitanidou et al. 2016) aim to support a self-evaluated multilevel gap analysis of the state of affairs of the CIs considered. The UTS ERMG are structured to support the reader both in assessing and in improving the CI of interest. In particular, three levels of analysis are identified and supported by the UTS ERMG:

  • Level I: the first level of analysis compares the “desired functions” defined in the UTS ERMG with the functions identified through a FRAM analysis of the CI under assessment. The absence of one or more functions immediately orients decision makers towards implementing them, where applicable. This preliminary assessment can highlight relevant issues in the organisation.

  • Level II: the second level of analysis assesses how well the functions implemented in the assessed CI are aligned with the UTS ERMG recommendations. The reader should be able to understand whether the general and the common conditions recommendations are applied, and at which level of detail. Moreover, the document provides indications and insights on how to improve the existing functions to manage the variability of their outputs.

  • Level III: the third level of analysis assesses the function interdependencies. The UTS ERMG provides several desired interdependencies that can increase system resilience. Missing connections between functions in the assessed CI may indicate that information or resources are not properly supplied or shared, creating vulnerabilities in the system. Moreover, a function that is coupled with another may be prevented from providing the expected outcome if the variability of the upstream function exceeds the capacity of the downstream function to manage it. Thus, to manage such functional resonance, the ERMG provides recommendations on how to manage, at function level, the variability coming from upstream functions.
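Levels I and III are essentially set comparisons between the reference model and the FRAM model of the assessed CI. Level II is qualitative and is not covered here. The Python sketch below makes this concrete under assumptions. “Monitor weather” and the helpers `level_one_gaps` and `level_three_gaps` are hypothetical, and interdependencies are reduced to (source, target) pairs. Reporting Level III gaps only between functions that already exist is a design choice of the sketch, because links to missing functions are already flagged by Level I.

```python
from __future__ import annotations

# Hypothetical reference model ("Monitor weather" is illustrative; the other
# names appear in the text).
desired_functions = {"Deliver service", "Use of the service", "Install/Maintain Assets", "Monitor weather"}
desired_links = {
    ("Monitor weather", "Deliver service"),
    ("Install/Maintain Assets", "Deliver service"),
    ("Deliver service", "Use of the service"),
}

# Hypothetical FRAM model of the CI under assessment.
implemented_functions = {"Deliver service", "Use of the service", "Install/Maintain Assets"}
implemented_links = {("Deliver service", "Use of the service")}


def level_one_gaps(desired: set[str], implemented: set[str]) -> set[str]:
    """Level I: desired functions that are absent from the assessed CI."""
    return desired - implemented


def level_three_gaps(
    desired: set[tuple[str, str]],
    existing: set[tuple[str, str]],
    implemented: set[str],
) -> set[tuple[str, str]]:
    """Level III: desired interdependencies missing between functions that already exist."""
    return {(s, t) for (s, t) in desired - existing if s in implemented and t in implemented}


print(sorted(level_one_gaps(desired_functions, implemented_functions)))
# ['Monitor weather']
print(sorted(level_three_gaps(desired_links, implemented_links, implemented_functions)))
# [('Install/Maintain Assets', 'Deliver service')]
```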

The synthesis of the gap analysis is obtained using the Resilience Analysis Grid (RAG) tool (Hollnagel 2011a, b). At the end of the assessment, the reader is more aware of the importance of resilience thinking in the UTS domain, of the status of the analysed UTS, and of what to do at operational, tactical and strategic level to increase the resilience of the system.

10 ERMG-driven implementation

The efficiency and effectiveness of the stakeholders’ decisions in daily operations are affected by the quality (Ceravolo and Bellini 2019; Bellini and Nesi 2013), quantity and timeliness of the information actually retrievable from the system. Thus, operators need to be supported by a rich and powerful information system to achieve the intended impact (Bellini and Nesi 2018). In particular, the vulnerabilities that emerged while working with the RESOLUTE pilot sites (Florence and Athens) and stakeholders (mobility department, civil protection, public transport operators, energy suppliers, etc.) on scenarios such as river and flash flooding, vehicle accidents and bomb attacks in UTS highlighted the need to exploit static as well as dynamic/real-time multi-source information to cope with such unpredictable events and to support informed decision-making.

The work performed within the RESOLUTE project aimed to operationalise resilience by building this information layer. In particular, the analysis highlighted the need to create a common information background shared across the different operators, including:

  • Geographic base city structure (streets, areas, etc.);

  • Risk maps (seismic, hydrogeological, hydric, storm, etc.);

  • Nature, location and magnitude of event (measured through sensors or inferred through user/citizen feedback);

  • Who is reacting and the estimated recovery time (estimate/time of arrival of units);

  • Passengers/civilians: position, number (in system/train estimate according to peak/off-peak time) and behaviours (movement directions and speed);

  • Meteorological information (forecast from models, intensity from sensors, etc.);

  • News/events (planned and unplanned) updates: where are strikes, marathons, car accidents, etc. happening in the city that affect the UTS;

  • Roles and contact list: general information on responsibilities, jurisdictions, information graphs: chain of command, who is the process/maintenance owner of certain parts of the UTS (e.g. drainage on streets).

The common information layer represents the desired background that all the operators would like to access. However, each operator has its own specific decisions to make, both in daily operations and during an emergency. Table 2 lists several examples of such decisions, together with the related operators and the corresponding information that can reduce the knowledge gap.

Table 2 Decisions and information needs

In order to respond to such information needs, the RESOLUTE project implemented a Big Data platform to collect and process data generated by heterogeneous sources in the (smart) city (see Fig. 3). Smart technologies offer the opportunity to move from an approach based on simulated data and ex post analysis to a dynamic and proactive one based on real data generated by daily activities (e.g. cumulating knowledge about critical events and normal conditions). With respect to the 3-tier resilience assessment categorisation (Linkov et al. 2018), these new technologies push resilience operationalisation capabilities to Tier 3, which involves complex modelling of the interactions between systems and sub-systems in order to review potential cascading interactional effects, a phenomenon referred to in the literature as panarchy (Holling et al. 2002). While Tier 2 seeks to unveil the structure of a system and its various interconnected parts, Tier 3 reviews mathematically how these interconnected parts (described as critical functions) normally interact with each other and how a disruption in service in one sub-system can generate harmful effects in the others (Häring et al. 2017). Such a Tier 3 resilience management approach can be exploited to shift from a public awareness approach to one based on community and individual safety, altering the traditional top-down “command and control” relationships. Indeed, in RESOLUTE, the community is seen as an active participant in building system resilience rather than as a passive recipient of services. Thus, smart technologies can be considered a means to build resilience in UTS (Bellini et al. 2017, 2020; Bellini and Nesi 2018), since they can:

  1. (a)

    enhance the monitoring and control capability, improving the granularity and breadth of knowledge and awareness about the system status and dynamics continuously collecting Big Data from heterogeneous data sources/streams and sensors;

  2. (b)

    enhance the responding capability by providing detailed and timely information to authorities while delivering personalised, real-time, context-aware and ubiquitous advice to the community, exploiting technologies that are crucial for augmenting situation awareness and enhancing decision-making;

  3. (c)

    enhance the learning capability applying advanced analysis on Big Data to extract knowledge;

  4. (d)

    enhance the anticipation capability by continuously supporting the assessment of system resilience, identifying when the system operates closer to its safety boundaries, predicting behaviour and event dynamics, and supporting evidence-based decisions at strategic, tactical and operational level, thus moving beyond current practices based on pre-simulated emergency scenarios (Woltjer 2006).

Fig. 3

Resolute platform

This can be done by starting to collect real heterogeneous data from the smart city, such as environmental sensor data (e.g. river level), weather forecasts, Government Open Data (e.g. risk maps), social network data (e.g. Twitter), real-time public transport system data (taxi movements, parking availability, local sensors for car flows, public mobility operator timetables and real-time status), city Wi-Fi access, people's GPS positions (human behaviour data) and so on (Bellini et al. 2016a, b; Bellini and Nesi 2018).

Such data need to be integrated into a scalable, semantics-aware knowledge base to allow further querying and processing. The KM4City platform (Bellini et al. 2014) implemented in Florence is an advanced Big Data semantic aggregator of the data generated by the Florence Smart City. It is able to manage huge amounts of static and dynamic data streams coming from the city and provides Application Programming Interfaces (APIs) for third-party exploitation. This platform feeds the Collaborative Resilience Assessment and Management Support System (CRAMSS), a Multiple (modality) Input Multiple (modality) Output (MIMO) system able to receive and process, in real time, asynchronous and/or pre-computed data in order to dynamically generate multivariate, context-aware, holistic and/or personalised recommendations (e.g. routing, first aid support, role assignment, risk evaluation, etc.) in an adaptive and indirectly interactive manner.

The CRAMSS enables coordination and information sharing among different decision support systems (DSSs) and can enhance them by implementing a mixed approach combining Communication, Data and Knowledge (Bartolozzi et al. 2015) to collect and analyse, in real time, multivariate, asynchronous and/or pre-computed data that support decisions. CRAMSS supports structured decision-making for adaptive system management (Collier and Linkov 2014) at several layers of abstraction. The output information of CRAMSS will then be communicated through situated displays (for relevant stakeholders and the wider public), external stimuli (e.g. traffic lights) and/or users’ portable devices providing visual/auditory and cognitive aids. The final front end of the system is the Dashboard, which displays the information collected and processed by KM4City. The information displayed was identified and selected by the operators according to their decision-making needs (see Table 2). Hence, the daily activities of the operators are now supported by a screen that provides real-time information enriching and completing the information obtained through the classical channels (e.g. the inspection units of Civil Protection). The aim is to reduce the knowledge gap of each operator by enhancing information harvesting, sense-making and sharing. Indeed, the Dashboard can be accessed from each control room participating in UTS resilience, such as those of Civil Protection, the Mobility Department, Urban Police, fire brigades, etc.

Finally, in order to reach UTS users, a dedicated app has been developed. The Mobile Emergency app aims to inform citizens about the status of an event and provides personalised, context-aware suggestions to support critical decisions. Tracking actual human behaviour dynamics closes the loop of the entire RESOLUTE adaptive system.

11 Assessment method and pilot testing

The ERMG should be assessed on their capacity to have a positive impact on UTS resilience. To estimate the extent of this impact, a number of Key Performance Indicators (KPIs) were defined and assessed at two points in time:

  1. 1.

    Before: this assessment considers the values that the indicators had as of May 2015, i.e. just before the start of the RESOLUTE project;

  2. 2.

    After: this assessment considers the values that the indicators have at the beginning of 2018 (i.e. just at the end of the pilot demonstration) and during 2018 (i.e. after the end of the project). These values take into account rules, activities and decisions that were established during the project but will affect the KPIs during 2018. In this case, the project will have been completed, but the assessed values consider the changes that, thanks to the project, will be put in place after its end.

12 Resilience quantification methods

The quantification of resilience is based on Q-FRAM, a method developed in Bellini et al. (2017, 2019a, b). In Q-FRAM, a key point is the choice of Key Performance Indicators capable of describing the output of the associated function and its variability. Moreover, the definition of KPIs in Q-FRAM shall comply with the following principles:

  • easy to use/understand;

  • cost-effectiveness of measurement: the capacity to obtain the value through expert judgement or computation, assuming data availability;

  • pertinence: the capacity to represent the variability component of interest of a function;

  • consensus driven.

A co-design approach was adopted involving all the relevant stakeholders, such as the mobility department, traffic managers, civil protection, urban police, metropolitan city, etc. They contributed to identifying the indicators and, following the social dialogue approach tested in Martelli et al. (2015) and Bellini and Martelli (2012), progressively reached a consensus on the actual assessment methods (consensus driven) (see Table 3). The choice considered both the main aspects of the UTS and the guidelines indicated in the ERMG, and was driven by the method defined in Bellini et al. (2019a, b). Indeed, indicators were identified for each function in the FRAM model and thus in the ERMG. Table 3 reports the fields composing the analysis. In particular:

  • F: name of the FRAM function.

  • Function weight: a quantification of the dependability of a function within the FRAM model (Bellini et al. 2017), based on degree prestige (Freeman 1979).

  • Function output: an output of the FRAM function.

  • KPI-ID: the identifier of the KPI associated with the output.

  • KPI description: the object of the evaluation/assessment, defined through the consensus-driven approach.

  • Wgt: the non-normalised weight assigned by the experts to each KPI. It represents the contribution of that aspect to the total output variability.

  • Value range: the range of values that the KPI can assume during the assessment.

Table 3 Example of KPI definition

As the KPIs have different impacts on the output variability, a different weight has been associated with each of them. Each weight is a value in the range [0, 10] assigned by the domain experts. Thus, the operative performance of every function can be expressed as a weighted vector of KPIs.

13 System Resilience Index

Several methods for composing indicators through aggregation exist in the literature (Mazziotta and Pareto 2016). Following Bellini et al. (2019a, b), the present work adopts a multi-step composition pathway to calculate the System Resilience Index (SRI), intended as a proxy indicator of UTS resilience. The method starts once all the KPIs have been defined and evaluated at the two instants. Then, the variability that occurred in every Link (dependency) of the FRAM model representing the system is estimated. A link is defined as \(\text{Link}_i = (F_n{:}O_y \rightarrow F_m{:}I_k)\), where \(F_n\) is the n-th source (upstream) function and \(O_y\) is the y-th output of \(F_n\), while \(F_m\) is the m-th destination (downstream) function and \(I_k\) is the k-th input of \(F_m\). The variability (Var) of a Link at a given instant is defined as:

$${\text{Var}}_{{{\text{Link}}_{i} }} = \frac{{{\text{Max}}\left( {{\text{WOP}}_{{F_{n} :O_{y} }} } \right) - {\text{WOP}}_{{F_{n} :O_{y} }} }}{{{\text{FDC}}_{{{\text{Link}}_{i} }} }},$$

where the weighted observed performance (WOP) of each function at a given instant is calculated as follows, assuming that the KPIs are compensatoryFootnote 3:

$${\text{WOP}}_{{F_{n} :O_{y} }} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} {\text{KPI}}_{i} w_{i} }}{{\mathop \sum \nolimits_{j = 1}^{n} w_{j} }}$$
(1)

where n is the number of KPIs defined for the function F and w is the weight assigned to each KPI. The additive method proposed to calculate WOP implies that a good score in one KPI may compensate a bad score in another. However, such compensation is mitigated by the weight w associated with each KPI, which reflects the different importance of each factor for the total output variability. The Function Damping Capacity (FDC), representing the capacity of the downstream function to absorb the variability coming from a specific output of the upstream function, is defined as:

$${\text{FDC}}_{{{\text{Link}}_{i} }} = \left| {{\text{Max}}\left( {{\text{WOP}}_{{F_{n} :O_{y} }} } \right) - {\text{MAIPT}}_{{F_{m} :I_{k} }} } \right|$$

where the Minimum Acceptance Input Performance Threshold (MAIPT) is defined by Bellini et al. (2019a, b) as the input performance threshold below which the receiving function cannot perform as expected, causing variability in its outputs. In theory, MAIPT should differ for every function, and Bellini et al. (2019a, b) proposed to estimate it by expert judgement. In the complex case of a real UTS, however, estimating an optimum threshold might be impractical. Thus, for the sake of simplicity, MAIPT has been set to 5 in the calculation, meaning that every downstream function in the model is able to dampen a variability of 50% of the upstream function; this is considered an acceptable first simplification of a complex system.

If Var < 1, the variability level can be considered within the safe range and the function will continue to perform as expected; conversely, if Var > 1, the deviation exceeds the capacity of the downstream function to damp the inbound variability, so that the surplus is propagated through the UTS.
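The chain from KPI scores to the variability of a single Link can be traced numerically. The Python sketch below assumes that KPI scores lie on a 0 to 10 scale, so that Max(WOP) = 10. This assumption is consistent with the statement that MAIPT = 5 corresponds to damping 50% of the upstream variability, but the KPI value ranges in Table 3 may differ. The helper names `wop`, `fdc` and `link_variability`, and the scores and weights, are illustrative.

```python
from __future__ import annotations

MAX_WOP = 10.0  # assumption: KPI scores on a 0-10 scale, so Max(WOP) = 10
MAIPT = 5.0     # threshold used in the text for every downstream input


def wop(scores: list[float], weights: list[float]) -> float:
    """Eq. (1): compensatory weighted mean of the KPI scores of one output."""
    if len(scores) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per KPI and a positive weight sum")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def fdc(max_wop: float = MAX_WOP, maipt: float = MAIPT) -> float:
    """Function Damping Capacity of the downstream input: |Max(WOP) - MAIPT|."""
    return abs(max_wop - maipt)


def link_variability(upstream_wop: float, max_wop: float = MAX_WOP, maipt: float = MAIPT) -> float:
    """Var of a Link: shortfall of the upstream output divided by the downstream FDC."""
    return (max_wop - upstream_wop) / fdc(max_wop, maipt)


# Hypothetical KPI scores and expert weights (in [0, 10]) for one upstream output.
upstream = wop([8, 6, 3], [10, 5, 5])  # (80 + 30 + 15) / 20 = 6.25
var = link_variability(upstream)       # (10 - 6.25) / 5 = 0.75
# Var < 1: damped downstream; Var > 1: surplus propagates through the UTS.
print(var, "propagates" if var > 1 else "damped")  # 0.75 damped
```

With these settings, any upstream output whose WOP falls below 5 yields Var > 1; for example, a WOP of 3 gives (10 - 3) / 5 = 1.4.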

Once the Var vector has been calculated, its values are grouped according to the system capacity {A, M, R, L} to which the source function of each Link belongs.

It is worth noting that the functions do not all have the same importance in the FRAM model: variability in a Link whose source function is considered critical will have a worse impact on the system than variability in Links whose source functions are considered less critical. To model this condition, a specific weight is assigned to each function in the FRAM model; the weighting process is based on network analysis techniques, namely the Degree Prestige Index (DPI) method.
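The DPI formula is not spelled out here. A common definition of degree prestige in a directed network is a node's in-degree divided by g - 1, where g is the number of nodes. The Python sketch below assumes that form. It collapses every Link \(F_n{:}O_y \rightarrow F_m{:}I_k\) to the pair \((F_n, F_m)\) and counts each distinct upstream function once. The helper `degree_prestige` and the four-function model are hypothetical, and Bellini et al. (2017) may normalise differently.

```python
from __future__ import annotations


def degree_prestige(functions: list[str], links: list[tuple[str, str]]) -> dict[str, float]:
    """Normalised in-degree prestige of each function in a directed FRAM graph.

    Assumptions: each Link is reduced to (source, target) and each distinct
    upstream function is counted once.
    """
    g = len(functions)
    if g < 2:
        raise ValueError("at least two functions are needed")
    upstream: dict[str, set[str]] = {f: set() for f in functions}
    for source, target in links:
        if source != target:  # ignore self-loops
            upstream[target].add(source)
    # Prestige = distinct functions pointing at f, divided by g - 1.
    return {f: len(sources) / (g - 1) for f, sources in upstream.items()}


# Hypothetical four-function model; the duplicate A -> C link counts once.
weights = degree_prestige(
    ["A", "B", "C", "D"],
    [("A", "C"), ("B", "C"), ("D", "C"), ("C", "D"), ("A", "C")],
)
print(weights)  # C receives links from all other functions, so its prestige is 1.0
```

A function fed by many others scores high, which matches the idea that variability entering a critical, highly dependent function weighs more in the aggregation.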

Then, the indicators are aggregated into a Capacity Variability Index (CVI) for each of the four system capacities, Anticipate (A), Monitor (M), Respond (R) and Learn (L), assuming that the Var values within a group are not compensatory. This means that the total score should reflect the fact that high variability in one Link should not be compensated by low variability in another. Thus, the weighted geometric mean, which is considered a trade-off between non-compensatory and fully compensatory methods with lower information loss (Zhou and Ang 2009; Mazziotta and Pareto 2016), is adopted as proposed in Bellini et al. (2019a, b).

$$x{\text{CVI}} = \sqrt[{\left( {\mathop \sum \limits_{i = 1}^{n} w_{i} } \right)}]{{\mathop \prod \limits_{i = 1}^{n} {\text{Var}}_{i}^{{w_{i} }} }},$$
(2)

where \(x \in \left\{A: \text{Anticipate}, M: \text{Monitor}, R: \text{Respond}, L: \text{Learn}\right\}\), n is the cardinality of the variability vector Var of the group, and w is the weights vector; the index is computed for each instant t at which the KPIs have been assessed.
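Eq. (2) is a weighted geometric mean. It can be computed stably through logarithms, since the product of \(\text{Var}_i^{w_i}\) raised to \(1/\sum w_i\) equals \(\exp\left(\sum w_i \ln \text{Var}_i / \sum w_i\right)\). The Python sketch below is a minimal illustration. The helper name `capacity_variability_index` and the input values are hypothetical. A zero Var with positive weight makes the whole product zero, and the sketch handles that case explicitly.

```python
import math


def capacity_variability_index(variabilities: list, weights: list) -> float:
    """Eq. (2): weighted geometric mean of the Var values of one capacity group."""
    if len(variabilities) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per Var value and a positive weight sum")
    if any(v < 0 for v in variabilities):
        raise ValueError("Var values must be non-negative")
    if any(v == 0 and w > 0 for v, w in zip(variabilities, weights)):
        return 0.0  # a zero factor with positive weight makes the product zero
    # Sum of w_i * ln(Var_i), skipping zero-weight terms (their factor is 1).
    log_sum = sum(w * math.log(v) for v, w in zip(variabilities, weights) if w > 0)
    return math.exp(log_sum / sum(weights))


# Two Links in the same capacity group with equal (hypothetical) weights.
print(capacity_variability_index([0.75, 1.4], [1, 1]))  # sqrt(0.75 * 1.4), about 1.025
```

The arithmetic mean of the same two values would be 1.075. The geometric mean gives 1.025 and, unlike a sum, a single very small Var can drive it strongly towards zero. This is the partial, rather than full, compensation that the text attributes to the method.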

Finally, the four CVIs (ACVI, MCVI, RCVI and LCVI) are aggregated into a single synthetic indicator, the System Variability Index (SVI), which provides an overview of the global status of the system.

To aggregate the four CVI values, the Choquet integral has been used. This type of integral makes it possible to aggregate criteria that are not independent, since a weight is assigned to every subset of criteria rather than to each criterion alone. For example, according to expert judgements, good levels in RCVI and LCVI are considered less crucial than good levels in ACVI and MCVI. Thus, the final SVI index can be computed as:

$${\text{SVI}} = \mathop \sum \limits_{i = 1}^{n} \left[ {\left( {e_{\left( i \right)} - e_{{\left( {i - 1} \right)}} } \right)*\delta \left( {C_{\left( i \right)} } \right)} \right]$$
(3)

where \(e_{(0)} = 0\), \((i)\) denotes a permutation of the indices such that \(e_{(i)} \le e_{(i+1)}\), \(C_{(i)} = \{(i), \ldots, (n)\}\) is the subset of criteria ranked at or above position \((i)\), and \(\delta\) is the weight function assigned to each subset of the criteria \(C = \left\{A: \text{Anticipate}, M: \text{Monitor}, R: \text{Respond}, L: \text{Learn}\right\}\). The weights assigned by the experts are reported below:

\({\delta }_{0}\left({\varnothing }\right)=0\)

\({\delta }_{1}\left(\left\{\mathrm{L}\right\}\right)=0.1\)

\({\delta }_{2}\left(\left\{\mathrm{M}\right\}\right)=0.2\)

\({\delta }_{3}\left(\left\{\mathrm{A}\right\}\right)=0.3\)

\({\delta }_{4}\left(\left\{\mathrm{R}\right\}\right)=0.4\)

\({\delta }_{5}\left(\left\{\mathrm{M},\mathrm{L}\right\}\right)=0.45\)

\({\delta }_{6}\left(\left\{\mathrm{A}, \mathrm{L}\right\}\right)=0.5\)

\({\delta }_{7}\left(\left\{\mathrm{R},\mathrm{L}\right\}\right)=0.55\)

\({\delta }_{8}\left(\left\{\mathrm{R},\mathrm{M}\right\}\right)=0.60\)

\({\delta }_{9}\left(\left\{\mathrm{A},\mathrm{M}\right\}\right)=0.65\)

\({\delta }_{10}\left(\left\{\mathrm{A}, \mathrm{R}\right\}\right)=0.7\)

\({\delta }_{11}\left(\left\{\mathrm{A},\mathrm{M},\mathrm{L}\right\}\right)=0.75\)

\({\delta }_{12}\left(\left\{\mathrm{R}, \mathrm{M}, \mathrm{L}\right\}\right)=0.8\)

\({\delta }_{13}\left(\left\{\mathrm{A}, \mathrm{R}, \mathrm{L}\right\}\right)=0.85\)

\({\delta }_{14}\left(\left\{\mathrm{A}, \mathrm{R}, \mathrm{M}\right\}\right)=0.95\)

\({\delta }_{15}\left(\left\{\mathrm{A}, \mathrm{R}, \mathrm{M}, \mathrm{L}\right\}\right)=1\)
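Eq. (3) is the discrete Choquet integral with respect to the expert weights above. The Python sketch below applies it directly to four CVI values. It assumes that the garbled \(\delta_{13}\) entry reads 0.85. With that reading, the weights are monotone: every superset weighs at least as much as its subsets. The helper name `choquet` and the CVI values in the example are hypothetical.

```python
from __future__ import annotations

# Expert weights delta(C) for every subset of {A, M, R, L}; delta_13 read as 0.85 (assumption).
CAPACITY = {
    frozenset(): 0.0,
    frozenset("L"): 0.1, frozenset("M"): 0.2, frozenset("A"): 0.3, frozenset("R"): 0.4,
    frozenset("ML"): 0.45, frozenset("AL"): 0.5, frozenset("RL"): 0.55,
    frozenset("RM"): 0.6, frozenset("AM"): 0.65, frozenset("AR"): 0.7,
    frozenset("AML"): 0.75, frozenset("RML"): 0.8, frozenset("ARL"): 0.85,
    frozenset("ARM"): 0.95, frozenset("ARML"): 1.0,
}


def choquet(values: dict[str, float], capacity: dict[frozenset, float] = CAPACITY) -> float:
    """Discrete Choquet integral, Eq. (3), with e_(0) = 0."""
    order = sorted(values, key=values.get)  # criteria by ascending value e_(i)
    total, previous = 0.0, 0.0
    for i, criterion in enumerate(order):
        upper_set = frozenset(order[i:])  # C_(i) = {(i), ..., (n)}
        total += (values[criterion] - previous) * capacity[upper_set]
        previous = values[criterion]
    return total


# Hypothetical CVI values for the four capacities.
print(choquet({"A": 0.5, "M": 1.0, "R": 0.8, "L": 0.6}))
# 0.5*1.0 + 0.1*0.8 + 0.2*0.6 + 0.2*0.2, about 0.74
```

When all four CVIs are equal, the integral returns that common value, because \(\delta(C) = 1\). Otherwise, the subset weights decide how much each increment above the smallest CVI contributes.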

14 Testing the ERMG

RESOLUTE carried out various testing activities for the key outputs generated based on the previously described approach. The city of Florence served as the main piloting scenario for the ERMG. The main scenarios defined for the Florence pilot were:

  • River Flood: in this scenario, the river level rises beyond the guard level, slowing down traffic in the areas around the flood; both private and public transport services (e.g. tramway, bus) suffer a slowdown. Owing to these conditions, the traffic control operator closes part of the road network and modifies local road traffic. Tram lines passing through the affected areas are closed. Police and Civil Protection send units to the affected areas to assist people in the evacuation procedures.

  • Water Bomb: in this scenario, a rapid increase in rainfall slows down traffic; both private and public transport services (e.g. tramway, bus) suffer a slowdown. If underpasses are involved, the police send units to monitor the situation and assist the population. Owing to these conditions, the traffic control operator closes part of the road network and modifies local road traffic in the areas affected by the water bomb.

The data were collected before and after the pilot execution during dedicated assessment sessions with the involved organisations. No personal data were collected or analysed. Since the score of each KPI is collected only at organisational level, a consensus-based approach was followed during the assessment sessions with each organisation’s operators to assign the final score to each KPI. The results of the assessment in Florence are reported in Table 4, which provides the following fields:

Table 4 Florence pilot assessment results
  • Output Score before: this field represents the status of the system before the ERMG implementation.

  • Output Score after: this field represents the status of the system after the ERMG implementation.

  • Low KPI causes: the main causes of a low score after the application of the ERMG (see Table 6).

The assessment results are summarised in Table 4 and the computation is reported in Table 5.

Table 5 Result of the Florence assessment

15 Discussion

Operationalisation can be defined as the process of measuring a phenomenon that cannot be measured directly but can only be inferred from other phenomena. In this respect, the aim of the present paper was to apply Q-FRAM to assess the “potential” of resilience and to use the SRI as a proxy indicator to quantify system resilience at a given instant. Through this method, the impact of the ERMG implementation on the targeted UTS has been evaluated.

According to the approach presented, an SRI > 1 is typical of a system that cannot exhibit resilient behaviours and is therefore unable to adapt to changing conditions. In Florence, before the beginning of the project, an SRI of 1.03 was calculated; being very close to 1, this value indicates that the resilience situation in Florence was not satisfactory. After the ERMG application, the SRI score was 0.71. This good result shows that, globally, the resilience of the UTS increased and the system became capable of reorganising its functions to guarantee the main basic services, also in case of a relevant disruption. As the SRI decreases towards 0, the adaptive capacity increases. Even if the final value of 0.71 can be considered satisfactory, there is clearly room for further improvement. Thus, to fine-tune the system further, it is worth analysing in detail which KPIs were not positively affected by the adoption of the ERMG, and why. In this respect, the ex post analysis highlighted the following main criticalities related to the ERMG implementation.

For example, if applying the guideline requires resources (human resources or physical resources/devices/infrastructure) and the system lacks these resources, the guideline cannot be applied. A similar case arises when the system lacks the knowledge, education or laws needed to use the guideline.

In particular, looking at Table 4, several functions obtained a poor score because of one or more of the implementation criticalities defined in Table 6. For all these functions there is a wide margin for improvement, simply by removing the impediments that prevented the application of the guidelines. For example, the function “Install/Maintain Assets” obtained a low score because it had not been possible to apply the ERMG within the 3 years of the project, as the stakeholders lacked the resources to perform the required maintenance (C1). Moreover, the analysis has brought out the human factor as a critical point for the success of ERMG implementation: C5 turned out to be the most common cause of absent or negligible improvement in the score. Finally, smart technology and data have had a relevant impact at system level, since their introduction is the main cause of the significant improvement of MCVI and ACVI.
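Identifying the most common cause of missing improvement amounts to counting cause codes across functions. The sketch below uses `collections.Counter`; the function name `most_common_causes` and all entries except “Install/Maintain Assets” (C1, reported above) are hypothetical placeholders, not the actual contents of Table 4. Each cause is counted at most once per function.

```python
from collections import Counter


def most_common_causes(
    causes_by_function: dict[str, list[str]], top: int = 1
) -> list[tuple[str, int]]:
    """Rank Table 6 cause codes by the number of functions in which they appear."""
    # set() ensures a cause listed twice for the same function is counted once
    counts = Counter(
        code for codes in causes_by_function.values() for code in set(codes)
    )
    return counts.most_common(top)


causes = {
    "Install/Maintain Assets": ["C1"],  # from the text
    "Hypothetical Function A": ["C5"],  # placeholder
    "Hypothetical Function B": ["C5", "C1"],  # placeholder
    "Hypothetical Function C": ["C5"],  # placeholder
}
print(most_common_causes(causes))  # [('C5', 3)]
```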

Table 6 Main issues that can be encountered in the application of a guideline

16 Conclusions

The article aimed at presenting the approach adopted in the RESOLUTE project to define the ERMG for generic Critical Infrastructure and its adaptation to UTS. The ERMG has been developed adopting an innovative point of view named “Work-As-Desired”, in which the system functions, their interdependencies and the related recommendations, shaped to dampen functional variability, are defined in order to obtain a reference model to be used for comparison and assessment. A resilience quantification method focused on the adaptive capacity aspect has been used to assess the improvement occurring in the two use cases: the cities of Florence and Athens. According to the results, the following conclusions can be drawn:

  • the decomposition of the resilience concept into two sub-concepts, Adaptive Capacity and Coping Ability, allowed the definition of a tangible and less arbitrary assessment framework based on evidence.

  • the definition of ERMG must be inclusive (all stakeholders should be involved) as well as prescriptive and detailed, in order to reduce the degree of freedom in its interpretation and to address all aspects, including human, technological and organisational factors.

  • the adoption of ERMG has a tangible impact on UTS resilience as a primary effect and, as a desired side effect, raises community awareness of issues that are usually neglected (e.g. the time needed for ERMG adaptation).

  • the quantification method based on the 4 resilience cornerstones reveals that not all decisions are equal; given the chronic lack of resources, this makes it possible to prioritise actions so as to maximise the enhancement of the system's adaptive capacity.

  • the operationalisation of ERMG requires:

    (a) taking a determined direction towards new technologies and the Big Data generated by smart cities and smart infrastructures, since they can enhance the monitoring and control capability in UTS by improving the granularity and breadth of knowledge and awareness about the infrastructure status and its usage dynamics, continuously collecting Big Data from heterogeneous data sources/streams and sensors across the entire environment,

    (b) considering the human–user dimension by providing personalised, real-time, context-aware and ubiquitous advice in order to enhance the system's response capacity,

    (c) continuously supporting the assessment of UTS resilience in order to identify when the system operates close to its safety boundaries.
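Point (c) can be illustrated by a simple monitor over a stream of SRI readings that warns when the system approaches the boundary and raises an alarm once it crosses it. This is a minimal sketch, assuming that SRI = 1 is used as the safety boundary; the function `boundary_alerts`, the `margin` parameter and its default of 0.1 are hypothetical choices, and only the 0.71 and 1.03 readings come from the Florence assessment.

```python
from typing import Iterable, Iterator, Tuple


def boundary_alerts(
    readings: Iterable[Tuple[str, float]], margin: float = 0.1
) -> Iterator[Tuple[str, float, str]]:
    """Yield (timestamp, sri, level) for readings near or beyond SRI = 1.

    Readings above 1 give an "alarm"; readings in [1 - margin, 1] give a
    "warning". Using SRI = 1 as the safety boundary is an assumption.
    """
    for timestamp, sri in readings:
        if sri > 1.0:
            yield timestamp, sri, "alarm"
        elif sri >= 1.0 - margin:
            yield timestamp, sri, "warning"


# Timestamps are hypothetical; the t1 value is a placeholder.
readings = [("t0", 0.71), ("t1", 0.93), ("t2", 1.03)]
for alert in boundary_alerts(readings):
    print(alert)  # ('t1', 0.93, 'warning') then ('t2', 1.03, 'alarm')
```

A generator keeps the monitor usable on an unbounded stream of readings, which fits the continuous assessment that point (c) calls for.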

Resilience design and management for UTS is nowadays a necessity in order to effectively confront the difficulties arising from the complexity of UTS and their exposure to threats, as analysed above. The general principles that should be followed can be summarised as follows:

  • Prevent incidents within one's control and responsibility, and effectively protect critical assets (bridges, tunnels, non-redundant links, links without alternatives).

  • Respond effectively to events that cannot be prevented, mitigate loss and protect employees, passengers and emergency respondents.

  • Support response to events that impact local communities (e.g. supplying of essential goods), integrating equipment and capabilities seamlessly into the total effort.

  • Recover from major events, taking full advantage of the available resources and programs.

The ERMG for UTS, as well as the operationalisation tools developed within RESOLUTE, aim precisely at facilitating these goals, thus enhancing the resilience of UTS and, ultimately, their sustainability within the currently evolving smart-city environment.