1 Introduction

This paper focuses on the loss estimation due to business interruption of industrial facilities as a consequence of multiple interacting hazards. Classical risk assessment approaches this problem with the assumption of independence of vulnerability to different hazards (Grünthal et al. 2006; Kameshwar and Padgett 2014; Li and Ellingwood 2009; SwissRe 2013). Such an assumption is valid only when the hazards are independent and mutually exclusive (i.e., the likelihood of combined occurrence is negligible), as is the case, for instance, for earthquakes and windstorms (Chandler et al. 2001). However, when hazards show spatio-temporal correlations (Gill and Malamud 2014), such that multiple hazards are likely to affect the same region over the same time span, or when primary hazards cause secondary perils, the assumption of independence does not hold, and the resulting losses become hard to quantify reliably. During the Fukushima disaster of 2011, for example, a major earthquake triggered a devastating tsunami, and the total loss incurred was significantly greater than the mere summation of losses from the individual hazards. Hence, there is a need for a reliable multi-peril risk analysis methodology.

Business interruption loss forms a significant portion of the total loss in a catastrophe in today’s highly industrialized and interconnected world. For instance, Hurricane Katrina in 2005 resulted in a total of US$25 billion in insured commercial loss, of which business interruption was responsible for US$6–9 billion. Of the total estimated loss of US$97 billion during Superstorm Sandy (2012), US$10–16 billion was attributed to business interruption due to widespread power outages and flooding (Kunz et al. 2013). Power outages lasting up to 14 days left about 370,000 customers without power and affected several other lifelines and businesses (USDOE 2013). Shutdown of local refineries led to a 30–40% reduction in the region’s total fuel production, as well as another 20–25% reduction in supply capacity due to harbor downtime. The transportation sector was also significantly affected, with several tunnels being inundated for days (Haraguchi and Kim 2014). Recently, in August 2017, after Hurricane Harvey made landfall near Galveston, TX, several Gulf Coast refineries, amounting to over 30% of Gulf Coast refining capacity, were shut down for over a week, resulting in heavy business interruption losses to the oil industry. Despite constituting a significant portion of the losses, business interruption loss estimation methodologies have not received sufficient research attention.

Calculating business interruption in multi-hazard scenarios is a challenge in itself. Interactions among multiple natural and technical hazards affecting a region have been extensively studied in the past (Gill and Malamud 2014; Kappes et al. 2012; Marzocchi et al. 2012; Javanbarg et al. 2009). In the context of multi-hazard analysis, primary hazards refer to those that occur independently of other hazards, while secondary hazards, or perils, are triggered by the primary hazards. Such triggering interactions complicate the risk estimation problem for three reasons. First, a system component’s vulnerability differs not only with the type of hazard but also with the hazard intensity. Second, restoration times vary across damage levels and hazards. Third, the component’s structural capacity depends on the component’s previous exposure to the primary hazard. As a prominent example, aftershocks of an earthquake amplify damage to a system’s components: some of the component’s structural inventory has already been weakened by the initial shake damage, thus increasing the component’s vulnerability to aftershocks (Ryu et al. 2011).

Although interactions between hazards have been widely studied, the risk estimation problem in multi-hazard situations is particularly challenging for complex systems with various interdependent components, each of which may experience multiple damage states. Industrial facilities are examples of such complex systems, where each critical component of the facility must be carefully accounted for to assess the overall risk of the system as a whole. Such analysis can be achieved using fault tree analysis, which decomposes a complex system into its components, considering the components’ functional interdependencies through Boolean logic. Here, interdependency refers to the fact that the functioning of some components relies on adequate functioning of other components. We leave out of scope the interdependency of damageability, in which failure of one component can cause failure of another component. With the fault tree approach, each component is assigned a failure probability which, for a downtime estimation problem, combines the probability of a damage state with the probability of restoration. For each hazard type, the probability of a damage state is represented by fragility curves as a function of hazard intensity, and the probability of restoration is represented by restoration curves as a function of time elapsed after the event.

Given the uncertainties in estimating the response of engineering systems to natural and manmade hazards, and the importance of this response for estimating damage, it is useful to represent the damage probabilistically using fragility curves. Typically, fragility curves are developed considering a single hazard, although a significant body of the literature focuses on developing damage functions considering the combined action of multiple hazards and perils. Alipour et al. (2012), Wang et al. (2012), Dong et al. (2013), and Prasad and Banerjee (2013) have developed multi-hazard fragility curves that yield the failure probability of bridges due to seismic excitation conditional on the level of pier scour, based on nonlinear dynamic finite element model simulations. Gehl and D’Ayala (2016) have derived fragility curves for bridges using Bayesian network models for the combined action of earthquake, ground failure, and floods. Several studies have produced seismic fragility curves conditional on the level of aging in the buildings (Ghosh and Padgett 2010; Choe et al. 2009; Alipour et al. 2010). Fragility curves for earthquake mainshock–aftershock sequences have also been developed (Ryu et al. 2011; Li et al. 2014; Mackie and Stojadinovic 2004).

The multi-hazard fragility curves (or, more aptly, fragility surfaces), discussed above, are developed using computational structural modeling and simulations of multiple hazard scenarios, typically using nonlinear dynamic finite element analysis. Such an analysis becomes cumbersome for a complex industrial facility with several hundreds of components. In this paper, we present a generalized methodology for downtime estimation under multi-hazard scenarios that uses single-hazard fragility curves (also called marginal fragility curves) for system components, which are more readily available or easier to derive. The methodology is applicable to cascading hazards, where primary hazards trigger secondary hazards (e.g., earthquake triggering a tsunami), and concurrent hazards, where multiple consequences of the same hazard occur simultaneously (e.g., wind and storm surge during hurricanes). The methodology is also applicable to any system with multiple interdependent components such as utility networks, infrastructure networks, and commercial supply chains.

The paper is organized as follows. In Sect. 2, we introduce the generally applicable downtime estimation methodology under multi-hazard scenarios. This methodology is demonstrated through a case study of a power plant, followed by results of multi-hazard risk analysis, in Sect. 3. Concluding remarks are given in Sect. 4.

2 Downtime estimation methodology using fault tree analysis

Studies on multi-hazard risk assessment have developed multi-hazard fragility curves by fitting multi-dimensional failure probability distributions using statistical methods. However, such analysis for estimating combined multi-hazard damage probability is computationally prohibitive for large systems with many components, as each such analysis relies on an ensemble of computational nonlinear structural simulations. Thus, the use of marginal fragility curves for individual hazards offers a convenient alternative for quantitative risk assessment, since such marginal fragilities are often computable through numerical analysis, empirical analysis, or expert opinion. Here, we describe a downtime estimation methodology that uses marginal fragility curves, which are combined using a Boolean logic-based process.

To effectively account for the damage potential of secondary perils given the occurrence of a primary hazard, risk assessment methodologies must consider both interdependencies among hazards and conditionality of damage probabilities (Marzocchi et al. 2012; Mignan et al. 2014). In the case of mutually exclusive multiple hazards, the failure probability of a component of the facility can be expressed as:

$$P\left( F \right) = \mathop \sum \limits_{h = 1}^{{n_{h} }} P\left( {F |A_{h} } \right)P\left( {A_{h} } \right)$$
(1)

where \(A_{h}\) represents the \(n_{h}\) multiple mutually exclusive hazards that do not occur concurrently. \(P(F|A)\) is the marginal failure probability of the component conditional on a single hazard \(A\). When the multiple hazards \(A_{1}\) and \(A_{2}\) occur at the site in the same time frame, the conditional failure probability \(P\left( {F |(A_{1} ,A_{2} )} \right)\) is a more accurate representation of system fragility.

Consider a system with \(n_{c}\) components which is susceptible to \(n_{h}\) hazards that can be cascading (e.g., earthquake followed by tsunami) or concurrent (e.g., wind and storm surge during hurricane). The failure probability of component \(i\) due to hazard \(h\) with intensity \(s_{h}\) can be expressed as:

$$\begin{aligned} P_{h,\left( d \right)}^{\left( i \right)} \left( {s_{h} ,t} \right) & = F_{h,\left( d \right)}^{\left( i \right)} \left( {s_{h} } \right)\left( {1 - G_{h,\left( d \right)}^{\left( i \right)} \left( t \right)} \right) \\ &\quad i = 1, \ldots ,n_{c} ;\quad h = 1, \ldots ,n_{h} ;\quad d = 1, \ldots ,n_{d} \\ \end{aligned}$$
(2)

where \(d\) represents the damage state, \(F_{h,\left( d \right)}^{\left( i \right)} \left( {s_{h} } \right)\) is the fragility curve, and \(G_{h,\left( d \right)}^{\left( i \right)} \left( t \right)\) is the restoration curve for hazard \(h\) and damage state \(d\).

If damage states are considered to be incremental (e.g., slight, moderate, heavy, etc.), we can assume that each component can exist in only one damage state following the hazardous event. The multiple incremental damage states \(d = 1, \ldots ,n_{d}\) can be combined using Boolean OR logic to obtain the component’s failure probability under hazard \(h\):

$$\begin{aligned} P_{h}^{\left( i \right)} \left( {s_{h} ,t} \right) & = 1 - \mathop \prod \limits_{d = 1}^{{n_{d} }} \left( {1 - F_{h,\left( d \right)}^{\left( i \right)} \left( {s_{h} } \right)\left( {1 - G_{h,\left( d \right)}^{\left( i \right)} \left( t \right)} \right)} \right) \\ &\quad i = 1, \ldots ,n_{c} ;\quad h = 1, \ldots ,n_{h} \\ \end{aligned}$$
(3)
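As a minimal illustration of Eqs. 2 and 3, the sketch below evaluates the failure probability of a single component under one hazard, assuming lognormal fragility and restoration curves; the function names and all parameter values are hypothetical, not taken from the case study.

```python
from scipy.stats import lognorm

def lognormal_cdf(x, median, beta):
    """Lognormal CDF parameterized by median and log-standard deviation beta."""
    return lognorm.cdf(x, s=beta, scale=median)

def component_failure_prob(s_h, t, fragility_params, restoration_params):
    """Eq. 3: OR-combination of incremental damage states d = 1, ..., n_d.

    fragility_params / restoration_params: one (median, beta) pair per damage
    state; each Eq. 2 term F * (1 - G) is the probability of being in state d
    and not yet restored at time t.
    """
    p_not_failed = 1.0
    for (fm, fb), (rm, rb) in zip(fragility_params, restoration_params):
        F = lognormal_cdf(s_h, fm, fb)       # P(damage state d | intensity s_h)
        G = lognormal_cdf(t, rm, rb)         # P(restored by time t | state d)
        p_not_failed *= 1.0 - F * (1.0 - G)  # OR logic over damage states
    return 1.0 - p_not_failed

# Three damage states: fragility medians in g, restoration medians in days
frag = [(0.3, 0.5), (0.6, 0.5), (1.0, 0.5)]
rest = [(5.0, 0.4), (20.0, 0.4), (60.0, 0.4)]
print(component_failure_prob(s_h=0.5, t=10.0,
                             fragility_params=frag, restoration_params=rest))
```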

Here, we assume that component failures or damages are not interdependent, which implies that the failure of one component does not alter the failure probability of another component, regardless of their connectivity or spatial proximity. Such an assumption is valid in a system where a component failure typically does not cause another component to fail. However, this assumption may not be valid for, say, power networks, where failure of a substation causes overloads that can trigger additional component failures. The use of Bayesian networks (Doguc and Ramirez-Marquez 2009; Boudali and Dugan 2005) instead of fault trees can account for such dependencies through conditional failure probabilities.

When multiple hazards occur in the same time frame, we replace the single-hazard fragility \(F_{h,\left( d \right)}^{\left( i \right)} \left( {s_{h} } \right)\) with a multi-hazard conditional fragility \(F_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} } \right)\), where \(s_{1} , \ldots ,s_{{n_{h} }}\) are the intensities of hazards \(h = 1, \ldots ,n_{h}\). Similarly, we replace the restoration curve with a conditional relation \(G_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( t \right)\). The expression for the component failure probability now becomes:

$$\begin{aligned} P_{{1, \ldots , n_{h} ,\left( d \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} ,t} \right) & = F_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} } \right)\left( {1 - G_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( t \right)} \right) \\ &\quad i = 1, \ldots ,n_{c} ; d = 1, \ldots ,n_{d} \\ \end{aligned}$$
(4)

The component damage characteristics often vary for different hazards, although some overlap might be expected. For instance, in the case of a tsunami following an earthquake, both hazards will result in structural damage, although tsunami inundation would likely cause significantly more damage than earthquake to electrical equipment. The characteristics of the structural damage itself are also distinct depending on the hazard. Taking storage tanks as an example, while an earthquake may lead to buckling or sloshing damage, tsunami inundation can lead to undermining of the foundation, resulting in anchorage failure and displacement of the tank. Therefore, from a multi-hazard perspective, it is important to find a common basis for categorization of damage states across the multiple hazards. Defining damage states based on the repair time, instead of the structural damage, offers such a common basis. As an example, one may assume a ‘low’ damage state if the repair time is expected to be less than 10 days, ‘moderate’ if more than 10 days, ‘severe’ if more than 30 days, and so on. Using the repair time as the basis for assigning the damage states thus allows us to combine the component failure probabilities from different hazards seamlessly. With the restoration curve common across hazards, \(G_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( t \right) = G_{\left( d \right)}^{\left( i \right)} \left( t \right)\), we can now combine the damage states as:

$$\begin{aligned} P_{{1, \ldots ,n_{h} }}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} ,t} \right) & = 1 - \mathop \prod \limits_{d = 1}^{{n_{d} }} \left( {1 - F_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} } \right)\left( {1 - G_{\left( d \right)}^{\left( i \right)} \left( t \right)} \right)} \right) \\ &\quad i = 1, \ldots ,n_{c} \\ \end{aligned}$$
(5)

The following section will discuss the estimation of the multi-dimensional fragility surfaces \(F_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} } \right)\) for the components and calculation of system downtime.

2.1 Development of multi-dimensional fragility surfaces

Consider two hazards \(A_{1}\) and \(A_{2}\), which can be cascading or concurrent hazards affecting a system component. Assuming that the component has not undergone any repairs during the interval between the two hazards, let us also consider four incremental damage states to represent the component state as \(d_{h,d}\) with \(h = 1,2;\; d = 0,1,2,3\), where \(d = 0\) represents no damage.

We can form a matrix of the damage states that the component can assume given the occurrence of \(A_{1}\) and \(A_{2}\), shown in Table 1.

Table 1 Failure probabilities for different combinations of hazards and damage states

The probability of the component being in damage state \(d = 2\) is the OR combination of the states where \(d = 2\) is the highest damage state, i.e., \(\left[ {\left( {d_{0} ,d_{2} } \right),\left( {d_{1} ,d_{2} } \right),\left( {d_{2} ,d_{0} } \right),\left( {d_{2} ,d_{1} } \right),\left( {d_{2} ,d_{2} } \right)} \right]\), which is expressed generally as:

$$F_{{1, \ldots ,n_{h} ,\left( {d_{1} , \ldots ,d_{{n_{h} }} } \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} } \right) = \mathop \prod \limits_{h = 1}^{{n_{h} }} F_{{h,\left( {d_{h} } \right)}}^{\left( i \right)} \left( {s_{h} } \right); i = 1, \ldots ,n_{c}$$
(6)
$$\begin{aligned} F_{{1, \ldots ,n_{h} ,\left( d \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} } \right) & = 1 - \mathop \prod \limits_{{\hbox{max} \left( {d_{1} , \ldots ,d_{{n_{h} }} } \right) = d}} \left( {1 - F_{{1, \ldots ,n_{h} ,\left( {d_{1} , \ldots ,d_{{n_{h} }} } \right)}}^{\left( i \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} } \right)} \right) \\ &\quad i = 1, \ldots ,n_{c} ; \quad d = 1, \ldots ,n_{d} \\ \end{aligned}$$
(7)

Typically, the marginal fragility curves are lognormal distributions. Although products and OR combinations of lognormal curves are not exactly lognormal, the resulting failure probability (left-hand side of Eq. 7) can often be closely approximated by a lognormal distribution.
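A sketch of the Eq. 6–7 combination for two hazards follows; it assumes that the probability of the component being in each damage state (including \(d = 0\), no damage) is available from the marginal fragility curves at the given intensities. All numbers are hypothetical.

```python
import itertools
import numpy as np

def multihazard_state_prob(marginals, d_target):
    """Eqs. 6-7: probability that the maximum damage state across hazards
    equals d_target, OR-combining all qualifying state combinations.

    marginals: one array per hazard giving P(state d), d = 0..n_d, at the
    hazard intensity of interest.
    """
    n_d = len(marginals[0]) - 1
    prob_not = 1.0
    for combo in itertools.product(range(n_d + 1), repeat=len(marginals)):
        if max(combo) != d_target:
            continue
        p_combo = np.prod([m[d] for m, d in zip(marginals, combo)])  # Eq. 6
        prob_not *= 1.0 - p_combo                                    # Eq. 7
    return 1.0 - prob_not

# Hypothetical state probabilities at one (PGA, runup) pair, states d = 0..3
eq_states  = [0.20, 0.40, 0.30, 0.10]   # earthquake marginal
tsu_states = [0.50, 0.30, 0.15, 0.05]   # tsunami marginal
print([round(multihazard_state_prob([eq_states, tsu_states], d), 3)
       for d in range(4)])
```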

This process of estimating the multi-hazard fragility is based on an assumption that the damage caused by the multiple hazards is statistically independent. This assumption may not hold true for certain hazards, especially those of a similar nature (e.g., earthquake mainshock–aftershock). To address this, one can calibrate the derived multi-dimensional fragility surfaces upon collecting empirical or analytical data regarding the component’s performance in multi-hazard scenarios.

2.2 Calculation of system downtime

Using Eq. 5, each component’s failure probability is calculated, which can now be combined to compute system-level failure probability using the fault tree model. In standard fault tree terminology, the term ‘basic event’ refers to component failure, while ‘top event’ refers to the system failure. Here, the top event of the fault tree is that the system will not be restored before time \(t\) given occurrence of \(n_{h}\) events of intensities \(s_{1} , \ldots ,s_{{n_{h} }}\). Let \({\mathcal{F}}\) denote the fault tree model, with inputs including the hazard intensities \(s_{1} , \ldots ,s_{{n_{h} }}\), elapsed time \(t\), the marginal fragility \(F_{h,\left( d \right)}^{\left( i \right)}\) of each component \(i = 1, \ldots ,n_{c}\), for each hazard \(h = 1, \ldots ,n_{h}\) and damage states \(d = 1, \ldots ,n_{d}\), and the restoration curve \(G_{\left( d \right)}^{\left( i \right)}\) of each component \(i = 1, \ldots ,n_{c}\) for damage states \(d = 1, \ldots ,n_{d}\). The top event probability of not being repaired in time \(t\) is then:

$$P^{{\left( {top} \right)}} \left( {t |s_{1} , \ldots ,s_{{n_{h} }} } \right) = {\mathcal{F}}\left( {P_{{1, \ldots ,n_{h} }}^{\left( 1 \right)} \left( {s_{1} , \ldots ,s_{{n_{h} }} ,t} \right), \ldots ,P_{{1, \ldots ,n_{h} }}^{{\left( {n_{c} } \right)}} \left( {s_{1} , \ldots ,s_{{n_{h} }} ,t} \right)} \right)$$
(8)
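To make the role of \(\mathcal{F}\) in Eq. 8 concrete, the toy sketch below evaluates a three-component tree with the OR/AND gate semantics described in Sect. 3.1; the tree structure is hypothetical and far simpler than the 118-component plant model.

```python
def or_gate(probs):
    """OR gate: the subsystem fails if ANY input fails (critical components)."""
    p_ok = 1.0
    for q in probs:
        p_ok *= 1.0 - q
    return 1.0 - p_ok

def and_gate(probs):
    """AND gate: the subsystem fails only if ALL inputs fail (redundancy)."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def top_event(p_pier, p_pump_a, p_pump_b):
    """Toy plant: down if the pier fails OR both redundant pumps fail."""
    return or_gate([p_pier, and_gate([p_pump_a, p_pump_b])])

# Component probabilities of not being restored by time t (Eq. 5), hypothetical
print(top_event(0.10, 0.30, 0.30))  # 1 - 0.9 * (1 - 0.09) = 0.181
```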

Mean \(E\left( \cdot \right)\) and variance \({\text{Var}}\left( \cdot \right)\) of the downtime are obtained from the downtime density, which is the negative derivative of the top-event survival probability:

$$p^{{\left( {\text{top}} \right)}} \left( {t|s_{1} , \ldots ,s_{{n_{h} }} } \right) = - \frac{{{\text{d}}\left( {P^{{\left( {\text{top}} \right)}} \left( {t|s_{1} , \ldots ,s_{{n_{h} }} } \right)} \right)}}{{{\text{d}}t}}$$
(9)
$$E\left( {s_{1} , \ldots ,s_{{n_{h} }} } \right) = \mathop \smallint \limits_{0}^{\infty } t \cdot p^{{\left( {\text{top}} \right)}} \left( {t|s_{1} , \ldots ,s_{{n_{h} }} } \right){\text{d}}t$$
(10)
$${\text{Var}}\left( {s_{1} , \ldots ,s_{{n_{h} }} } \right) = \mathop \smallint \limits_{0}^{\infty } t^{2} \cdot p^{{\left( {\text{top}} \right)}} \left( {t|s_{1} , \ldots ,s_{{n_{h} }} } \right){\text{d}}t - \left[ {E\left( {s_{1} , \ldots ,s_{{n_{h} }} } \right)} \right]^{2}$$
(11)

If we use lognormal distributions to represent fragility and restoration curves, the lower bound of the mean estimate in Eq. 10 becomes \(0\) days.
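A numerical sketch of Eqs. 9–11, assuming the top-event curve has been evaluated on a discrete time grid; the exponential survival curve used in the example is hypothetical.

```python
import numpy as np

def downtime_moments(t, P_top):
    """Eqs. 9-11: mean and variance of downtime from the top-event curve.

    P_top(t) is the probability of NOT being restored by time t (a decreasing
    curve), so the downtime density is its negative derivative.
    """
    p = -np.gradient(P_top, t)             # Eq. 9
    mean = np.trapz(t * p, t)              # Eq. 10
    var = np.trapz(t**2 * p, t) - mean**2  # Eq. 11
    return mean, var

t = np.linspace(0.0, 365.0, 2000)
P_top = np.exp(-t / 30.0)          # hypothetical survival curve, 30-day scale
print(downtime_moments(t, P_top))  # approximately (30, 900)
```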

3 Case-study demonstration

Our case study is a hypothetical power plant that is closely modeled after an existing coal-fired power plant located on the coast of Chile, which is vulnerable to both earthquake and tsunami hazards. The following sections describe the plant layout, major earthquake and tsunami vulnerabilities, fragility and restoration parameters, development of multi-hazard fragility curves, estimation of earthquake and tsunami hazard, and finally estimation of the plant’s downtime and risk analysis with uncertainty quantification due to the various sources of variability.

3.1 Plant description

The case-study power plant has two power generation units and a coal-offloading pier. The fault tree model of the plant was generated after several site visits, studying blueprints, and consulting with the plant engineers. The coal-offloading pier consists of a reinforced concrete deck supported on steel wide flange girders and hollow steel tube piles. Coal is offloaded from ships docked at the pier using a clamshell crane supported by the deck and transported to a coal yard by a conveyor system. Another conveyor system transports coal from the coal yard to silos at the two power generation units. Both power generation units have nearly identical layouts and process flows. The coal silos are welded steel cylindrical silos with conical bottoms that feed into a coal pulverizer that crushes the coal to a powdered form. The powdered coal is then burned in the boiler, which is supported on a diagonally braced steel frame. Each unit also has its own reinforced concrete exhaust stack, along with other exhaust system equipment such as forced draft fans, air ducts, and bottom ash handling systems. Other flue-gas treatment components include desulphurization systems, fabric filters, electrostatic precipitators, and spray dry absorbers.

The cooling water intake siphons are above-ground, large-diameter welded steel pipes that are supported by the offloading pier. The pier thus supports the offloading crane, conveyor system, and cooling water intake siphons, making it a critical component for plant operation. The outflow pipes are buried, large-diameter reinforced concrete pipes. The plant also contains several large-diameter atmospheric tanks for a variety of processes. Between them, the two units have three service water tanks, two fuel tanks, and three condensate tanks with varying levels of anchorage. The turbine building houses two turbines, generators, and other electrical equipment such as switchgears, battery racks, and an emergency generator for restarting the plant. The turbine building itself is a steel braced frame structure with sheet metal walls and roof.

We identified a total of 118 independent components of the power plant, which form the fault tree model of the plant. As a general rule, components that fail simultaneously or share the same support system are grouped into one component. For instance, a pump attached to a tank is redundant if the tank is non-functional, so it is sensible to treat the tank and the pump as a single component. The fault tree model of the plant is shown in Fig. 1. OR gates are used to represent critical components, without which the plant cannot function, while AND gates are used to represent redundancy of components, where the plant can function while some of the components are temporarily under repair.

Fig. 1

Fault tree model of the hypothetical case-study power plant

In this demonstration, the restoration functions for each of the components in the plant are based on the engineering judgment of the authors from past experience as well as communications with the plant authorities. The restoration functions take into account the availability of spare parts on site, the time required to order and ship parts or entire components, and post-disaster labor availability. These functions can be updated in the light of additional empirical data from future disasters. FEMA’s HAZUS tool provides such empirical restoration curves, primarily for residential and commercial building components. The parameters of the lognormal distributions used in this case study to represent restoration functions are given in Table 2.

Table 2 Restoration function parameters for power plant components. \(\mu\) and \(\sigma\) are the median and standard deviation in days

3.2 Seismic vulnerability

In a probabilistic context, seismic vulnerability is expressed with seismic fragility curves, which are cumulative probability distribution functions that give the probability of exceeding a given discrete damage state as a function of a seismic intensity measure. Fragility curves account for the many sources of aleatory uncertainty in estimating the structural capacity under seismic ground motion, including variations in the ground-motion time histories corresponding to an intensity measure, as well as variations in structural properties (foundation, geometry, material properties, live loads, etc.) that govern the capacity curve and demand spectrum. In this paper, we chose the peak ground acceleration (PGA) as the seismic intensity measure, since the majority of the equipment in the power plant is ground based, with the exception of the boiler tower and smoke stack. Moreover, the majority of sources report fragility curves in terms of PGA (ALA 2001; EPRI 2013; Johnson et al. 1999). Typically, the discrete damage states’ definitions are based on the structural condition of the component. However, as discussed in Sect. 2.1, we have defined the damage states based on the post-disaster recovery time of the component.

The major seismic vulnerabilities identified at the case-study power plant that are expected to result in significant downtime are as follows:

  1.

    Damage to the support systems of components such as boiler, piping, and other mounted mechanical and electro-mechanical equipment, although non-obstructive to the continued operation of the component, presents a site safety hazard. Failure of support systems or large inelastic deformations, on the other hand, renders the component temporarily non-functional.

  2.

    Damage to mechanical components varies based on their type and size. Components with moving parts can undergo internal warping or displacement. Unanchored or marginally anchored equipment can overturn, causing heavy damage and requiring replacement.

  3.

    Damage to the atmospheric steel tanks occurs in several modes (ALA 2001), the most common being the outward shell buckling mode, also known as ‘elephant foot’ buckling of the tank wall (Malhotra et al. 2000). Anchorage failure caused by breakage, pull-out, or stretching results in base uplift or sliding of the tank. Movement of tanks may cause the connecting pipes to break off from the tank.

  4.

    The coal-offloading pier supports an offloading clamshell crane, water siphon pipeline, and a coal conveyor. Shake damage to the reinforced concrete piles and deck can manifest in the form of cracking or collapse of the pier.

  5.

    The coal conveyors from the offloading pier to the coal yard and from the coal yard to the coal silos are supported by a truss girder system. Ground motion can result in member buckling, tipping over, or permanent deformation of the slender truss system.

  6.

    The reinforced concrete smoke stack can sustain cracking that can be a safety issue until repaired. Moreover, since the smoke stack rests on a mat foundation, rocking is expected, damaging the connected air ducts and other flue-gas treatment equipment.

The above on-site vulnerabilities are non-exhaustive, but are of the highest concern from a business interruption perspective, as established from the site survey, communication with site management, and engineering judgment. Although ground failure by liquefaction also has significant potential for structural damage to several of the plant components (Suzuki 2008; Kazama and Noda 2012), it is out of the scope of this study. The parameters of the lognormal distribution used to represent the fragility function for each component are given in Table 3.

Table 3 Fragility curve parameters for power plant components

3.3 Tsunami vulnerability

Tsunamis are generated by sudden deformation of the ocean floor due to tectonic activity which displaces a large volume of water (Haugen et al. 2005). The resulting high-energy tsunami waves swell and grow in height as they approach the shallow coast inundating the near-shore region. Typically, tsunamis carry a large amount of debris that increases the damage potential of the high-velocity waves. Although tsunamis are also formed by undersea volcanic eruptions, landslide, and explosions, the ones caused by earthquakes have historically been most frequent and most damaging.

Tsunami fragility curves of components give the damage probability as a function of a tsunami intensity measure that defines the flow severity characteristics. Although most tsunami fragility curves use inundation depth as the intensity measure, alternative measures include wave velocity, hydrodynamic force, hydrostatic force, momentum flux, moment of momentum flux, and energy head. Inundation depth is most commonly used because of the availability of this data from post-event reconnaissance, while other intensity measures require inundation modeling through computational simulations (Macabuag et al. 2016). Based on observed damage in Banda Aceh, Indonesia, from the 2004 Indian Ocean tsunami, Koshimura et al. (2009) identified inundation depth as the most reliable tsunami intensity measure due to the difficulty in accurately estimating the other intensity measures.

Major tsunami vulnerabilities were identified at the case-study power plant based on site visits and consulting with the plant engineers. These are listed as follows:

  1.

    Hydrostatic forces due to inundation (water depth) and hydrodynamic forces due to high-velocity water flow induce lateral forces on structures that lead to structural damage and overturning.

  2.

    Inundation also leads to vertical uplift or buoyancy forces causing displacement of structures that are not sufficiently anchored, especially tanks (Krausmann and Cruz 2013; Nishi 2012). Tank displacement also damages attached pipes.

  3.

    Scouring of foundations, roads, vegetation, and piers can occur during both inundation and recession of the tsunami waves (Francis 2006).

  4.

    The plant’s coal-offloading pier can experience scouring of the seabed, resulting in differential settlement of the supports. Water inundation and flow result in vertical buoyancy forces which can dislocate the pier decks if the connections with the supports are not adequate. This is apparent when comparing the wharf damage in Southeast Asia from the 2004 Indian Ocean tsunami with the wharf damage in Japan from the 2011 Tohoku tsunami: Japanese wharves sustained far less damage due to better connection design and seismic strengthening.

  5.

    Impact of floating and flowing debris, including cars and boats, during both wave inundation and recession results in heavy structural damage.

  6.

    Saltwater intrusion is a serious problem for non-structural damage to electrical equipment and fixtures such as transformers and control panels, as well as heavy machinery such as turbines and generators. Water intrusion damage to transformers was one of the primary causes of blackouts during Hurricane Sandy (Boggess et al. 2014).

  7.

    Breaching of fuel and chemical tanks, broken fuel pipelines, and overflow of sewage result in secondary damage to property caused by toxic release, fire, and vapor cloud explosions. Although secondary technical hazards are not considered here, they can cause significant business interruption due to cleanup activities.

Based on the study of the plant layout, engineering reports, and site visits, as well as published literature (Krausmann and Cruz 2013; Basco and Salzano 2016; Hatayama 2014; Landucci et al. 2014; Horspool and Fraser 2016), we estimated the fragility curves of the identified components of the plant, as given in Table 4. It is important to note that a majority of the tsunami fragility curves used here are based on informed expert opinion and are not obtained through empirical data analysis or computational simulations. Therefore, the risk analysis would be incomplete if uncertainty in these parameter values is not considered in calculating the system downtime. Uncertainty quantification is discussed in Sect. 3.8.

Table 4 Tsunami fragility curves for power plant components

3.4 Multi-hazard vulnerability

Using Eq. 7, the combined fragility of each component is calculated for a range of PGA and tsunami inundation height values. Example joint fragility surfaces for the smoke stack and offloading pier, two of the most critical components, are shown in Fig. 2. As evident from Fig. 2, the damage probability is amplified when shake and tsunami occur together. The edges of the fragility surface, corresponding to the PGA and tsunami height axes, represent the components’ marginal shake and tsunami fragilities, respectively. While the marginal fragilities are based on either computational simulation or empirical data analysis, the conditional damage probabilities (values not on the horizontal axes) are calculated based on the Boolean logical combination of the marginal fragilities.

Fig. 2

Combined earthquake and tsunami fragility surface for the smoke stack and offloading pier, showing medium damage exceedance probabilities

From a disruption perspective, the downtime of a facility is governed not only by the functioning of the facility’s internal components, but also by external factors that affect the supply of raw material, transmission/transportation of finished products, or access to employees. These external factors that directly affect the facility downtime must be considered as components in the fault tree model. In the power plant example, the components that are external to the plant include the coal-offloading pier, coal-offloading clamshell crane, external power supply, and transmission lines. Although the plant’s management does not own these components, they directly affect the plant’s downtime. Fragility functions and restoration functions for the above components are provided in Tables 2, 3 and 4.

3.5 Earthquake hazard

Due to its proximity to the Nazca subduction zone, Chile is one of the most seismically active countries. The two dominant sources of seismicity are the subduction of the Nazca and Antarctic plates under the South American plate at a rate of approximately 80 mm/year (Somoza and Ghidella 2005). The magnitude 9.6 earthquake that occurred along the Peru–Chile trench on this interface in 1960 is the largest magnitude recorded instrumentally (Krawcyzk 2003). A magnitude 8 earthquake has a return period of 80–130 years for any given region in Chile (Barrientos et al. 2004). The return period is the expected recurrence time interval of a given magnitude of hazard.

To calculate the seismic hazard at the site of the power plant, we used the seismogenic source model developed by the Global Earthquake Model’s South America Risk Assessment (GEM-SARA) project (GEM 2017). The source model consists of the geometric parameters, kinematics, and activity rates of active crustal faults and subduction zones in the South American countries. The project has also selected ground-motion prediction equation logic trees for active shallow crustal, stable shallow crustal, subduction in-slab, and subduction interface regions, found in GEM (2017).

Using GEM’s OpenQuake, an open source seismic risk assessment package, we performed probabilistic seismic hazard assessment (PSHA) at time scales of 10,000 years. The goal of PSHA is to find the total probability that a ground-motion parameter \(X\) exceeds a level \(x\) in a given time span \(T\), considering all seismic sources in a region (Pagani et al. 2014), expressed as:

$$\begin{aligned} P\left( {X \ge x |T} \right) & = 1 - P_{{{\text{src}}_{1} }} \left( {X < x |T} \right)*P_{{{\text{src}}_{2} }} \left( {X < x |T} \right)* \cdots *P_{{{\text{src}}_{I} }} \left( {X < x |T} \right) \\ & = 1 - \mathop \prod \limits_{i = 1}^{I} P_{{{\text{src}}_{i} }} \left( {X < x |T} \right) \\ \end{aligned}$$
(12)

where \(P_{{{\text{src}}_{i} }} \left( {X < x |T} \right)\) is the probability that the \(i\)th source does not generate a ground-motion parameter greater than \(x\) in the time span \(T\), and \(I\) is the number of sources. Typical ground-motion parameters are peak ground acceleration or spectral acceleration at a given site. The sources are assumed independent, which means that the occurrence of an earthquake in one source does not modify the probability of occurrence in another source. Next, a set of earthquake ruptures in each source is generated using specified occurrence probabilities in the time span \(T\) using an earthquake rupture forecast model. The set of ruptures can be thought of as the various discretized rupture surfaces and magnitudes in each source. Thus, the probability of non-exceedance for each source \(i\) is:

$$\begin{aligned} P_{{{\text{src}}_{i} }} \left( {X < x |T} \right) & = P_{{{\text{rup}}_{i,1} }} \left( {X < x |T} \right)*P_{{{\text{rup}}_{i,2} }} \left( {X < x |T} \right)* \cdots *P_{{{\text{rup}}_{{i,J_{i} }} }} \left( {X < x |T} \right) \\ & = \mathop \prod \limits_{j = 1}^{{J_{i} }} P_{{{\text{rup}}_{i,j} }} \left( {X < x |T} \right) \\ \end{aligned}$$
(13)

where \(P_{{{\text{rup}}_{i,j} }} \left( {X < x |T} \right)\) is the probability that the \(j\)th rupture in source \(i\) does not generate a ground-motion parameter exceeding \(x\) in time span \(T\), and \(J_{i}\) is the total number of ruptures in source \(i\). The ruptures are assumed independent, which means that a rupture in one source does not affect the probability of another rupture in the same source. Thus, we can express the non-exceedance probability of each rupture in source \(i\) as:

$$\begin{aligned} P_{{{\text{rup}}_{i,j} }} \left( {X < x |T} \right) & = P_{{{\text{rup}}_{i,j} }} \left( {n = 0 |T} \right) + P_{{{\text{rup}}_{i,j} }} \left( {n = 1 |T} \right)*P\left( {X < x | {\text{rup}}_{i,j} } \right) + P_{{{\text{rup}}_{i,j} }} \left( {n = 2 |T} \right)*P\left( {X < x | {\text{rup}}_{i,j} } \right)^{2} + \cdots \\ & = \mathop \sum \limits_{k = 0}^{\infty } P_{{{\text{rup}}_{i,j} }} \left( {n = k |T} \right)*P\left( {X < x | {\text{rup}}_{i,j} } \right)^{k} \\ \end{aligned}$$
(14)

where \(P_{{{\text{rup}}_{i,j} }} \left( {n = k |T} \right)\) is the probability that the \(j\)th rupture occurs \(k\) times in time span \(T\), and \(P\left( {X < x | {\text{rup}}_{i,j} } \right)\) is the probability that the occurrence of rupture \({\text{rup}}_{i,j}\) does not generate ground motion exceeding level \(x\). Assuming that ruptures follow a Poisson temporal occurrence model:

$$P_{{{\text{rup}}_{i,j} }} \left( {n = k |T} \right) = e^{{ - \nu_{ij} T}} \frac{{\left( {\nu_{ij} T} \right)^{k} }}{k!}$$
(15)

Combining Eqs. 12–15, we can express the probability of the ground-motion parameter \(X\) exceeding a level \(x\) at least once in time \(T\) as:

$$P\left( {X \ge x |T} \right) = 1 - \mathop \prod \limits_{i = 1}^{I} \mathop \prod \limits_{j = 1}^{{J_{i} }} \mathop \sum \limits_{k = 0}^{\infty } P_{{{\text{rup}}_{i,j} }} \left( {n = k |T} \right)*P\left( {X < x | {\text{rup}}_{i,j} } \right)^{k}$$
(16)

This process of developing the exceedance probability relation, also known as the hazard curve, is known as classical PSHA and follows the procedure suggested by Cornell (1968) and formulated in Field et al. (2003). An alternative to the classical approach is event-based PSHA (EPSHA), in which ruptures are generated by Monte Carlo simulation, sampling from \(P_{{{\text{rup}}_{i,j} }} \left( {n = k |T} \right)\), to create a set of events in a given time span, also known as a stochastic event set. The corresponding ground-motion field, which is the list of ground-motion intensities at a given site, is calculated to obtain the hazard curve. In this study, EPSHA is used to obtain a 10,000-year stochastic event set. Note that the 10,000-year stochastic event set does not represent 10,000 years of seismic activity, but instead gives 10,000 realizations of a single year of earthquake activity.
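The sketch below illustrates the event-based idea: Poisson occurrence counts (Eq. 15) are sampled for each rupture to build a stochastic event set. The two-rupture source list and its rates are hypothetical; OpenQuake performs this internally from the full GEM-SARA source model.

```python
import numpy as np

rng = np.random.default_rng(42)

def stochastic_event_set(ruptures, n_years=10_000):
    """Sample (year, magnitude) events: each rupture occurs k ~ Poisson(nu)
    times in each 1-year realization, giving n_years realizations of one year.
    """
    events = []
    for year in range(n_years):
        for rup in ruptures:
            k = rng.poisson(rup["nu"])  # occurrences in this realization
            events.extend([(year, rup["Mw"])] * k)
    return events

# Hypothetical subduction ruptures with annual occurrence rates nu
ruptures = [{"nu": 0.01, "Mw": 8.0}, {"nu": 0.002, "Mw": 8.8}]
ses = stochastic_event_set(ruptures)
print(len(ses), "events across 10,000 one-year realizations")
```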

The ground-motion intensity corresponding to a rupture is predicted by ground-motion prediction equations (GMPEs), which relate site intensity to rupture parameters such as earthquake magnitude, distance of the site to the hypocenter, local soil parameters, and fault mechanism (Douglas 2003). GMPEs are typically empirically derived using historical instrumental data and vary with the source typologies, which mainly include four categories: interface and intraslab subduction regions, and active and stable shallow crustal regions. Selection of GMPEs for a particular region is a difficult task given the large number of options to choose from, and an important one given the strong sensitivity of hazard predictions to the chosen GMPEs (Stewart et al. 2013). Given the shortage of instrumental data from any one specific region, the development of GMPEs relies on data from similar ruptures in other regions. It is thus rare that a single GMPE can accurately predict ground-motion intensity for all future earthquakes affecting a site. To deal with this epistemic uncertainty related to the choice of GMPEs, OpenQuake uses a logic tree approach where multiple GMPEs are selected with corresponding weights that represent the confidence of the modelers. The final hazard prediction is thus a weighted sum of individual logic tree path predictions.
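A minimal sketch of the weighted logic tree combination: exceedance probabilities from each GMPE branch, evaluated on a common PGA grid, are averaged with the modelers’ weights. The branch curves and weights below are hypothetical, not the Table 5 values.

```python
def combine_logic_tree(branch_curves, weights):
    """Weighted mean of hazard-curve branches (one per GMPE logic-tree path)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "branch weights must sum to 1"
    n = len(branch_curves[0])
    return [sum(w * curve[k] for w, curve in zip(weights, branch_curves))
            for k in range(n)]

# Annual exceedance probabilities at PGA = 0.1, 0.3, 0.5 g for three branches
curves = [[0.30, 0.08, 0.020],
          [0.25, 0.06, 0.015],
          [0.35, 0.10, 0.030]]
print(combine_logic_tree(curves, weights=[0.5, 0.3, 0.2]))
```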

The GEM-SARA project has identified the GMPEs that best fit the ground-motion data in South America for the four different source typologies along with their logic tree weights (Drouet et al. 2017), which are given in Table 5. Figure 3 shows the mean local hazard curve as well as the ensemble of hazard curves obtained from sampling from a uniform distribution between the maximum and minimum PGA values obtained from the logic tree combinations. The 500-year and 2500-year return period PGA distributions are shown in Fig. 4. For comparison, the figure also shows the 500- and 2500-year PGA near the power plant location estimated by USGS in 2010 (Petersen et al. 2010), which are 0.724 g and 1.33 g, respectively.

Table 5 GMPEs used in the classical PSHA along with logic tree weights
Fig. 3

Ensemble of local seismic hazard curves obtained from EPSHA

Fig. 4

Distributions of 500- and 2500-year PGA obtained by resampling from the OpenQuake EPSHA output. The dotted line shows estimates by USGS (2010)

3.6 Tsunami hazard

The tsunami hazard at the site of the power plant is represented by assigning a runup height to each earthquake in the 10,000-year stochastic event set. Several studies have reported scaling relations between tsunami intensity and earthquake magnitude (Kulikov et al. 2005; Murty 1977; Abe 1995; Silgado 1978; Comer 1980; Geist and Parsons 2016; Geist 2012; Suppasri et al. 2013; Iida 1983), in lieu of complicated and computationally intensive hydrodynamic simulations, which require accurate near-shore bathymetry data and earthquake source parameters. Thus, the advantage of using empirical relations lies not only in the reduced computational effort, but also in avoiding highly uncertain assumptions made for lack of knowledge. On the other hand, the primary disadvantage of using empirical relations is that they neglect the effects of several source characteristics and site-specific near-shore parameters on water height. Thus, when using empirical relations for estimating tsunami runup heights, the variability in these relations must be accounted for, as discussed later in Sect. 3.8.

In this study, new scaling relations are developed using the latest tsunami runup data made publicly available by the National Geophysical Data Center/World Data Service (NGDC/WDS). The database contains tsunami runup records from all over the world and from a variety of sources, including historical records, post-event surveys, and research publications. The earliest record used in this study dates from the year 1586, with both earthquake magnitude and runup height available. Figure 5 shows runup measurements on the west coast of South America, where the case-study site is located.

Fig. 5

Observed tsunami runup measurements on the west coast of South America in the NGDC database

Figure 6 shows the results of a linear regression (\(h = -\, 43.62 + 6.032M_{\text{w}}\)) and a log-linear regression (\(\ln h = -\, 5.25 + 0.808M_{\text{w}}\)) performed on the entire set of water height records in the NGDC/WDS database, with \(h\) being the runup height and \(M_{\text{w}}\) being the earthquake moment magnitude. As clearly seen, the water heights show large variance over a wide interval, which is why a linear regression with respect to the earthquake magnitude alone is an insufficient characterization of the scaling relationship.

Fig. 6

Linear and log-linear regression on the entire NGDC/WDS tsunami runup database of runup heights and corresponding earthquake magnitudes, plotted on linear (left) and log-linear (right) graphs

Uncertainties pertaining to the usage of such scaling relations are related to the heterogeneity in earthquake source variables and near-shore bathymetry. In order to effectively account for this variability in scaling, we divided the earthquake events into magnitude intervals of \(M_{\text{w}} = 0.3\), starting from \(M_{\text{w}} 6.5\) to \(M_{\text{w}} 9.5\) (Fig. 7). A lognormal distribution of runup height is fit to each interval, the parameters of which are given in Table 6. In the stochastic event set simulation, discussed in the following subsection, the tsunami height corresponding to earthquake events will be sampled from these distributions.
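A sketch of this binning-and-fitting procedure and of the subsequent sampling is given below; the array names are hypothetical placeholders for the NGDC/WDS records, and the fitted parameters reported in Table 6 are the ones actually used.

```python
import numpy as np
from scipy.stats import lognorm

def fit_runup_bins(mags, heights, lo=6.5, hi=9.5, width=0.3):
    """Fit a lognormal runup distribution to each magnitude interval."""
    params = {}
    edges = np.arange(lo, hi + width, width)
    for a, b in zip(edges[:-1], edges[1:]):
        h = heights[(mags >= a) & (mags < b)]
        if len(h) < 2:
            continue  # too few records to fit this bin
        mu, sigma = np.log(h).mean(), np.log(h).std(ddof=1)
        params[(a, b)] = (np.exp(mu), sigma)  # (median in m, log-std)
    return params

def sample_runup(params, Mw, rng):
    """Sample a runup height for an event of magnitude Mw from its bin."""
    for (a, b), (median, sigma) in params.items():
        if a <= Mw < b:
            return lognorm.rvs(s=sigma, scale=median, random_state=rng)
    return 0.0  # below the tsunamigenic range considered
```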

Fig. 7

Histogram and distribution of tsunami runup heights for different earthquake magnitude intervals. \(N\) represents the number of measurements available in the database for the said range

Table 6 Parameters of lognormal distributions of runup heights for various earthquake magnitude intervals, obtained from NGDC/NOAA tsunami database

As presented in Table 6, a few of the higher-magnitude bins have median runup heights lower than those of lower-magnitude bins. However, a positive trend can be observed with increasing magnitude: \(\ln \bar{h} = -\, 2.73 + 0.486M_{\text{w}} ,\quad R^{2} = 0.815\), where \(\bar{h}\) is the median runup height in meters. After sampling the runup heights from the distributions presented in Fig. 7 corresponding to each event in the earthquake stochastic event set, the resulting tsunami hazard curve is shown in Fig. 8. Only events that occur offshore in the subduction zone with a hypocentral depth less than 30 km are considered, since the tsunamigenic potential of deeper events is negligible (Tinti et al. 2005; Yamashita and Sato 1974).

Fig. 8

Ensemble of local tsunami hazard curves obtained from sampling the empirical distributions (Fig. 7 and Table 6)

3.7 Multi-hazard risk analysis

Following the multi-hazard downtime estimation methodology presented in Sect. 2 and the component-specific restoration function, seismic fragility, and tsunami fragility parameters given in Sects. 3.1, 3.2, and 3.3, respectively, we can estimate the plant’s downtime for a range of combined earthquake and tsunami events. Figure 9 (left) presents the plant’s mean downtime (Eq. 10) at various combinations of PGA and tsunami runup height. The edges of the surface at the PGA and runup axes are the single-hazard downtime curves for earthquake and tsunami, respectively. An amplification of the downtime due to the cascading effect of the two hazards is observed in Fig. 9 (right), which shows the downtime corresponding to varying levels of PGA as a function of the tsunami runup height. It is also observed in this figure that as PGA increases, the rate of increase in downtime with runup height diminishes. This occurs because the contribution of the secondary hazard to the already high failure probability of components due to the primary hazard diminishes (see Eq. 6).

Fig. 9

(Left) Mean downtime of the power plant for combined occurrence of earthquake and tsunami for a range of PGA and tsunami runup heights. (Right) Mean downtime corresponding to increasing tsunami heights conditional on PGA magnitude

3.8 Uncertainty quantification

So far, we have assumed the basic event probabilities (i.e., component failure probabilities) to be exact, which is not a practical assumption. Some of the parameters in Tables 2, 3, and 4 are obtained from personal communications with engineers and rough estimates provided by plant management. Furthermore, the fault tree excludes external factors such as material and manpower availability, economic and political situations, and cascading events (e.g., fire, spillage) that most certainly affect the total restoration time. Lastly, the fragility and restoration parameters extracted from the literature and repositories have been derived for structures that have different structural properties, design specifications, and failure modes than the components in our case-study power plant. These uncertainties stem from a lack of knowledge and are thus ‘epistemic’; they can be reduced in the light of additional knowledge about the component failures. In this study, a ± 40% uniform error around the nominal values of the restoration and fragility parameters (Tables 2, 3, and 4) is assumed as a reasonable estimate of the epistemic uncertainty.

In addition to the epistemic uncertainty in the plant’s component parameters, uncertainty also exists in the local hazard calculation as discussed in Sects. 3.5 and 3.6. In the 10,000-year stochastic event set generated using OpenQuake, the PGA at the power plant site is drawn from a uniform distribution bounded by the minimum and maximum values predicted by the GMPE logic tree combinations. The tsunami runup heights corresponding to tsunamigenic events in the stochastic event set are sampled from the lognormal distributions given in Table 6, based on the event seismic magnitude. The ensemble of seismic and tsunami hazard curves is presented in Figs. 3 and 8, respectively.

Thus, to represent the epistemic uncertainty, a Monte Carlo simulation is completed with a total of \(N = 10,000\) realizations of the parameters that define component seismic fragility, component tsunami fragility, component restoration curves, seismic hazard, and tsunami hazard, each drawn from their respective distributions discussed above. The uncertainty propagation involves \(N\) simulations of the 10,000-year stochastic event set with nearly 23,000 events each. To ease the computational burden, events that would result in a downtime of less than one day are omitted from the simulation.
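As a sketch, one epistemic realization of any nominal fragility or restoration parameter under the ± 40% uniform error assumption can be drawn as follows (the 20-day median is a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb(nominal, rel=0.40):
    """One epistemic realization: uniform within +/-40% of the nominal value."""
    return nominal * rng.uniform(1.0 - rel, 1.0 + rel)

print([round(perturb(20.0), 1) for _ in range(3)])  # e.g., perturbed medians
```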

Risk is often expressed in terms of exceedance probability, which gives the empirical probability that downtime exceeds a given level in one year. An alternative perspective is offered by the return period, the inverse of the exceedance probability, which gives the expected time interval between years in which downtime exceeds a given value. The process of calculating downtime exceedance probability and return period is as follows (a code sketch is given after the list):

  1.

    Sum downtime for each year in the stochastic event set to get annual downtime for 10,000 years. Ensure total downtime does not exceed 365 days;

  2.

    Arrange the annual downtime in descending order;

  3.

    Assign an exceedance probability of \(m/10{,}000\) to every \(m\)th downtime value (\(m = 1, \ldots, 10{,}000\));

  4.

    Assign the return period as the inverse of the exceedance probability, i.e., \(10{,}000/m\).
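A sketch of the four steps above, assuming each event in the stochastic set has already been assigned a downtime (the event arrays are hypothetical):

```python
import numpy as np

def downtime_ep_curve(event_years, event_downtimes, n_years=10_000):
    """Annual downtime -> exceedance probability and return period curves."""
    annual = np.zeros(n_years)
    np.add.at(annual, event_years, event_downtimes)  # step 1: sum per year
    annual = np.minimum(annual, 365.0)               # cap at 365 days
    annual = np.sort(annual)[::-1]                   # step 2: descending order
    m = np.arange(1, n_years + 1)
    ep = m / n_years                                 # step 3: exceedance prob.
    rp = n_years / m                                 # step 4: return period
    return annual, ep, rp

rng = np.random.default_rng(0)
years = rng.integers(0, 10_000, size=23_000)                 # event year indices
downtimes = rng.lognormal(mean=1.0, sigma=1.5, size=23_000)  # days per event
annual, ep, rp = downtime_ep_curve(years, downtimes)
print(annual[ep <= 1 / 500][-1], "days at the 500-year return period")
```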

Figure 10 shows the \(N = 1000\) exceedance probability curves and return period curves obtained from the Monte Carlo simulation, along with the mean curve and the two standard deviation curves.

Fig. 10

Downtime exceedance probability and return period curves for the case-study power plant considering multiple hazards—earthquake and tsunami

Figure 11 shows the distribution of downtime corresponding to different return periods obtained from the Monte Carlo simulation. A normal distribution is fitted, although an asymmetric distribution can be more appropriate, as observed for the 50-year return period. The variance of the downtime estimates is generally found to increase with the return period.

Fig. 11

Downtime distributions (normal) for different return periods

It is important to note that the business interruption loss is not directly proportional to the facility downtime, as several secondary factors come into play, including supply chain disruption, post-disaster business demand, availability of labor and material for restoration, employee unavailability, policy changes, etc. Businesses can have varying levels of resiliency, wherein the lost income due to disruption can be recaptured after the recovery period by ramping up production, additional worker shifts, and overtime work. In HAZUS, a recapture factor RF is introduced as Loss = (1 − RF) × (Lost Income), where RF depends on the occupancy type of the facility (industrial, educational, commercial, etc.) and ranges from 0 to 0.98. Industries such as power plants and hotels do not have the opportunity to recover lost income, since their income relies on continuous operation. For such industries, RF will be close to 0. On the other hand, manufacturing industries and retail have a greater opportunity to temporarily boost production to recover lost income, in which case RF will be closer to 1. Park et al. (2011) have presented a more realistic notion of the recapture factor by assuming a non-static RF that begins at the HAZUS-prescribed value immediately after recovery and gradually decays exponentially. Such an approach provides a clearer picture of the loss recovery path to the facility engineers and management.
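A sketch of the loss calculation with a static and an exponentially decaying recapture factor follows. The decaying form is only a rough stand-in for the Park et al. (2011) model, and all monetary values are hypothetical.

```python
import numpy as np

def bi_loss(daily_income, downtime_days, rf0, decay_days=None):
    """Business-interruption loss: Loss = (1 - RF) x (lost income).

    With decay_days set, RF decays exponentially from rf0 over the downtime,
    an assumed simplification of a non-static recapture factor.
    """
    if decay_days is None:
        return (1.0 - rf0) * daily_income * downtime_days  # static HAZUS form
    t = np.arange(downtime_days)
    rf = rf0 * np.exp(-t / decay_days)  # recapture ability fades with time
    return float(np.sum(daily_income * (1.0 - rf)))

print(bi_loss(100_000, 30, rf0=0.5))                 # static: 1,500,000
print(bi_loss(100_000, 30, rf0=0.5, decay_days=10))  # decaying: larger loss
```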

4 Conclusions

We have presented a generalized multi-hazard assessment methodology for business downtime risk that is applicable to any number of cascading and concurrent hazards. As shown through a case study of a coal-fired power plant subject to both earthquake and tsunami hazards, the proposed methodology is amenable to systems with independent and interdependent components, modeled using fault trees, Bayesian networks, and other system modeling tools. The method relies on simulation of a stochastic event set that contains random samples of multi-hazard scenarios, such as the combined occurrence of earthquake and tsunami studied in this paper. The damage states of each system component are defined based on the time required to repair the component. The restoration time is expressed probabilistically using restoration functions. Fragility functions give the probability of damage under the different hazards considered. Using Boolean combination of damage states from multiple primary hazards and secondary perils, multi-hazard fragility curves (or surfaces) are first constructed for each system component. For each event in the stochastic event set, the component failure probabilities are calculated and aggregated to obtain the full system failure probability.

In this probabilistic treatment, the epistemic uncertainty in specifying the fragility and restoration functions, as well as the uncertainty in the hazard, is propagated to obtain an ensemble of downtime exceedance curves and the distribution of downtime corresponding to a given return period. The proposed method focuses on business interruption as a direct consequence of physical damage to the system’s components. Other aspects of business interruption, such as the dependence on external infrastructure and supply chain risk, employee access, and regional macroeconomics (Rose and Huyck 2016), are left out of scope and are active research topics.