1 Introduction

Depending on the nature of the available data, procedures for seismic vulnerability assessment can be classified into empirical, analytical, expertise-based and hybrid. Among expertise-based methods, macroseismic approaches (e.g. Lagomarsino and Giovinazzi 2006; Bernardini et al. 2008, 2011) classify the vulnerability of the exposed buildings by referring to the six vulnerability classes of the EMS-98 (Grünthal et al. 1998) and account for the uncertainty in the attribution of a given building typology to vulnerability classes by means of fuzzy set theory. The operational implementation of macroseismic approaches takes place via closed-form analytical relations, correlating seismic input and the expected damage, as a function of the assessed vulnerability (e.g. Lagomarsino et al. 2021). Seismic input is represented by macroseismic intensity, which is however a descriptive parameter resulting from the observation of the effects of an earthquake on the surrounding environment. Consequently, macroseismic intensity is affected by the characteristics, and thus by the vulnerability, of the existing building stock (e.g. Tertulliani et al. 2011; Graziani et al. 2019; Rossi et al. 2019). The identification of building typologies and their association with vulnerability classes can be limited by the regional variability of the built environment, driven by locally available construction materials, construction period and field experience gained over the centuries from the observation of damage caused by past earthquakes (e.g. Masi et al. 2021; Tocchi et al. 2021).

In Italy, the AeDES post-earthquake survey form (Baggio et al. 2007) is currently used for damage and usability assessment of ordinary buildings. Besides damage and usability information, the AeDES survey form records metrical and typological attributes of buildings surveyed in the aftermath of an earthquake. Therefore, it can be an effective tool for supporting the definition of typological classification systems (e.g. Rota et al. 2008; Del Gaudio et al. 2017; Rosti et al. 2018) and capturing regional distinctive features of the building stock, which may affect seismic vulnerability. In this context, the interview-based CARTIS form (Zuccaro et al. 2015) can also be exploited for supporting the definition of building portfolios and vulnerability models (e.g. Polese et al. 2019, 2020; Brando et al. 2021).

Despite their methodological consistency and mathematical “elegance”, macroseismic methods are limited by the need to resort to uncertain and approximate laws correlating intensity values and peak ground motion parameters (e.g. Bernardini et al. 2011; Maio et al. 2015) in order to make these models easily usable for vulnerability and risk applications (e.g. da Porto et al. 2021; Dolce et al. 2021).

Based on the above considerations, this study proposes an innovative and comprehensive empirical vulnerability model for masonry and RC buildings representative of the Italian building stock. Given the notable amount of available damage data and the possibility of suitably accounting for the negative evidence of damage, the L’Aquila (2009) post-earthquake damage database (Dolce et al. 2019) is exploited. The proposed approach preserves the main conceptual framework at the basis of the macroseismic approach (Lagomarsino and Giovinazzi 2006), which allows for an exhaustive vulnerability classification of the building stock by referring to vulnerability classes and by considering the uncertain association of building typologies with vulnerability classes. The novelties of this work are the adoption of the peak ground acceleration (PGA) as the physical parameter for the characterisation of ground motion severity, which makes the proposed vulnerability model convenient to apply from the engineering perspective, and the use of unsupervised machine learning techniques for removing the subjectivity in the definition of vulnerability classes, by clustering the seismic damage observed on the Italian built environment. The use of the AeDES survey form for the L’Aquila (2009) post-earthquake building inspections also allows for an enhanced definition of building typologies representative of the Italian building stock.

This paper is structured as follows. The adopted post-earthquake damage dataset is first described (Sect. 2), together with the main assumptions and interpretations regarding the characterisation of the ground motion severity experienced at each building location, the classification of the observed seismic damage and the adopted building taxonomy. Typological fragility and mean damage curves are derived, as a function of the peak ground acceleration, by means of a suitable statistical model and fitting procedure. Empirically-derived mean damage data are then partitioned into a predefined number of clusters based on a data-driven approach. In this context, a soft clustering technique is used, permitting each data point to belong to multiple clusters with different membership degrees (Sect. 3). This strategy allows for an objective identification of vulnerability classes of decreasing vulnerability, for which fragility functions are subsequently derived. A total of ten vulnerability classes (i.e. A1, B1, C1, C2, D1, D2, E1, E2, F1 and F2) is considered, six of which refer to masonry buildings (i.e. A1, B1, C1, D1, E1 and F1) and four to RC buildings (i.e. C2, D2, E2 and F2). The choice of distinguishing masonry from RC vulnerability classes arises from the different distance among damage states that emerged from the typological fragility functions.

A probabilistic framework is then set up allowing for the attribution of a given building typology to multiple vulnerability classes, based on an ad-hoc strategy, involving the use of probability theory and using empirically-derived typological fragility functions as a target (Sect. 4). Similarly to the EMS-98, a vulnerability table is proposed to differentiate the seismic vulnerability of the exposed built environment based on selected building attributes. In this context, the synthetic binomial parameter, representing the weighted mean vulnerability class, is indicated for each considered building typology, allowing for easily categorising the seismic vulnerability of masonry and RC buildings depending on their typological features. The feasibility of the proposed vulnerability model is then demonstrated by a case study application with reference to two selected building typologies (Sect. 5).

Final remarks and conclusions of this work are discussed in Sect. 6. The results obtained in the different methodological phases of this study, including the parameters of the fragility functions derived for both building typologies and vulnerability classes, the degrees of belonging of building typologies to vulnerability classes and the proposed vulnerability table, can be used in a variety of applications in the field of seismic vulnerability and risk, provided that the seismic hazard and built environment are similar to those considered here.

2 Post-earthquake damage database

This work takes advantage of a robust post-earthquake database, gathering damage data of residential buildings hit by the 2009 L’Aquila seismic event (Dolce et al. 2019). Despite the availability of several post-earthquake damage databases, the L’Aquila database is a good candidate for several reasons, among which are the significant number of inspected buildings and the considerable number of completely-surveyed municipalities (Fig. 1), identified by a completeness ratio (i.e. number of surveyed residential buildings over the total number of buildings evaluated from national building census, ISTAT 2001) higher than 90% (Rosti et al. 2021a, b). The possibility of accounting for the negative evidence of damage in the municipalities less affected by the ground shaking is a further advantage related to the use of this post-earthquake damage database. In this context, non-surveyed residential buildings, sited in 176 non-surveyed and 49 partially-surveyed (completeness ratio < 10%) municipalities of the Abruzzi region (Fig. 1), are reasonably assumed to be undamaged and used to integrate the post-earthquake damage database (Rosti et al. 2021a, b). Following these operations, the considered post-earthquake dataset includes damage data of 37′406 residential (masonry: 28′713 and RC: 8′693) buildings. Non-inspected residential buildings, from non-surveyed and partially-surveyed municipalities and assumed undamaged, are 175′152 (masonry) and 22′376 (RC).

Fig. 1 Identification of the Abruzzi surveyed and non-surveyed municipalities. Colours denote the completeness ratio of each municipality

2.1 Ground motion characterisation

Consistent with the aim of this study, the ground motion severity experienced at each building location is quantified by PGA (e.g. Rosti et al. 2020a), extrapolated from the updated INGV shakemap (Michelini et al. 2020). With respect to Michelini et al. (2008), the new shakemap configuration involves the use of recently developed ground motion models, selected based on a ranking procedure, an updated Vs30 map for local site effects and the adoption of the newly developed USGS-ShakeMap version 4 (v.4) software (Worden et al. 2020). Also, the new shakemap defines isoseismic units of 0.02 g, allowing for a rather accurate seismic input characterisation of buildings subsequently involved in the fragility assessment (Fig. 2).

Fig. 2 INGV PGA shakemap of the 2009 L’Aquila seismic event (Michelini et al. 2020)

2.2 Classification of the observed seismic damage

Starting from the damage description provided by the post-earthquake survey form, grading the seismic damage observed on different building components based on both damage severity and extent, a unique global level of damage is assigned to each inspected building. Damage descriptions available from the post-earthquake survey form are mapped to the discrete damage levels of the EMS-98 using the damage conversion rules by Rota et al. (2008) and Del Gaudio et al. (2017) in case of structural and non-structural (i.e. masonry infills/partitions) damage, respectively. Global damage levels of masonry buildings are then given by the maximum level of damage detected on the vertical structure, intermediate diaphragms and roof (e.g. Rota et al. 2008; Rosti et al. 2018). In case of RC buildings, global damage levels are instead given by the maximum level of damage observed on the vertical structure and masonry infills/partitions (e.g. Del Gaudio et al. 2017; Rosti et al. 2018).
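The aggregation rule described above reduces to taking a maximum over the relevant building components. The following minimal sketch assumes that the component damage descriptions have already been converted to EMS-98 levels (integers from 0 to 5) through the cited conversion rules; the function and argument names are hypothetical.

```python
def global_damage_masonry(vertical, diaphragms, roof):
    """Global damage level of a masonry building: maximum EMS-98 level
    among vertical structure, intermediate diaphragms and roof."""
    return max(vertical, diaphragms, roof)


def global_damage_rc(vertical, infills):
    """Global damage level of an RC building: maximum EMS-98 level
    between vertical structure and masonry infills/partitions."""
    return max(vertical, infills)


# Hypothetical component levels, for illustration only
masonry_level = global_damage_masonry(vertical=2, diaphragms=1, roof=3)
rc_level = global_damage_rc(vertical=1, infills=2)
```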

Figure 3 shows the resulting damage distributions of residential masonry (a) and RC (b) buildings, with reference to the completely-surveyed municipalities.

Fig. 3 Damage classification of the residential masonry (a) and RC (b) building stock for L’Aquila completely-surveyed municipalities

2.3 Typological classification of the residential building stock

The typological classification of the residential building stock accounts for the main building attributes retrievable from the AeDES post-earthquake survey form and affecting the buildings’ seismic behaviour. A first distinction is made based on the construction material (i.e. masonry/RC) and the number of storeys (i.e. 1, 2, 3 and ≥ 4 storeys for masonry buildings and 1, 2, 3, 4, ≥ 5 storeys for RC buildings). Masonry buildings are further classified based on the texture and quality of the masonry fabric (i.e. IRR: irregular layout or poor-quality; REG: regular layout and good-quality), in-plane flexibility of intermediate floor diaphragms (i.e. F: flexible; R: rigid) and presence (or lack) of connecting devices, such as tie-rods and/or tie-beams (i.e. CD: with connecting devices; NCD: without connecting devices), similarly to Rota et al. (2008).

Besides the building height, the typological classification of the RC building stock accounts for the level of seismic design (i.e. buildings seismically designed pre-1981 and post-1981, 1981 being a key date for the enforcement of relatively modern seismic design rules). Most of the municipalities in the L’Aquila region were classified as seismic-prone in the early twentieth century, hence RC buildings designed for gravity loads (and wind loads) only are basically missing from the dataset.

The considered building taxonomy leads to the identification of a total of 42 building typologies, 32 of which refer to masonry buildings and 10 to RC buildings. The higher level of detail of the adopted typological classification system is aimed at detecting possible differences or similarities in the observed seismic vulnerability of the existing building stock, given the presence (or absence) of specific building attributes or constructive details.

Masonry and RC buildings represent 77% and 23%, respectively, of the considered post-earthquake damage database. Irregular-layout or poor-quality masonry constitutes about 68% of the considered masonry buildings, whereas the remaining 32% are characterised by good-quality materials with regular layout. About 70% of the masonry buildings have flexible intermediate diaphragms, whereas the remaining 30% have rigid ones. Aseismic devices (i.e. tie-rods and/or tie-beams) are present in 41% of the considered masonry buildings. Focusing on RC constructions, 36% and 64% are seismically designed pre- and post-1981, respectively.

Referring to the completely-surveyed municipalities, Fig. 4a shows the subdivision of the existing building stock based on the construction material. Figure 4b, c subdivides masonry (b) and RC (c) buildings based on the number of storeys. In Fig. 4d, e, f, masonry buildings are classified based on the masonry type (d), the in-plane stiffness of the intermediate diaphragms (e) and the presence (or lack) of aseismic devices, i.e. tie-rods/tie-beams (f).

Fig. 4 Typological classification of the residential building stock for L’Aquila completely-surveyed municipalities. Construction material (a); subdivision of masonry (b) and RC (c) buildings based on the number of storeys; subdivision of masonry buildings based on the masonry type (d), in-plane stiffness of intermediate diaphragms (e), presence or absence of aseismic devices (f)

The post-earthquake damage database is then enlarged by undamaged buildings from non-surveyed and partially-surveyed municipalities. As the building attributes considered by the national building census are construction material, construction age and number of storeys, the integration of undamaged RC buildings is straightforward. Conversely, mapping of undamaged masonry buildings to the predefined building typologies is carried out based on the typological composition of the masonry macro-categories, identified based on the age of construction (i.e. < 1919, 1919–45, 1946–61, 1962–71, 1972–81, 1982–91 and > 1991) and number of storeys (1, 2, 3, ≥ 4), as depicted in Fig. 5. Frequency values reported in Fig. 5, obtained by classifying residential masonry buildings of the considered post-earthquake damage dataset based on typological and census building attributes, both available from the AeDES survey form, are then applied to the undamaged building stock, for which only census-based information is available.
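The redistribution of undamaged masonry buildings can be sketched as follows. This is only an illustration of the bookkeeping: the frequency values are invented placeholders rather than those of Fig. 5, the typology labels merely follow the naming pattern of the adopted taxonomy, and the function name is hypothetical.

```python
def distribute_undamaged(census_count, typology_frequencies):
    """Split the undamaged buildings of one census macro-category
    (construction age, number of storeys) among building typologies,
    proportionally to the observed typological frequencies."""
    total = sum(typology_frequencies.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("typological frequencies must add up to 1")
    return {typ: census_count * f for typ, f in typology_frequencies.items()}


# Hypothetical macro-category (e.g. masonry, < 1919, 2 storeys)
frequencies = {
    "IRR-F-NCD-2": 0.55,
    "IRR-F-CD-2": 0.25,
    "IRR-R-NCD-2": 0.05,
    "REG-F-CD-2": 0.15,
}
shares = distribute_undamaged(1200, frequencies)
```

The resulting counts are generally fractional, which is acceptable when they only enter the likelihood as weights.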

Fig. 5 Typological composition of the masonry macro-categories identified based on census building attributes

Table 1 reports the building typologies identified based on the adopted building taxonomy, together with the indication of the sample size, including both damaged and undamaged buildings.

Table 1 Identification of building typologies based on the adopted building taxonomy. Numbers in brackets indicate the sample size

2.4 Seismic fragility assessment

Empirical fragility curves are derived for quantifying the seismic vulnerability of predefined building typologies, as a function of the ground motion severity. The cumulative lognormal distribution (e.g. Rossetto and Elnashai 2003; Rota et al. 2008; Del Gaudio et al. 2017; Ader et al. 2020; Rosti et al. 2020b) is employed for describing the probability of reaching or exceeding a given level of damage, P(ds ≥ DSi|PGA), as a function of PGA:

$$P\left( {\left. {ds \ge DS_{i} } \right|PGA} \right) = \Phi \left[ {\frac{{\log \left( {PGA/\theta_{DSi} } \right)}}{\beta }} \right]$$
(1)

where θDSi is the median PGA value associated with damage level DSi and β is the logarithmic standard deviation.
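For illustration, Eq. (1) can be evaluated with a few lines of standard-library Python. The sketch assumes that log denotes the natural logarithm; the parameter values are hypothetical and are not taken from Table 2.

```python
from math import log
from statistics import NormalDist


def fragility(pga, theta, beta):
    """P(ds >= DS_i | PGA) for a lognormal fragility curve, Eq. (1).

    pga   : peak ground acceleration [g]
    theta : median PGA of the damage level [g]
    beta  : logarithmic standard deviation
    """
    if pga <= 0.0:
        return 0.0  # no shaking, no exceedance
    return NormalDist().cdf(log(pga / theta) / beta)


# Hypothetical parameters, for illustration only
theta_ds1 = 0.20
beta = 0.60
p_at_median = fragility(0.20, theta_ds1, beta)   # 0.5 by definition of median
p_at_double = fragility(0.40, theta_ds1, beta)   # larger than 0.5
```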

The buildings’ subdivision in the different damage states, nij, given the jth PGA threshold, is approximated by the multinomial distribution (e.g. Agresti 2002; Charvet et al. 2014; Ioannou et al. 2021; Rosti et al. 2021a, b):

$$n_{ij} \sim \mathop \prod \limits_{i = 0}^{nDS} \frac{{N_{j} !}}{{n_{ij} !}}P\left( {ds = DS_{i} {|}PGA_{j} } \right)^{{n_{ij} }}$$
(2)

where Nj is the total number of buildings at the jth PGA threshold and P(ds = DSi|PGAj) is the conditional probability of occurrence of damage level DSi, defined as:

$$P\left( {ds = DS_{i} {|}PGA_{j} } \right) = \left\{ {\begin{array}{*{20}l} {1 - P\left( {ds \ge DS_{i + 1} {|}PGA_{j} } \right)} \hfill & {i = 0} \hfill \\ {P\left( {ds \ge DS_{i} {|}PGA_{j} } \right) - P\left( {ds \ge DS_{i + 1} {|}PGA_{j} } \right)} \hfill & {0 < i < nDS} \hfill \\ {P\left( {ds \ge DS_{i} {|}PGA_{j} } \right)} \hfill & {i = nDS} \hfill \\ \end{array} } \right.$$
(3)
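Equation (3) amounts to differencing consecutive exceedance probabilities. A minimal sketch is given below, assuming natural logarithms, medians listed in increasing order from DS1 to DSnDS and a common β; the numerical values are hypothetical.

```python
from math import log
from statistics import NormalDist


def exceedance(pga, theta, beta):
    """Lognormal exceedance probability P(ds >= DS_i | PGA), Eq. (1)."""
    return NormalDist().cdf(log(pga / theta) / beta)


def damage_state_probs(pga, thetas, beta):
    """Return [P(ds = DS_0), ..., P(ds = DS_nDS)] following Eq. (3).

    thetas lists the median PGA of DS_1..DS_nDS in increasing order.
    """
    # Pad with P(ds >= DS_0) = 1 and P(ds >= DS_{nDS+1}) = 0 so that
    # the three branches of Eq. (3) collapse into one difference
    exc = [1.0] + [exceedance(pga, t, beta) for t in thetas] + [0.0]
    return [exc[i] - exc[i + 1] for i in range(len(thetas) + 1)]


# Hypothetical medians [g] and dispersion, for illustration only
probs = damage_state_probs(0.30, [0.1, 0.2, 0.4, 0.8, 1.6], 0.6)
```

With a common β and increasing medians the differences are non-negative and add up to 1, which is the property the common-dispersion assumption is meant to guarantee.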

A common value of β is assumed to guarantee the ordinal nature of damage and prevent intersecting fragility functions (e.g. Lallemant et al. 2015; Ader et al. 2020; Rosti et al. 2021a, b). The same value of dispersion (β) is also enforced for all building typologies (e.g. Coburn and Spence 2002; Karababa and Pomonis 2011). These conditions are met by simultaneously fitting the fragility curves on all damage levels and building typologies via the maximum likelihood estimation (MLE) approach:

$$\left( {{\varvec{\theta}},\beta } \right) = \arg \max \left[ {\log \left( {L\left( {{\varvec{\theta}},\beta } \right)} \right)} \right] = \arg \max \left[ {\log \left( {\mathop \prod \limits_{k = 1}^{nTyp} \mathop \prod \limits_{j = 1}^{nPGA} \mathop \prod \limits_{i = 0}^{nDS} \frac{{N_{jk} !}}{{n_{ijk} !}}P\left( {ds = DS_{i} {|}PGA_{j} ,Typ_{k} } \right)^{{n_{ijk} }} } \right)} \right]$$
(4)

where nTyp is the number of building typologies, nPGA is the number of PGA thresholds, nDS is the number of damage levels, Njk is the total number of buildings of the kth building typology at the jth PGA threshold, nijk is the number of buildings of the kth building typology with damage level DSi at the jth PGA threshold.
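The objective of Eq. (4) can be sketched as a plain log-likelihood function. In the sketch below the factorial term is omitted, since it does not depend on the parameters and therefore does not affect the position of the maximum; the data layout, the names and the synthetic numbers are assumptions made for illustration, and no optimiser is included.

```python
from math import log
from statistics import NormalDist

_PHI = NormalDist().cdf


def state_probs(pga, thetas, beta):
    """P(ds = DS_i | PGA) for i = 0..nDS, from Eqs. (1) and (3)."""
    exc = [1.0] + [_PHI(log(pga / t) / beta) for t in thetas] + [0.0]
    return [exc[i] - exc[i + 1] for i in range(len(thetas) + 1)]


def log_likelihood(thetas_by_typ, beta, pga_levels, counts_by_typ):
    """Log-likelihood of Eq. (4) up to an additive constant.

    thetas_by_typ[typ]    : medians of DS_1..DS_nDS for typology typ
    counts_by_typ[typ][j] : list of n_ijk (i = 0..nDS) at pga_levels[j]
    beta                  : dispersion shared by all levels and typologies
    """
    total = 0.0
    for typ, thetas in thetas_by_typ.items():
        for pga, counts in zip(pga_levels, counts_by_typ[typ]):
            for n, p in zip(counts, state_probs(pga, thetas, beta)):
                if n > 0:
                    total += n * log(max(p, 1e-300))  # guard against log(0)
    return total


# Synthetic check: expected counts generated from known parameters
pga_levels = [0.05, 0.1, 0.2, 0.4]
true_thetas = {"typ_a": [0.1, 0.2, 0.4, 0.8, 1.6]}
true_beta = 0.6
counts = {"typ_a": [[1000 * p for p in state_probs(x, true_thetas["typ_a"], true_beta)]
                    for x in pga_levels]}
ll_true = log_likelihood(true_thetas, true_beta, pga_levels, counts)
ll_off = log_likelihood({"typ_a": [0.15, 0.3, 0.6, 1.2, 2.4]}, true_beta,
                        pga_levels, counts)
```

In the synthetic check the generating parameters score a higher log-likelihood than the shifted ones, as expected; in practice the function would be passed to a numerical optimiser acting on all medians and on β simultaneously.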

For each building typology, Table 2 summarises the parameters of the resulting lognormal fragility curves (i.e. θDSi, median PGA values and β, logarithmic standard deviation).

Table 2 Parameters (i.e. median and logarithmic standard deviation) of the typological lognormal fragility curves

In line with existing studies (e.g. Braga et al. 1982; Dolce et al. 2003; Lagomarsino and Giovinazzi 2006), the mean level of damage, μD, attained at a given PGA threshold is defined as:

$$\mu_{D} \left( {PGA_{j} } \right) = \mathop \sum \limits_{i = 0}^{nDS} i \cdot P\left( {ds = DS_{i} {|}PGA_{j} } \right)$$
(5)

where nDS is the number of damage states and P(ds = DSi|PGAj) is the probability of occurrence of the ith damage level, obtained from the previously determined fragility functions.
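As a short worked example of Eq. (5), the mean damage is the expectation of the damage-level index under the damage-state probabilities. The probabilities below are hypothetical.

```python
def mean_damage(probs):
    """Mean damage level of Eq. (5): sum over i of i * P(ds = DS_i | PGA_j).

    probs lists P(ds = DS_0), ..., P(ds = DS_nDS) at one PGA threshold.
    """
    return sum(i * p for i, p in enumerate(probs))


# Hypothetical damage-state probabilities for DS0..DS5 (they add up to 1)
probs = [0.50, 0.20, 0.15, 0.10, 0.04, 0.01]
mu_d = mean_damage(probs)  # 0.20 + 0.30 + 0.30 + 0.16 + 0.05 = 1.01
```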

3 Identification of vulnerability classes and derivation of fragility curves via unsupervised machine learning techniques

The definition of vulnerability classes has been extensively addressed in the literature (e.g. Goretti and Di Pasquale 2004; Di Pasquale et al. 2005; Dolce et al. 2006; Dolce and Goretti 2015; Masi et al. 2021; Rosti et al. 2021a; Saretta et al. 2021). Most of those studies (e.g. Goretti and Di Pasquale 2004; Di Pasquale et al. 2005; Dolce et al. 2006; Dolce and Goretti 2015) resort to the association rule between structural typologies and vulnerability classes proposed by Braga et al. (1982), resulting from the best agreement between the Irpinia (1980) empirical damage data and the MSK scale. To remove possible subjectivity in the definition of vulnerability classes, Rosti et al. (2021a) employed a hierarchical agglomerative clustering algorithm for objectively merging predefined building typologies into three vulnerability classes of decreasing vulnerability. However, the methodological apparatus was not set into a probabilistic framework, as a univocal correspondence between structural typologies and vulnerability classes was proposed.

Unsupervised machine learning algorithms remove subjectivity from some modelling choices and require only limited expert intervention, namely the definition of a suitable number of clusters into which the dataset is partitioned. For these reasons, they have been used in disparate applications of earthquake engineering and seismology (e.g. Weatherhill and Burton 2009; Jayaram and Baker 2010; Rehman et al. 2014; Kotha et al. 2018; Mascandola et al. 2020; Xie et al. 2020). Clustering is one of the most common unsupervised machine learning techniques, employed for drawing inferences from datasets of input data without labelled responses, based on some measure of similarity. In this context, K-means (Lloyd 1982) is an iterative algorithm, splitting data into a predefined number of mutually exclusive clusters. In other words, each observation is assigned to exactly one of the clusters, by minimising the distance between the data point and the centroid of the assigned cluster.

Fuzzy c-means (FCM) clustering (Bezdek 1981) is instead a soft version of K-means, where each data point has a fuzzy degree of belonging (uij) to each cluster. FCM is based on minimising the objective function, Jm:

$$J_{m} = \mathop \sum \limits_{i = 1}^{D} \mathop \sum \limits_{j = 1}^{N} u_{ij}^{m}\parallel x_{i} - c_{j}\parallel^{2}$$
(6)

where D is the total number of data points, N is the number of clusters, uij is the membership degree of xi to the jth cluster, xi is the ith data point, m controls the fuzzy overlapping among different clusters and cj is the centroid of the jth cluster.

The algorithm proceeds as follows:

  1. The uij values are randomly initialized

  2. The centroids, cj, of the clusters are computed as:

    $$c_{j} = \frac{{\mathop \sum \nolimits_{i = 1}^{D} u_{ij}^{m} x_{i} }}{{\mathop \sum \nolimits_{i = 1}^{D} u_{ij}^{m} }}$$
    (7)

  3. The uij values are updated:

    $$u_{ij} = \frac{1}{{\mathop \sum \nolimits_{k = 1}^{N} \left( {\frac{\parallel{x_{i} - c_{j} }\parallel}{\parallel{x_{i} - c_{k} }\parallel}} \right)^{{\frac{2}{m - 1}}} }}$$
    (8)

  4. The objective function, Jm, is computed

  5. Steps 2–4 are repeated until Jm improves by less than a specified minimum threshold or until after a specified maximum number of iterations.
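The five steps above can be sketched in standard-library Python as follows. This is a minimal illustration of Eqs. (6)–(8), not the implementation used in this study; the function name, the default tolerance, the iteration cap and the toy data are assumptions.

```python
import random
from math import dist


def fuzzy_c_means(points, n_clusters, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy c-means on a list of equal-length coordinate lists.

    Returns (centroids, u), where u[i][j] is the membership degree of
    point i to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random memberships, each row normalised to add up to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    centroids, prev_j = [], float("inf")
    for _ in range(max_iter):
        # Step 2: centroids, Eq. (7)
        centroids = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centroids.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                              for d in range(dim)])
        # Step 3: membership update, Eq. (8)
        for i, x in enumerate(points):
            d = [dist(x, c) for c in centroids]
            if min(d) == 0.0:
                # Point coincides with a centroid: crisp membership
                hits = [k for k, dk in enumerate(d) if dk == 0.0]
                u[i] = [1.0 / len(hits) if k in hits else 0.0
                        for k in range(n_clusters)]
            else:
                u[i] = [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                  for k in range(n_clusters))
                        for j in range(n_clusters)]
        # Step 4: objective function, Eq. (6)
        j_m = sum(u[i][j] ** m * dist(points[i], centroids[j]) ** 2
                  for i in range(n) for j in range(n_clusters))
        # Step 5: stop when the improvement falls below the threshold
        if abs(prev_j - j_m) < tol:
            break
        prev_j = j_m
    return centroids, u


# Toy one-dimensional data with two well-separated groups
points = [[0.10], [0.15], [0.20], [2.00], [2.10], [2.20]]
centroids, u = fuzzy_c_means(points, n_clusters=2)
```

For each point, the membership degrees add up to 1 over the clusters, and the cluster with the largest degree is the most likely one.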

In this study, fuzzy c-means clustering is employed to split mean damage values, observed at preselected ground motion intensity levels, into a predefined number of clusters, representing the vulnerability classes. Unlike K-means clustering, the FCM algorithm allows each observational mean damage data point to belong to multiple vulnerability classes with different membership degrees.

In line with the number of vulnerability classes considered by the EMS-98 (Grünthal et al. 1998), six vulnerability classes of decreasing vulnerability (from A to F) are considered, each split into two subgroups based on the construction material (i.e. masonry and RC). The reason for separating masonry from RC vulnerability classes lies in the different distance among damage levels observed in the fragility functions obtained for the two families of structural typologies (Table 2). Six vulnerability classes are considered for masonry (i.e. A1, B1, C1, D1, E1, F1), whereas only four out of six vulnerability classes are introduced for RC buildings (i.e. C2, D2, E2, F2), for which the most vulnerable classes (i.e. A2 and B2) are missing. Vulnerability classes A2 and B2 could indeed refer to RC buildings without seismic design (i.e. RC buildings designed for gravity loads only), which are not available from the L’Aquila post-earthquake dataset, or to other more vulnerable RC building typologies not contemplated by the adopted building taxonomy.

Based on the above considerations, FCM clustering is separately applied to empirical mean damage values associated with masonry and RC building typologies. An overlapping coefficient, m, equal to 2 is considered, as higher values imply fuzzier boundaries among the different vulnerability classes, leading to less distinct fragility curves.

Considering the masonry dataset, the outermost vulnerability classes A1 and F1 are first defined, by respectively pooling together mean damage values of the two most vulnerable (i.e. IRR-F-NCD-4+ and IRR-F-CD-4+) and the two least vulnerable (i.e. REG-F-CD-1 and REG-R-CD-1) building typologies. The most and least vulnerable building typologies are identified by comparing observational typological mean damage values and then considering the two building typologies with the highest and lowest mean damage values, respectively. The remaining typological mean damage values are then split into four clusters (i.e. B1, C1, D1, E1) via FCM clustering. In this way, the presence of building typologies more or less vulnerable than the outermost vulnerability classes (i.e. A1 and F1) is avoided. A similar strategy is applied for the definition of the vulnerability classes for RC buildings. In this case, the number of clusters into which the RC mean damage dataset is partitioned is set equal to two, allowing for the identification of the inner RC vulnerability classes (i.e. D2 and E2). The outermost vulnerability classes (i.e. C2 and F2) are instead obtained by considering empirical mean damage values of the most (i.e. RC-Seismic-Pre81-5+) and least (i.e. RC-Seismic-Post81-1) vulnerable RC building typologies.

The implementation of FCM clustering provides, for a given PGA threshold, the membership degree of each mean damage value to the different vulnerability classes. The highest membership degree denotes the most probable vulnerability class. As a result, empirical mean damage data points (Fig. 6a) are attributed to the most likely vulnerability class (i.e. the one with the highest membership degree) and to the other vulnerability classes, with different membership degrees (Fig. 6b).

Fig. 6 Empirically-derived mean damage values of masonry and RC building typologies versus PGA (a) and identification of the most probable vulnerability class (i.e. highest membership degree) based on FCM clustering algorithm (b)

By interpreting the membership degree of a given mean damage value to the different vulnerability classes in terms of frequency, the number of buildings within a given building typology belonging to each vulnerability class is obtained, for each PGA threshold. Repetition of this procedure for all mean damage values and PGA thresholds leads to the definition of damage probability matrices of vulnerability classes, then approximated by fitting fragility functions (Fig. 7). Median PGA values associated with the different damage levels are obtained via the maximum likelihood approach, by imposing the same dispersion value (β) resulting from the typological fragility curves (Sect. 2.4). Table 3 collects the parameters (i.e. θDSi, median PGA values and β, logarithmic standard deviation) of the fragility curves of the ten vulnerability classes. For each level of damage, resulting fragility curves are compared in Fig. 8, showing a clear hierarchy among the different vulnerability classes.
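One possible reading of the frequency interpretation is sketched below: the damage-state counts of a building typology at a given PGA threshold are split among the vulnerability classes proportionally to the membership degrees. This is an assumption made for illustration; the function name, the counts and the membership degrees are hypothetical.

```python
def allocate_to_classes(damage_counts, memberships):
    """Split the damage-state counts of one typology at one PGA threshold
    among vulnerability classes, proportionally to the FCM memberships.

    damage_counts : [n_DS0, ..., n_DS5] for the typology
    memberships   : {class label: membership degree}, adding up to 1
    """
    return {cls: [u * n for n in damage_counts]
            for cls, u in memberships.items()}


# Hypothetical counts and membership degrees
counts = [120, 40, 20, 10, 6, 4]
memberships = {"B1": 0.10, "C1": 0.65, "D1": 0.20, "E1": 0.05}
contrib = allocate_to_classes(counts, memberships)

# Summing the contributions of all typologies, for every PGA threshold,
# would give the damage probability matrix of each vulnerability class
n_c1 = sum(contrib["C1"])  # buildings attributed to class C1
```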

Fig. 7 Fragility curves of the ten vulnerability classes, resulting from FCM clustering of the observed mean damage values of Masonry (1) and RC (2) building typologies. Numbers in the legend indicate the sample size

Table 3 Parameters (i.e. median and logarithmic standard deviation) of the lognormal fragility curves of the ten vulnerability classes, resulting from FCM clustering of the observed mean damage values

Fig. 8 Comparison of the fragility curves of the vulnerability classes, resulting from FCM clustering of the observed mean damage values

4 Proposed vulnerability model

Besides fragility functions, providing the expected damage distribution in the different levels as a function of the ground motion severity, a thorough vulnerability model should supply indications on the vulnerability classification of the exposed building stock. In this context, an ad-hoc strategy is built up to determine the degree of belonging of each building typology to multiple vulnerability classes. Fragility functions derived for vulnerability classes are linearly combined and optimal coefficients of the linear combination, representing the degrees of belonging of the selected building typology to vulnerability classes, are obtained by using typological fragility curves as a target.

Based on this procedure, the fragility curve associated with the ith damage level of the jth building typology can be approximated as:

$$\Phi \left[ {\frac{{\log \left( {PGA/\theta_{DSij} } \right)}}{\beta }} \right] \approx \mathop \sum \limits_{k = 0}^{NClasses - 1} w_{jk} \Phi \left[ {\frac{{\log \left( {PGA/\theta_{DSik} } \right)}}{\beta }} \right]$$
(9)

where θDSij is the median PGA value of the fragility function of the ith damage level of the jth building typology, NClasses indicates the total number of the considered vulnerability classes, wjk denotes the degree of belonging of the jth building typology to the kth vulnerability class, θDSik is the median PGA value of the fragility function of the ith damage level of the kth vulnerability class and β is the logarithmic standard deviation, which is constant among damage levels, building typologies and vulnerability classes.

The definition of the linear combination coefficients, wjk, results from an optimisation problem, minimising the global deviation between the sets of approximating and empirically-derived typological fragility curves. As the optimisation problem aims at providing the coefficients of the linear combination that best reproduce the target fragility functions, optimal coefficients may refer to non-adjacent vulnerability classes. This issue is counteracted by introducing a probability distribution for describing the trend of the wjk values. Considering the advantage of being fully described by a single parameter, the binomial model is selected. As two sets of different vulnerability classes are defined for masonry and RC typologies, two binomial distributions are introduced. One binomial distribution refers to masonry vulnerability classes and is fully described by the binomial parameter, ymas, with kmas ranging from 0 (F1) to 5 (A1), as indicated in Eq. (10). The other one refers to RC vulnerability classes and is fully determined by the binomial parameter, yRC, with kRC varying from 0 (F2) to 3 (C2), as per Eq. (11). The degree of belonging of the jth building typology to the kth vulnerability class, wjk, is then expressed by jointly using the binomial distributions defined in Eqs. (10) and (11), suitably scaled by the factor cmas (Eq. (12)). The scaling coefficient, cmas, which is unknown together with the binomial parameters ymas and yRC, indicates the weight that the binomial distribution of “masonry” vulnerability classes takes in the global distribution of the wjk coefficients. To ensure that the wjk coefficients add up to 1, the “RC” binomial distribution is scaled by the complement to 1 of the cmas coefficient (Eq. (12)).

$$w_{jk,mas} = \frac{5!}{{k_{mas} !\left( {5 - k_{mas} } \right)!}} \left( {\frac{{y_{j,mas} }}{5}} \right)^{{k_{mas} }} \left( {1 - \frac{{y_{j,mas} }}{5}} \right)^{{5 - k_{mas} }}$$
(10)
$$w_{jk,RC} = \frac{3!}{{k_{RC} !\left( {3 - k_{RC} } \right)!}} \left( {\frac{{y_{j,RC} }}{3}} \right)^{{k_{RC} }} \left( {1 - \frac{{y_{j,RC} }}{3}} \right)^{{3 - k_{RC} }}$$
(11)
$$w_{jk} = c_{j,mas} w_{jk,mas} + \left( {1 - c_{j,mas} } \right)w_{jk,RC}$$
(12)
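As a minimal sketch of Eqs. (10) to (12), the degrees of belonging can be computed from the three parameters as shown here. The function names are illustrative and not taken from the paper; the parameter values are those quoted in Sect. 5 for the REG-F-CD-3 typology.

```python
from math import comb

def binomial_weights(y, n):
    """Binomial probabilities over k = 0..n with success probability y / n (Eqs. 10-11)."""
    p = y / n
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def membership_degrees(c_mas, y_mas, y_rc):
    """Degrees of belonging w_jk (Eq. 12), returned as two lists:
    six values for masonry classes (k_mas = 0..5) and four for RC classes (k_RC = 0..3)."""
    w_mas = [c_mas * w for w in binomial_weights(y_mas, 5)]
    w_rc = [(1 - c_mas) * w for w in binomial_weights(y_rc, 3)]
    return w_mas, w_rc

# Parameters quoted in Sect. 5 for REG-F-CD-3
w_mas, w_rc = membership_degrees(c_mas=0.543, y_mas=2.040, y_rc=1.807)
print([round(w, 3) for w in w_mas])          # index 0 is F1, index 5 is A1
print([round(w, 3) for w in w_rc])           # index 0 is F2, index 3 is C2
print(round(sum(w_mas) + sum(w_rc), 6))      # the ten weights add up to 1
```

Each binomial distribution sums to 1 on its own, so the masonry weights sum to cmas and the RC weights to 1 − cmas, which is why the global wjk distribution is normalised.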

The joint use of two binomial models for describing the wjk distribution allows for more flexibility. Besides the cases where only masonry (cmas = 1) or only RC (cmas = 0) vulnerability classes are used, combining masonry and RC vulnerability classes can be convenient for some building typologies, to suitably account for a different distance between the fragility curves of the different damage states.

By substituting Eqs. (10) and (11) into Eq. (12), and then Eq. (12) into Eq. (9), the fragility curve of damage level DSi of the jth building typology can be approximated as:

$$\begin{aligned} \Phi \left[ {\frac{{\log \left( {PGA/\theta_{DSij} } \right)}}{\beta }} \right] \approx & c_{j,mas} \mathop \sum \limits_{{k_{mas} = 0}}^{5} \left[ {\frac{5!}{{k_{mas} !\left( {5 - k_{mas} } \right)!}} \left( {\frac{{y_{j,mas} }}{5}} \right)^{{k_{mas} }} \left( {1 - \frac{{y_{j,mas} }}{5}} \right)^{{5 - k_{mas} }} } \right]\Phi \left[ {\frac{{\log \left( {PGA/\theta_{{DSik_{mas} }} } \right)}}{\beta }} \right] \\ & + \left( {1 - c_{j,mas} } \right)\mathop \sum \limits_{{k_{RC} = 0}}^{3} \left[ {\frac{3!}{{k_{RC} !\left( {3 - k_{RC} } \right)!}} \left( {\frac{{y_{j,RC} }}{3}} \right)^{{k_{RC} }} \left( {1 - \frac{{y_{j,RC} }}{3}} \right)^{{3 - k_{RC} }} } \right]\Phi \left[ {\frac{{\log \left( {PGA/\theta_{{DSik_{RC} }} } \right)}}{\beta }} \right] \\ \end{aligned}$$
(13)
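The right-hand side of Eq. (13) can be evaluated numerically along the following lines. This is only a sketch: the class medians `THETA_MAS` and `THETA_RC` and the dispersion `BETA` are hypothetical placeholders (the actual values are those of Table 3), the natural logarithm is assumed, and all function names are illustrative.

```python
from math import comb, erf, log, sqrt

def lognormal_fragility(pga, theta, beta):
    """Phi[log(PGA / theta) / beta], with the natural logarithm assumed."""
    return 0.5 * (1.0 + erf(log(pga / theta) / (beta * sqrt(2.0))))

def binomial_weights(y, n):
    """Binomial probabilities over k = 0..n with success probability y / n."""
    p = y / n
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def approx_fragility(pga, c_mas, y_mas, y_rc, theta_mas, theta_rc, beta):
    """Right-hand side of Eq. (13) for a single damage state."""
    mas = sum(w * lognormal_fragility(pga, t, beta)
              for w, t in zip(binomial_weights(y_mas, 5), theta_mas))
    rc = sum(w * lognormal_fragility(pga, t, beta)
             for w, t in zip(binomial_weights(y_rc, 3), theta_rc))
    return c_mas * mas + (1 - c_mas) * rc

# HYPOTHETICAL medians (g) of one damage state, NOT the values of Table 3
THETA_MAS = [1.20, 0.80, 0.55, 0.38, 0.26, 0.18]   # k_mas = 0 (F1) .. 5 (A1)
THETA_RC = [1.50, 1.00, 0.70, 0.50]                # k_RC = 0 (F2) .. 3 (C2)
BETA = 0.6                                         # HYPOTHETICAL constant dispersion

for pga in (0.1, 0.2, 0.4):
    p = approx_fragility(pga, 0.543, 2.040, 1.807, THETA_MAS, THETA_RC, BETA)
    print(pga, round(p, 3))
```

In the limiting case cmas = 1 and ymas = 5, all the weight falls on the last masonry class and the approximation reduces to that class's own lognormal curve.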

For each building typology, three unknowns are to be determined: the parameters of the binomial distributions defined for masonry (ymas) and RC (yRC) vulnerability classes, and the fraction of the masonry binomial distribution, cmas, to be considered within the global wjk distribution. Optimal values of the parameters ymas, yRC and cmas result from a constrained optimisation problem minimising the global deviation between the sets of approximating and target typological fragility functions. Parameters ymas, yRC and cmas are constrained between 0 and 5, 0 and 3 and 0 and 1, respectively.
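The structure of this constrained problem can be illustrated with a small sketch. The optimiser actually used is not stated here, so the example substitutes a coarse exhaustive grid search over the admissible box and a sum of squared differences as the deviation measure; both are assumptions. The class medians, the dispersion, the PGA grid and the synthetic target are hypothetical placeholders, not values from the paper.

```python
from itertools import product
from math import comb, erf, log, sqrt

BETA = 0.6                                  # HYPOTHETICAL dispersion
PGAS = [0.05, 0.1, 0.2, 0.3, 0.5, 0.8]      # HYPOTHETICAL PGA grid (g)
# HYPOTHETICAL medians (g): one row per class (k = 0..n), one column per damage state
THETA_MAS = [[0.60 * 0.75**k * 1.6**i for i in range(3)] for k in range(6)]
THETA_RC = [[0.80 * 0.75**k * 2.2**i for i in range(3)] for k in range(4)]

def fragility(pga, theta):
    """Lognormal fragility curve, natural logarithm assumed."""
    return 0.5 * (1.0 + erf(log(pga / theta) / (BETA * sqrt(2.0))))

def class_curves(thetas):
    """Fragility values of each class, flattened over damage states and PGAS."""
    return [[fragility(pga, th) for th in row for pga in PGAS] for row in thetas]

CURVES = class_curves(THETA_MAS) + class_curves(THETA_RC)   # 6 masonry + 4 RC classes

def binomial_weights(y, n):
    p = y / n
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def model(c_mas, y_mas, y_rc):
    """Right-hand side of Eq. (13) at every (damage state, PGA) point."""
    w = [c_mas * v for v in binomial_weights(y_mas, 5)]
    w += [(1 - c_mas) * v for v in binomial_weights(y_rc, 3)]
    return [sum(wk * curve[m] for wk, curve in zip(w, CURVES))
            for m in range(len(CURVES[0]))]

def fit(target, n_mas=20, n_rc=12, n_c=10):
    """Grid search over 0 <= y_mas <= 5, 0 <= y_rc <= 3, 0 <= c_mas <= 1."""
    grid_c = [i / n_c for i in range(n_c + 1)]
    grid_mas = [5 * i / n_mas for i in range(n_mas + 1)]
    grid_rc = [3 * i / n_rc for i in range(n_rc + 1)]
    best = None
    for c, ym, yr in product(grid_c, grid_mas, grid_rc):
        sse = sum((a - b) ** 2 for a, b in zip(model(c, ym, yr), target))
        if best is None or sse < best[0]:
            best = (sse, c, ym, yr)
    return best

# Synthetic target generated by the model itself, standing in for empirical curves
target = model(0.5, 2.0, 1.75)
sse, c_mas, y_mas, y_rc = fit(target)
print(c_mas, y_mas, y_rc, sse)
```

Because the bounds are enforced simply by the extent of the grid, the sketch needs no dedicated constrained solver; a gradient-based bounded optimiser would be a natural replacement when a finer resolution is required.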

Table 4 collects the parameters cmas, ymas, yRC which allow for defining the degrees of belonging of each building typology to vulnerability classes. Figures 9, 10 and 11 show the distribution of the degrees of belonging of building typologies to vulnerability classes. Figures also compare the approximating fragility curves (continuous lines) resulting from linearly combining the fragility functions of the different vulnerability classes, suitably accounting for their degree of belonging, with the empirically-derived typological fragility curves (dotted lines). Vertical dashed lines represent the weighted mean vulnerability class of the binomial distribution.
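Assuming that the weighted mean vulnerability class denotes the mean class index of each binomial distribution, Eqs. (10) and (11) imply that this mean coincides with the binomial parameter itself, since a binomial distribution with n trials and success probability y/n has mean y:

$$\sum_{k_{mas} = 0}^{5} k_{mas} \, w_{jk,mas} = 5 \cdot \frac{y_{j,mas}}{5} = y_{j,mas}, \qquad \sum_{k_{RC} = 0}^{3} k_{RC} \, w_{jk,RC} = 3 \cdot \frac{y_{j,RC}}{3} = y_{j,RC}$$

Under this reading, a value such as ymas = 4.477 (quoted in Sect. 5 for IRR-F-NCD-3) places the mean between class indices 4 and 5, i.e. close to the most vulnerable masonry class A1.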

Table 4 Parameters required for defining the degrees of belonging of each building typology to vulnerability classes

Fig. 9 Degrees of belonging of irregular layout or poor-quality masonry building typologies to vulnerability classes

Fig. 10 Degrees of belonging of regular layout and good-quality masonry building typologies to vulnerability classes

Fig. 11 Degrees of belonging of RC building typologies to vulnerability classes

Fragility curves obtained by linearly combining the fragility functions of the vulnerability classes, weighted by the membership degrees of the building typology (continuous lines), generally approximate the empirically-derived typological fragility curves (dotted lines) well, suggesting the suitability of the adopted strategy (Figs. 9, 10 and 11). For some masonry building typologies (i.e. IRR-R-CD-3/4+, REG-F-NCD-2/3/4+, REG-F-CD-2/3/4+, REG-R-NCD/CD-4+), the combined use of vulnerability classes defined for both masonry and RC buildings allows the target typological fragility functions to be reproduced better.

In line with the EMS-98, Fig. 12 provides the vulnerability classification of the existing building stock, resulting from the adopted procedure. The proposed vulnerability table supplies a synthetic vulnerability representation of each building typology, in terms of the weighted mean vulnerability class of the binomial distribution (squared markers), fully describing the degree of belonging of a given typology to multiple vulnerability classes. In the figure, black squared markers indicate the weighted mean “masonry” vulnerability class, whereas white squared markers denote the weighted mean “RC” vulnerability class. Bars, associated with each building typology, indicate the fraction of “masonry”/“RC” binomial distributions to be considered within the global wjk distribution. Grey solid hatch provides the value of the cmas coefficient. In other words, fully grey bars correspond to the case of cmas = 1 (i.e. only “masonry” vulnerability classes are considered), totally empty bars refer to the case of cmas = 0 (i.e. only “RC” vulnerability classes are accounted for), whereas partially-filled bars correspond to the case of 0 < cmas < 1 (i.e. both “masonry” and “RC” vulnerability classes are considered). The proposed vulnerability table also allows the seismic vulnerability of buildings to be categorised and compared based on the presence or absence of specific typological building attributes.

Fig. 12 Proposed vulnerability table for masonry and RC building typologies. Black and white squared markers indicate the weighted mean “masonry” and “RC” vulnerability classes, respectively, fully characterising the distribution of the degrees of belonging to vulnerability classes. Grey solid hatch provides the value of the cmas coefficient

Results show that the layout and quality of the masonry fabric significantly affect the seismic vulnerability of masonry buildings: irregular layout or poor-quality masonry building typologies are more vulnerable than those made of good-quality materials with regular texture. Besides the characteristics of the masonry fabric, other building attributes, such as the in-plane stiffness of intermediate diaphragms and the presence of aseismic devices, affect the seismic vulnerability of masonry buildings. Rigid diaphragms improve the seismic behaviour of masonry constructions with respect to those with flexible horizontal structures. The presence of connecting devices (i.e. tie-rods and/or tie-beams) also reduces the seismic vulnerability of masonry buildings. Building code evolution improves the seismic response of RC buildings: for a given building height, RC buildings seismically designed after 1981 are less vulnerable than those designed according to obsolete (pre-1981) seismic prescriptions. The number of storeys strongly affects the seismic vulnerability of both masonry and RC buildings, suggesting the importance of accounting for building height in the vulnerability classification of the existing building stock.

5 Example of application

To clarify the steps involved in applying the vulnerability model, consider two typologies of three-storey masonry buildings, both subjected to a PGA of 0.20 g:

  • The first typology is made of undressed stone masonry, timber floor diaphragms, without suitable connections like steel ties or RC ring beams (i.e. belonging to the IRR-F-NCD-3 typology);

  • The second one is made of clay brick masonry, timber floors and steel tie-rods (i.e. REG-F-CD-3 typology).

The probabilities of exceeding the five damage states at PGA = 0.2 g for the different vulnerability classes are reported in Table 5. They are obtained from the lognormal model with the parameters given in Table 3.

Table 5 Probability of exceeding the 5 damage states for the 10 vulnerability classes at PGA = 0.2 g

For the first typology, Table 4 provides cmas = 1 and ymas = 4.477, whereas for the second one it provides cmas = 0.543, ymas = 2.040 and yRC = 1.807. The weights resulting from the application of Eq. (12) are reported in Table 6.

Table 6 Weights for the combination of the fragility curves associated with the different vulnerability classes for the two building typologies considered in this sample application

The weighted combination of the probabilities reported in Table 5 provides the following probabilities of exceeding the five damage states:

  • PDS1 = 0.83, PDS2 = 0.62, PDS3 = 0.51, PDS4 = 0.34, and PDS5 = 0.11 for IRR-F-NCD-3;

  • PDS1 = 0.51, PDS2 = 0.25, PDS3 = 0.16, PDS4 = 0.08, and PDS5 = 0.03 for REG-F-CD-3.

The expected distribution of the buildings belonging to the two considered building types among the different damage states can be immediately deduced from the probabilities of exceedance. The results reported in Fig. 13 show the significant difference in vulnerability between these masonry building typologies subject to the same level of seismic shaking (PGA = 0.2 g).
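The deduction mentioned above amounts to differencing consecutive exceedance probabilities. The sketch below applies it to the values listed for the two typologies; the function name is illustrative, and the label DS0 for the "no damage" state is an assumption.

```python
def damage_distribution(p_exceed):
    """Convert exceedance probabilities P(DS >= DSi), i = 1..5, into the
    probabilities of being exactly in each state DS0..DS5."""
    bounds = [1.0] + list(p_exceed) + [0.0]   # P(DS >= DS0) = 1 and P(DS >= DS6) = 0
    return [bounds[i] - bounds[i + 1] for i in range(len(bounds) - 1)]

# Exceedance probabilities at PGA = 0.2 g quoted in the text
irr = damage_distribution([0.83, 0.62, 0.51, 0.34, 0.11])   # IRR-F-NCD-3
reg = damage_distribution([0.51, 0.25, 0.16, 0.08, 0.03])   # REG-F-CD-3

for name, dist in (("IRR-F-NCD-3", irr), ("REG-F-CD-3", reg)):
    print(name, [round(p, 2) for p in dist])
```

For IRR-F-NCD-3, for example, the share of buildings with no damage is 1 − 0.83 = 0.17 and the share reaching DS5 is 0.11, whereas for REG-F-CD-3 the corresponding shares are 0.49 and 0.03.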

Fig. 13 Expected damage distribution for buildings belonging to IRR-F-NCD-3 (a) and REG-F-CD-3 (b) structural typologies subject to PGA = 0.2 g

6 Conclusions

This paper presents a novel and comprehensive vulnerability model for masonry and RC buildings, representative of the Italian built environment, relying on a data-driven approach. The proposed vulnerability model is based on the definition of vulnerability classes characterised by fragility curves in terms of PGA, and on the degrees of belonging to these classes of structural types identified by essential attributes such as vertical and horizontal structure, structural details, level of seismic design and number of storeys.

With respect to previous studies, peak ground acceleration, extrapolated from updated INGV shakemaps (Michelini et al. 2020), is employed for the seismic input definition, making the present model easily usable for vulnerability and risk applications. A robust post-earthquake damage database (Dolce et al. 2019), collected after the L’Aquila (2009) seismic event, is employed, allowing for suitable consideration of the completeness of the post-earthquake field surveys and for the negative evidence of damage in the municipalities less affected by the ground shaking. Attention is first devoted to data processing, involving seismic input characterisation, typological classification of the exposed building stock and definition of a damage metric based on the EMS-98 damage states. A suitable statistical model and fitting technique are then employed for deriving typological fragility curves and mean damage values, as a function of the selected intensity measure.

One of the original contributions of this work is the use of machine learning techniques for the objective identification of ten vulnerability classes, starting from the seismic damage observed on several Italian building typologies. Soft clustering is applied to empirically-derived mean damage values of the different building typologies, at preselected PGA thresholds. Damage distributions are hence obtained for each vulnerability class and then fitted by fragility functions. The adoption of a constant dispersion value for all damage levels and vulnerability classes ensures the hierarchy among the different damage levels, given the vulnerability class, and among the different vulnerability classes, which appear distinct and well separated from one another. In this study, vulnerability classes derived from masonry and RC building data are distinguished to account for the different distance among damage levels observed in the empirically-derived typological fragility functions.

In line with the conceptual framework of the macroseismic method (Lagomarsino and Giovinazzi 2006), accounting for the uncertainty in the attribution of building types to vulnerability classes, the degrees of belonging of each building typology to multiple vulnerability classes are determined. To this aim, a constrained optimisation problem, using empirically-derived typological fragility curves as a target, is set up. For a given building typology, the weighted mean vulnerability class is provided, which fully characterises the distribution of the degrees of belonging to vulnerability classes under the assumption of a binomial distribution. Comparison of approximating and empirically-derived fragility functions of the considered building typologies shows the appropriateness of the adopted strategy, which results in a thorough vulnerability model consistently defined for both masonry and RC buildings.

Similarly to the EMS-98, a vulnerability table provides parameters to model the uncertain association of buildings belonging to typologies identified by selected structural features with the different seismic vulnerability classes. In this context, the availability of a robust post-earthquake database gathering both typological and damage information (Dolce et al. 2019) allows for improving the definition of structural types typical of the Italian building stock.

Comparison of the weighted mean vulnerability class of the different building typologies points out the higher vulnerability of irregular layout or poor-quality masonry buildings with respect to those with regular layout and good-quality materials, in line with observations from post-earthquake field surveys (e.g. Saatcioglu and Bruneau 1993; Penna et al. 2014; Sorrentino et al. 2019). The presence of rigid horizontal structures as well as the presence of aseismic devices (e.g. appropriate wall-to-wall and wall-to-diaphragm connections) enhances the seismic response of masonry buildings. The level of seismic design has a clear beneficial effect on the seismic vulnerability of RC constructions: RC buildings seismically designed before 1981 are more vulnerable than the corresponding ones complying with updated seismic design criteria. In line with other studies (e.g. Rota et al. 2011), the obtained results point out the role of the number of storeys in the empirical seismic vulnerability of both masonry and RC buildings, also suggesting consideration of this parameter as a possible future improvement of existing macroseismic scales.

Results provided in this paper (i.e. parameters of fragility curves derived for building typologies and for vulnerability classes, degrees of belonging of building typologies to vulnerability classes and the proposed vulnerability table) can be used for a variety of seismic vulnerability and risk applications, provided that the seismic hazard and the exposed building stock of the area selected for application are similar to those considered here.

The availability of new damage data from post-earthquake observations will allow the vulnerability model to be further strengthened and its applicability to different territories and built environments to be verified or adapted. In particular, the integration of data on RC buildings without seismic design would allow the model to be completed towards the classes and types of greater vulnerability, for which classes A2 and B2 have been envisaged in the current version.