1 Introduction

Situated at the contact of the SW corner of the Pannonian basin and NE part of Dinarides, the wider Zagreb area is considered to be the most tectonically active region of the continental part of Croatia. Its seismicity can be characterized as moderate with rare occurrences of strong events (Herak et al. 2009).

The area encompasses the north-western part of the Sava River basin where highly deformed pre-Neogene units emerge out of a 1600–2500 m thick Neogene-Quaternary sedimentary sequence (Tomljenović and Csontos 2001). The terrain configuration and the underlying sedimentary units make the region prone to amplification of seismic waves and to liquefaction, both of which can have devastating impacts on buildings and infrastructure (Torbar 1882). Moreover, Zagreb, the capital of Croatia, is the largest urban area in the country and its economic and political center; strong earthquake shaking in this region could therefore have a significant socio-economic impact on the whole country.

Heavy damage caused by earthquakes in the wider Zagreb area has been documented in the past; the most significant events are shown in Table S1 and Fig. S1 in the Supplementary Information. The strongest event, the Great Zagreb earthquake (VIII MSK, macroseismically estimated magnitude Mm = 6.2), occurred on 9th of November 1880 around 6:34 Coordinated Universal Time (UTC), with the macroseismic epicentre estimated to be near the Planina village, 17 km NE of Zagreb, on the so-called Kašina fault (Fig. 1a). The earthquake caused extensive damage in the nearby villages and the city of Zagreb (1400 out of 2500 buildings were damaged or completely destroyed) and triggered soil liquefaction in the valley of the Sava River (Torbar 1882). It was followed by a series of earthquakes that caused panic among the population, particularly in the first six months. This event sparked interest in earthquake studies, which eventually led to Mohorovičić’s discovery of the crust-mantle boundary. It is one of the most important Croatian earthquakes and practically defines the lower hazard bounds in the Zagreb epicentral area (with local magnitude ML = 6.5 being the maximum expected magnitude in this seismic zone; Tomljenović 2020).

Fig. 1

a Area covered by the 3D model. Dark red line indicates the Zagreb metropolitan area. Dark grey lines represent the main faults and red lines represent North Medvednica fault (NMF) and Kašina fault (KF) (modified after Tomljenović and Csontos 2001). b NW–SE cross-section of the 3D model. c P-wave velocity profiles for the locations P1–P6 shown in panel a (only the first 15 km are shown)

On 22nd of March 2020 at 5:24 UTC, a moment magnitude MW = 5.3 earthquake hit Zagreb, causing significantly more damage in the old part of the city and the surrounding area than expected. This earthquake and its numerous aftershocks ignited the debate about the source location of the 1880 Great Zagreb earthquake. A comparison of the damage reports suggests that the epicentre of the 1880 event closely matches the location of the 2020 earthquake. In addition, the latest event once again showed how seismically vulnerable the Zagreb region is and how heavily the damage distribution depends on local site effects.

Traditionally, empirical ground motion prediction equations (GMPEs) (e.g. Douglas 2018 and references therein) are used to predict the expected ground motion from a given source location and magnitude. However, in the regions characterised by deep sedimentary basins and complex geological structures, these relations are not able to accurately predict ground motion, especially at long periods (Massa et al. 2012). Understanding how the complex 3D structures influence the broadband ground shaking is the key to improving the seismic hazard estimates and is indispensable when calculating seismic loading considered when designing earthquake-resistant buildings.

To better estimate earthquake shaking characteristics, detailed knowledge of the local soil conditions and of complex geological structures, such as sedimentary basins, is of particular importance. The effects of 3D structures result in significant variation of ground motion even on small length scales. Numerical deterministic earthquake simulations have been able to model such effects. Many long-period ground motion simulations (f < 1 Hz) have been carried out in densely populated areas that have a high seismic hazard, such as the Los Angeles basin (Süss and Shaw 2003), Coachella Valley (Ajala et al. 2019), Osaka basin (Iwata et al. 2008) and Po Plain basin (Molinari et al. 2015). More recently, large-scale physics-based 3D ground motion simulations have also been used to improve seismic hazard assessment in areas such as Los Angeles (Graves et al. 2011) and New Zealand (Bradley et al. 2020). In order to conduct such simulations, a reliable 3D velocity and density model of the region is required. Simpler 1D and 2D models cannot always account for certain site effects and as a result are unable to produce an accurate simulation of the recorded ground motion (e.g. Smerzini et al. 2011). However, going to frequencies higher than 1 Hz is particularly challenging because a very high-resolution 3D model and detailed knowledge of the particular rupture process are required. Therefore, a common approach to obtain the broadband ground motion (up to 10 Hz) is to combine the results from deterministic ground motion simulations (f < 1 Hz) with high-frequency seismograms calculated by empirical-stochastic methods (e.g. Mai et al. 2010; Graves and Pitarka 2010). Such hybrid methods have been applied, e.g., by van Ede et al. (2020) in the Po Plain, Lee et al. (2020) in New Zealand and Sekiguchi et al. (2008) in Japan.

Much like in the previously mentioned regions, basin structure has a profound influence on the ground motion in the wider Zagreb area, both amplifying the long-period motions and prolonging their duration. For these reasons, and because the situation is particularly critical in the Zagreb region (due to the sparse seismic station distribution and the characteristics of the local seismicity), we decided to conduct a similar study and apply the hybrid method to obtain simulated waveforms.

In this contribution, our two main goals are: (1) to simulate and study the 3D features of the broadband ground shaking in the wider Zagreb area in order to better understand the seismic hazard; (2) to simulate earthquake scenarios that help in characterising the most probable fault plane for the MW = 6.2, 1880 Zagreb event.

Based on published geological and geophysical data, we first build a 3D seismic model that reflects all of the important geological features necessary to generate the wave resonance effects that impact the duration and amplitude of the shaking in the wider Zagreb area. Using the software package SPECFEM3D Cartesian (e.g. Komatitsch et al. 2010, 2016; Peter et al. 2011), we deterministically simulate long-period ground motion (f < 1 Hz) in our new 3D model and verify the results by providing a measure of the fit to long-period seismograms recorded in the region after moderate events. We then compute the full broadband seismograms by applying the hybrid method: we combine the long-period signals with the high-frequency ones (f > 1 Hz) obtained following the methods proposed by Graves and Pitarka (2010) and Goldberg and Melgar (2020). The comparison of the broadband seismograms with the recorded data for five events that occurred in the wider Zagreb area allows us to assess the goodness of fit (GOF) scores for duration and peak ground velocity (PGV).

Moreover, in order to contribute to the debate on which fault plane caused the 1880 Zagreb event, we simulate an MW = 6.2 earthquake at two different source locations: one matching the March 2020 event hypocenter (North Medvednica fault) and the other matching the historically assumed location of the 1880 event (Kašina fault). We assess the peak ground velocities and Arias intensities in the city and the surrounding area and compare them to the observed intensities. This way, we can determine which parts of the studied area are most likely to sustain damage depending on the source (North Medvednica or Kašina fault) and therefore which of the two faults is the preferable source location for the 1880 event.

2 Development of the 3D seismic model

The wider Zagreb area is located in NW Croatia, in the SW corner of the Pannonian basin and close to the transition zones towards the Dinarides and the South Alps (Tomljenović and Csontos 2001; van Gelder et al. 2015). Today the geological structure of the studied area is primarily a consequence of the Miocene to Quaternary N–S to NW–SE shortening (Saftić et al. 2003). Tectonic inversion that occurred in the SW part of the Pannonian basin led to the formation of up to 1000-m-high E- to ENE-striking isolated hills composed of Mesozoic rocks that emerge out of the Neogene to Quaternary fill of the Pannonian basin system (Tomljenović and Csontos 2001). Complex evolution, lithospheric structure and its interaction with asthenosphere in the SW part of the Pannonian basin have been the focus of several studies based on gravity modelling (Šumanovac 2010), seismic reflection and refraction (Brückl et al. 2007), tomography using local earthquakes (Kapuralić et al. 2019) and teleseismic earthquakes (Šumanovac et al. 2017) and receiver function analysis (Hetényi and Bus 2007; Šumanovac et al. 2016; Stipčević et al. 2020). Hydrocarbon exploration in the Pannonian basin resulted in plenty of seismic reflection profiles, borehole data and other information about the structure and evolution of the studied area. These materials have been used to define the uppermost crustal structure of the wider Zagreb area in several published papers (e.g. Tomljenović and Csontos 2001; Saftić et al. 2003) and within those studies enough information was provided for us to successfully create a 3D model that has adequate resolution for the simulation purposes.

We assembled a 3D structural model that covers a 60 km × 80 km area, (45.505°–46.225°N) × (15.700°–16.495°E), around the city of Zagreb (Fig. 1a) and extends to a depth of 60 km. It describes the seismologically relevant parameters (density, P- and S-wave velocity) on a working grid of 125 m in the UTM (zone 33 N) coordinate system. The model includes surface topography and is represented by four main layers: sediments (composed of three sublayers), upper crust, lower crust and mantle (Fig. 1b). The digital elevation data for the surface topography were collected from the NASA Shuttle Radar Topography Mission Global 30 m (2013). The model is publicly available (see Data availability).

The sedimentary layer of our model was created using the data from Saftić et al. (2003). In that research, three main lithological units were identified in the SW part of the Pannonian basin and therefore, the sedimentary layer in our model consists of three sublayers. The three lithological units with various compositions and thicknesses are often referred to as megacycles because of the depositional cycles that took place in the Pannonian basin (Velić et al. 2002). They are separated by three major unconformities, the Base Neogene, the Base Pannonian and the Base Pliocene (Fig. 2). Sediments of the 3rd megacycle, which is separated from others by Base Pliocene unconformity, are found only in certain parts of the Pannonian basin. We include this information when defining the velocity and density model (e.g. in Fig. 1c profiles P1, P2, P4 and P5 have only two sublayers, while profile P6 has all three sublayers).

Fig. 2

Thickness maps of the three megacycles in the SW part of the NW Pannonian basin: a 1st megacycle, Base Neogene–Base Pannonian. b 2nd megacycle, Base Pannonian–Base Pliocene. c 3rd megacycle, Base Pliocene–surface. d The entire Neogene to Quaternary sequence. (modified after Saftić et al. 2003)

The first megacycle contains mostly sediments of the Lower-Middle Miocene, such as calcareous and other variations of marls, limestones and sandstones. The second megacycle is built up of Upper Miocene deposits such as sandstones and silty marls, and the deposits of the third megacycle consist of gravel, sand, clay and other Pliocene–Quaternary deposits.

We georeferenced the isopach maps of the megacycles from Saftić et al. (2003) and resampled the contour interval to a spacing of 125 m. The isolines were then converted to isopolygons and resampled to the working grid of 125 m in the UTM coordinate system. To interpolate the resampled values, a nearest-neighbour interpolation scheme was used. Knowing the rough composition of each of the three units, the average values of the minimum and maximum P-wave velocities were derived by combining the data from Faust (1951) and Brocher (2008). Following Molinari et al. (2015), the range of the P-wave velocities was linearly interpolated with depth, with a steeper gradient in the first 500 m to account for the existence of the rough, non-consolidated material (Fig. 3). For the S-wave velocities and density values, the Brocher (2005) relations were used.
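As an illustration of the scaling step, the following minimal Python sketch evaluates the Brocher (2005) polynomial regressions for Vs and density as functions of Vp. The function names are hypothetical, the sample Vp values are chosen only for illustration, and the fits are usually quoted as valid for Vp between roughly 1.5 and 8 km/s. It is a sketch under those assumptions and not necessarily the exact workflow used to build the model.

```python
def brocher_vs(vp_km_s):
    """S-wave velocity (km/s) from Vp (km/s), Brocher (2005) regression fit."""
    vp = vp_km_s
    return 0.7858 - 1.2344 * vp + 0.7949 * vp**2 - 0.1238 * vp**3 + 0.0064 * vp**4


def brocher_density(vp_km_s):
    """Density (g/cm^3) from Vp (km/s), Nafe-Drake curve as given by Brocher (2005)."""
    vp = vp_km_s
    return (1.6612 * vp - 0.4721 * vp**2 + 0.0671 * vp**3
            - 0.0043 * vp**4 + 0.000106 * vp**5)


# Illustrative Vp values only (km/s); they are not the model's tabulated values.
for vp in (2.0, 3.6, 4.8, 6.1):
    print(f"Vp={vp:.1f} km/s  Vs={brocher_vs(vp):.2f} km/s  rho={brocher_density(vp):.2f} g/cm^3")
```

Because both quantities are scaled from Vp alone, any change to the Vp profile of a sublayer propagates consistently to Vs and density.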

Fig. 3

a P-wave velocity profiles associated with three megacycles in sediment layer. b S-wave velocity profiles and c Density profiles scaled from Vp via Brocher’s relations (Brocher 2005) associated with three megacycles in sediment layer

The depths of the other main layers, namely the upper and lower crust and the Moho, were taken from the regional EPcrust model (Molinari and Morelli 2011). Besides the sedimentary layer, the upper crust in our model also includes the pre-Neogene rocks that emerge on the surface. In order to describe them, we define an auxiliary boundary in the upper crust by laterally extending the Neogene base beneath the igneous rocks. This boundary determines the depth down to which the P-wave velocity is constant (3.6 km/s) for the pre-Neogene rocks. Beneath the auxiliary boundary, the first 2 km are considered to be composed of metamorphic and non-metamorphic rocks in combination with other igneous rocks, with the P-wave velocity increasing with depth from 3.6 to 4.8 km/s. The rest of the upper crust is considered to be a typical crystalline crustal unit with a P-wave velocity of 4.8–6.1 km/s. In the lower crust and mantle, P-wave velocities are considered to be constant with depth and have values of 7 km/s and 8 km/s, respectively. All of the velocity ranges were derived by combining the data from Christensen and Mooney (1995) and Šumanovac et al. (2016), and once again the Brocher (2005) relations were used to determine the S-wave velocities and density. Velocities and corresponding densities in each layer are shown in Table 1.

Table 1 P- and S-wave velocities and densities of the four main layers in the model for the wider Zagreb area

Lastly, in order to avoid artefacts in the simulations that could be generated at the sharp boundaries between layers, we smoothed the model using a horizontal 2D Gaussian filter with a smoothing width of 3 km.
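The horizontal smoothing can be sketched as a separable Gaussian filter applied to each depth slice of the model. In the minimal Python example below, the "smoothing width" is interpreted as the Gaussian standard deviation, which is an assumption, and the toy slice uses a coarse 1 km grid instead of the 125 m working grid to keep the example small. All function names are hypothetical.

```python
import math


def gaussian_weights(sigma_m, dx_m, truncate=3.0):
    """Unnormalised Gaussian weights sampled every dx_m, cut off at truncate * sigma."""
    radius = int(round(truncate * sigma_m / dx_m))
    return [math.exp(-0.5 * (k * dx_m / sigma_m) ** 2) for k in range(-radius, radius + 1)]


def smooth_line(values, weights):
    """Weighted moving average; weights are renormalised near the grid edges."""
    radius = len(weights) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(weights):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
                norm += w
        out.append(acc / norm)
    return out


def smooth_horizontal(layer, sigma_m, dx_m):
    """Separable 2D Gaussian smoothing of one horizontal slice (a list of rows)."""
    weights = gaussian_weights(sigma_m, dx_m)
    rows = [smooth_line(row, weights) for row in layer]
    cols = [smooth_line(list(col), weights) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Toy slice: a sharp lateral Vp contrast (km/s) on a coarse 1 km grid.
layer = [[2.0] * 10 + [4.0] * 10 for _ in range(20)]
smoothed = smooth_horizontal(layer, sigma_m=3000.0, dx_m=1000.0)
print([round(v, 2) for v in smoothed[10]])
```

The sharp 2.0 to 4.0 km/s step is replaced by a gradual transition a few kilometres wide, while laterally uniform regions are left unchanged.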

3 Data

To test and evaluate the performance of the new 3D model for the wider Zagreb area we compared simulated data with recorded seismograms for five events: the Marija Bistrica earthquake that occurred on January 28th, 2020 (MW = 3.5), the Zagreb mainshock that occurred on March 22nd, 2020 (MW = 5.3), two aftershocks that quickly followed the mainshock on March 22nd (MW = 4.7 and MW = 3.3) and one that happened a month later on April 23rd (MW = 3.3). The events were recorded by the seismological stations of the Croatian Seismograph Network (CSN; University of Zagreb 2001). Prior to the March 22nd event, four stations (LOBO, KALN, ZAG and PTJ) were operational within the studied area and since then two more stations (CRET and KASN) have been installed. Unfortunately, the two strongest events (MW = 5.3 and MW = 4.7) were recorded on-scale by only two stations (LOBO and KALN; the records at ZAG and PTJ were saturated) and just one event (MW = 3.3, April 23rd) was recorded with the newly installed instruments. Nonetheless, we argue that enough data has been recorded to test the usability of the model to compute simulated seismograms.

Hypocenters of the events are shallow (less than 15 km) and located within the vicinity of Medvednica mountain (see Table S2 in Supplementary Information and Fig. 4). In general, seismic activity in this area occurs on reverse ENE–WSW striking faults and along NW–SE striking dextral faults (e.g. Tomljenović and Csontos 2001; Tomljenović et al. 2008). Fault-plane solutions for four Zagreb earthquakes indicate reverse faulting, while the Marija Bistrica event exhibits strike-slip motions, both faulting mechanisms being typical for this part of north-western Croatia (Herak et al. 2009).

Fig. 4

Locations and focal mechanisms of the five simulated earthquakes. Blue triangles represent stations. Red line indicates the Zagreb metropolitan area

Fault-plane solutions obtained on the basis of the first motion polarities have been provided to us by M. Herak (personal communication). For all the events except the strongest one, sources are represented by a point source using the centroid-moment tensor (CMT) solution format. To achieve a more detailed simulation of the source effects caused by the rupture process on the fault, we define a simple finite-fault source model for the strongest MW = 5.3 event, which occurred on the North Medvednica fault. We estimate the dimensions of the fault (length, L = 4.6 km; width, W = 3.7 km) following the Wells and Coppersmith (1994) earthquake scaling relations. We then divide the fault plane into 20 patches along strike and 15 patches along dip. Every patch can be considered as a subevent which has the same moment tensor components (making the slip distribution homogeneous across the fault) calculated from the seismic moment and fault parameters (strike, dip and rake). The sum of these individual moment tensors is equivalent to the moment tensor a single point source MW = 5.3 event would have. Subevents are shifted in time as pictured in Fig. S2 in the Supplementary Information and have the same half duration (half-width of the source-time function). The half duration is calculated as the estimated patch length divided by the assumed rupture velocity of 2.2 km/s.
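The subdivision of the fault into subevents can be sketched as follows. This is a minimal Python example under stated assumptions: the seismic moment is obtained from MW with the standard relation M0 = 10^(1.5 MW + 9.1) N·m, the rupture is assumed to start at the centre of the fault plane, and the time shift of each patch is assumed to be its distance from that point divided by the rupture velocity. The actual time shifts are those of Fig. S2, and the function and key names are hypothetical.

```python
import math


def moment_from_mw(mw):
    """Seismic moment (N*m) from moment magnitude, standard definition."""
    return 10 ** (1.5 * mw + 9.1)


def build_subevents(mw, length_km, width_km, n_strike, n_dip, v_rupture_km_s):
    """Split a rectangular fault into equal patches with equal moment."""
    m0_patch = moment_from_mw(mw) / (n_strike * n_dip)
    patch_len = length_km / n_strike
    patch_wid = width_km / n_dip
    half_duration = patch_len / v_rupture_km_s  # patch length / rupture velocity
    # Assumption: rupture nucleates at the centre of the fault plane.
    x0, y0 = 0.5 * length_km, 0.5 * width_km
    subevents = []
    for i in range(n_strike):
        for j in range(n_dip):
            x = (i + 0.5) * patch_len  # along-strike patch centre (km)
            y = (j + 0.5) * patch_wid  # along-dip patch centre (km)
            delay = math.hypot(x - x0, y - y0) / v_rupture_km_s
            subevents.append({"x_km": x, "y_km": y, "m0": m0_patch,
                              "time_shift_s": delay, "half_duration_s": half_duration})
    return subevents


subevents = build_subevents(5.3, 4.6, 3.7, 20, 15, 2.2)
print(len(subevents), "patches, half duration",
      round(subevents[0]["half_duration_s"], 3), "s")
```

By construction the patch moments sum to the moment of the equivalent single point source, which mirrors the requirement stated above for the individual moment tensors.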

4 Low-frequency simulation

To calculate the low frequency (LF) part of simulated seismograms, we used the software package SPECFEM3D Cartesian (e.g. Komatitsch et al. 2010, 2016; Peter et al. 2011; see Supplementary Information for details about the software computation). The software implements the spectral element method (SEM) that was originally introduced in computational fluid dynamics by Patera (1984) and since then has been successfully adapted for seismic wave propagation applications (e.g. Cohen et al. 1993; Faccioli et al. 1997; Komatitsch and Tromp 1999; Mazzieri et al. 2013; Paolucci et al. 2021). Of relevance for us, SEM combines the flexibility of finite element methods with the accuracy of spectral methods which makes it ideal to accurately handle distorted mesh elements, implement anisotropy (although we did not use such complication in the present simulations), attenuation, topography, fluid–solid and other types of boundaries and also deal with finite fault sources.

We implemented the newly created 3D model for the wider Zagreb area in the computational mesh built with the internal SPECFEM3D mesher. The mesh takes into account the topography and consists of more than 14 million hexahedral elements whose size doubles at depths greater than 14 km. The minimum element width is about 0.1 km, which is sufficient to accurately simulate seismic waves with a minimum period of ~ 1 s. The attenuation model was scaled from the S-wave speed model following Olsen's empirical relations (Olsen et al. 2003). According to these relations, for the low frequency range (f < 0.5 Hz), the quality factor Q is linearly dependent on the shear wave velocity, and the ratio of the two is called the Olsen ratio (Q/Vs). For the purposes of the simulation, we use an Olsen ratio of Q/Vs = 0.02 s/m, as no significant differences in amplitude were observed when changing the value (Fig. 5). Each of the simulated seismograms has been convolved with the Liu et al. (2006) source time function whose rise times were constrained from the seismic moment following Somerville et al. (1999).
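The scaling of the attenuation model from the S-wave speed amounts to a single multiplication, sketched below in Python. The function name and the sample Vs values are hypothetical; only the ratio of 0.02 s/m comes from the text, and Vs must be given in m/s for the ratio to apply.

```python
def olsen_q(vs_m_s, ratio_s_per_m=0.02):
    """Quality factor Q from S-wave velocity (m/s) using a constant Q/Vs ratio."""
    return ratio_s_per_m * vs_m_s


# Illustrative S-wave velocities (m/s), not the model's tabulated values.
for vs in (500.0, 1000.0, 3450.0):
    print(f"Vs={vs:.0f} m/s -> Q={olsen_q(vs):.0f}")
```

Slow, shallow sediments therefore receive low Q (strong attenuation), whereas the crystalline crust receives high Q.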

Fig. 5

Fourier spectra of the recorded and simulated LF (f < 1 Hz) seismograms of the January 28th, 2020 (MW = 3.5) event for two stations a LOBO and b ZAG. Simulated data with different Olsen ratios (Olsen et al. 2003) is compared. Since there are minimal differences in amplitude, the value of 0.02 was chosen for the simulations

Using this approach, we first compare the recorded data for the MW = 5.3 event with the long-period simulations (T > 1 s) obtained using the point source and the simple finite-fault source model as described in section Data. We do this to determine which of the two source models is more appropriate for the simulation of this particular earthquake. As shown in Fig. 6 (and Fig. S3 in Supplementary Information), there is a considerable difference between the results obtained from the point source and the finite-fault source model. The point source model overestimates the velocity amplitude (in terms of absolute amplitude) roughly by a factor of 2–5. The degree of overestimation for both stations depends on the selected component, with the greatest discrepancies occurring on the E–W component of the seismogram. Furthermore, the overestimation factors seem to depend on the source-station distance, with station LOBO (epicentral distance D = 30.6 km) having greater values than KALN (D = 43.9 km). The finite-fault source model results in a visually better fit in amplitude between the simulated and recorded data for both stations and all three components, making it a more adequate choice for our simulation purposes. Therefore, for the MW = 5.3 event and the two historic events (described in section Application to historic Zagreb earthquakes) we decided to represent the source using the finite-fault model.

Fig. 6

Low frequency (f < 1 Hz) seismograms and corresponding Fourier spectra of the recorded and simulated data of the March 22nd, 2020 (MW = 5.3) event for stations a LOBO and b KALN. Simulated data with different sources (point source and finite-fault source) is compared

To confirm the ability of the 3D model to reproduce amplification and longer shaking duration in the sedimentary basin, we compare the low-frequency simulations with the ones obtained using a simple 1D velocity model. The 1D model consists of two layers over a half-space. The first layer has a thickness of 30 km and P- and S-wave velocities of Vp = 5.8 km/s and Vs = 3.45 km/s, respectively. The second layer has a thickness of 10 km, Vp = 6.65 km/s and Vs = 3.85 km/s. This specific model (B.C.I.S. 1972) is routinely used to locate earthquakes for Croatian seismic bulletins. To best reflect the differences caused by the lateral variability of the structure in the studied area, we compute the waveforms for a number of virtual stations along four different profiles (Fig. 7). The results of the simulations for the two models and the January 28th (MW = 3.5) event along profile E–F are shown in Fig. 8 (the results for the other profiles are shown in Fig. S4 in the Supplementary Information). When possible, we also compared the recorded data from the nearest operating seismic station. The data are filtered with a fourth-order low-pass Butterworth filter with a cut-off frequency of 1 Hz.
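To illustrate what the low-pass filtering retains, the following Python sketch evaluates the textbook amplitude response of an order-n Butterworth low-pass filter, |H(f)| = 1 / sqrt(1 + (f/fc)^(2n)). It assumes a single filter pass (a zero-phase, two-pass application would square this gain), and the function name is hypothetical; the filter implementation actually used is not specified in the text.

```python
import math


def butterworth_lowpass_gain(f_hz, fc_hz=1.0, order=4):
    """Amplitude response of an analogue Butterworth low-pass filter (single pass)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** (2 * order))


for f in (0.1, 0.5, 1.0, 2.0):
    print(f"f={f:.1f} Hz  gain={butterworth_lowpass_gain(f):.4f}")
```

Under these assumptions, signals at 0.5 Hz pass almost unchanged, the gain at the 1 Hz cut-off is 1/sqrt(2), and energy at 2 Hz is reduced to a few percent of its original amplitude.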

Fig. 7

Virtual stations (black triangles) along four profiles A–B, C–D, E–F and G–H used to test the differences between the performance of the 3D model and a 1D model. Blue triangles indicate operating stations closest to the profiles of interest. Red line indicates the Zagreb metropolitan area

Fig. 8

Simulated and recorded north–south component of seismograms for the January 28th (MW = 3.5) event and 2D E–F profile. The event is simulated using a point source and 1D and 3D seismic models on virtual stations on four cross-sections shown in Fig. 7. Blue triangles indicate operating stations closest to the profile of interest. Black dashed lines indicate the bottom of the Pliocene layer, gray lines the bottom of the Pannonian layer and thick black lines magmatic rocks

As shown in Fig. 8, the 1D model fails to produce the amplification and other local effects expected in such a structurally complex area: the waveforms are of much shorter duration and the surface waves are barely visible. The 1D model captures the general shape of the data only for sites on bedrock (e.g. near station PTJ). The 3D model, on the other hand, is able to account for lateral variations of the wavefield and shows considerable improvement with respect to the 1D model. This is especially evident near station KALN, where the duration and the amplitude of the ground shaking give a visually better fit with the recorded signal (blue signal) than the one predicted by the 1D model.

5 High-frequency simulation

To calculate the high-frequency component of the simulated waveforms for six sites of interest (Fig. 4), we employ the semi-stochastic method of Graves and Pitarka (2010), which is implemented within the Southern California Earthquake Center (SCEC) Broadband Platform (Graves and Pitarka 2015; Maechling et al. 2015). In this approach, source radiation is represented stochastically while wave propagation and scattering effects are represented in a deterministic way. Because the method of Graves and Pitarka (2010) includes only the contribution of the greater-amplitude S-waves, we extend the method to model P-waves following the work of Goldberg and Melgar (2020).

Each subfault i contributes to the acceleration amplitude spectrum by:

$$A_{i} \left( f \right) = \sum\limits_{j = 1}^{M} C_{ij} S_{i} \left( f \right)G_{ij} \left( f \right)P\left( f \right)$$
(1)

where the summation over j goes from 1 to M different types of rays. Here \(C_{ij}\) is the wave radiation scale factor, \(S_{i} \left( f \right)\) is the source-radiation spectrum, \(G_{ij} \left( f \right)\) is the path term and \(P\left( f \right)\) is the high-frequency decay. Since the factors \(C_{ij}\) and \(G_{ij} \left( f \right)\) are defined differently for P- and S-waves, the corresponding amplitude spectra are calculated separately and afterwards summed in the time domain (for details see Graves and Pitarka 2010; Goldberg and Melgar 2020). Furthermore, to account for lateral velocity heterogeneities and geological differences, we specify the required parameters independently at each of the stations (Table S3 in Supplementary Information).
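The structure of Eq. 1 can be sketched in a few lines of Python. The three factor functions below are illustrative placeholders and not the definitions of Graves and Pitarka (2010): the source term is a generic omega-squared acceleration shape, the path term combines 1/r geometrical spreading with anelastic attenuation, and the high-frequency decay is taken as exp(−πκf). All names and parameter values are hypothetical.

```python
import math


def source_spectrum(f, m0_rel, fc):
    """Placeholder S_i(f): omega-squared acceleration shape (illustrative only)."""
    return m0_rel * (2.0 * math.pi * f) ** 2 / (1.0 + (f / fc) ** 2)


def path_term(f, r_km, q, beta_km_s):
    """Placeholder G_ij(f): 1/r spreading times anelastic attenuation."""
    return math.exp(-math.pi * f * r_km / (q * beta_km_s)) / r_km


def high_freq_decay(f, kappa_s):
    """Placeholder P(f): kappa-type high-frequency decay."""
    return math.exp(-math.pi * kappa_s * f)


def subfault_spectrum(f, rays, m0_rel, fc, kappa_s):
    """A_i(f) as the sum over ray types j of C_ij * S_i(f) * G_ij(f) * P(f)."""
    s = source_spectrum(f, m0_rel, fc)
    p = high_freq_decay(f, kappa_s)
    return sum(ray["c"] * s * path_term(f, ray["r_km"], ray["q"], ray["beta_km_s"]) * p
               for ray in rays)


# Two hypothetical ray types (e.g. direct and reflected) for one subfault.
rays = [{"c": 1.0, "r_km": 10.0, "q": 100.0, "beta_km_s": 3.5},
        {"c": 0.5, "r_km": 14.0, "q": 100.0, "beta_km_s": 3.5}]
print(subfault_spectrum(5.0, rays, m0_rel=1.0, fc=2.0, kappa_s=0.04))
```

Only the ray-dependent factors sit inside the sum; the source spectrum and the high-frequency decay are common to all rays of a given subfault, as the indices in Eq. 1 indicate.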

The phase spectrum of the radiated acceleration for each ray is randomly derived from a windowed time sequence of band-limited white Gaussian noise (see Boore 1983; Graves and Pitarka 2010). The choice of random phasing is justified by the fact that knowledge about the sources at frequencies higher than 1 Hz is limited, so describing them deterministically would be too difficult.

To define the acceleration spectrum, we first construct the 1D model by averaging the profiles sampled at each of the stations and source locations from our 3D velocity model (for details about the model see Table S4 in Supplementary Information). This way, we obtain a more reliable representation of the medium between the source and stations and are able to better fit the arrival times with the recorded data. To be consistent with the low-frequency simulation, the constant quality factor of each 1D velocity model layer is modeled using Olsen's parametrization with the ratio of 0.02 s/m. The degree of the frequency dependence for each station is taken from Dasović et al. (2013) and the values for the high-frequency attenuation parameter kappa are taken from Stanko et al. (2020). We use a constant stress parameter of ∆σ = 5 MPa which is a typical value for active shallow crustal regions (Graves and Pitarka 2010).

Once defined, the acceleration spectrum is convolved with the simplified Green's functions calculated using the 1D velocity model and a frequency-wavenumber integration algorithm (Zhu and Rivera 2002). Lastly, to account for site-specific conditions and amplification effects, period-dependent, non-linear amplification factors are applied to the high-frequency simulated waveforms (for details see Graves and Pitarka 2010). Calculation of these factors requires the travel-time averaged S-wave velocity in the top 30 m (Vs30). For stations CRET, KASN and ZAG, we use Vs30 data from Miklin et al. (2019). Since there are no local measurements available for the rest of the stations (KALN, LOBO and PTJ), we extract the Vs30 values for these sites from the USGS Global Slope-Based Vs30 (Allen and Wald 2007).
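For reference, the travel-time average that defines Vs30 is 30 m divided by the vertical S-wave travel time through the top 30 m. The minimal Python sketch below computes it from a layered profile; the function name and the two-layer example profile are hypothetical and do not correspond to any of the stations.

```python
def vs30(thicknesses_m, velocities_m_s):
    """Travel-time averaged S-wave velocity (m/s) over the top 30 m of a layered profile."""
    remaining = 30.0
    travel_time = 0.0
    for h, v in zip(thicknesses_m, velocities_m_s):
        used = min(h, remaining)  # only the part of the layer above 30 m counts
        travel_time += used / v
        remaining -= used
        if remaining <= 0.0:
            break
    if remaining > 0.0:
        raise ValueError("profile is shallower than 30 m")
    return 30.0 / travel_time


# Hypothetical profile: 10 m at 200 m/s over 50 m at 400 m/s.
print(vs30([10.0, 50.0], [200.0, 400.0]))
```

Because the average is taken over travel time rather than thickness, thin slow layers near the surface lower Vs30 more than their thickness alone would suggest.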

6 Result validation

With the separately computed low-frequency and high-frequency waveforms, we obtain a single broadband time series for the events and stations shown in Fig. 4. We superimpose the two datasets in the time domain after filtering them with a 4th-order Butterworth filter and a common corner frequency of 1 Hz (Hartzell et al. 1999; Graves and Pitarka 2010). We then compare the simulated and recorded data after processing the two datasets in the following way: (1) we resample both records to a common sample rate of 50 Hz; (2) we cut both traces using the same window, with t = 0 s being the hypocenter time; (3) we remove the linear trend and mean from the records and taper them; (4) we remove the instrumental response from the recorded seismograms; (5) we apply the same broadband filter (0.01–10 Hz) to both simulated and recorded waveforms. Besides visual inspection, in order to assess the reliability of our broadband simulations against the recorded data, we calculate the goodness of fit (GOF) measure following the work of Olsen and Mayhew (2010). Their GOF is defined as a weighted average of up to ten different metrics which measure the misfit between the simulated and the recorded data. The GOF scores go from 0 to 100, with 0–45 representing a poor fit, 45–65 a fair fit, 65–80 a good fit and 80–100 an excellent fit. For the purpose of this research, we follow the approach of van Ede et al. (2020) and calculate GOF scores for only two of the ten proposed metrics, the peak ground velocity (PGV) and the duration. We focus our comparison on these two parameters because they are simple yet significant for engineering purposes and describe the similarity between two waveforms while keeping the GOF measure easy to interpret. PGV indicates how well the synthetic signals reproduce the shaking amplitude, while the duration tells us whether the 3D model is able to reproduce the reverberations and reflections caused by the sedimentary basins.

To gain better insight into the results and the accuracy of the simulation, we discuss the individual PGV and duration GOF scores rather than taking their weighted average. We do, however, for each event and each station that recorded it, take the mean GOF score of all three seismogram components. For each component, both the PGV and the duration GOF scores are calculated using the expression:

$$GOF = 100 \cdot \mathrm{erfc}\left( {\frac{{2\left| {syn - rec} \right|}}{{syn + rec}}} \right)$$
(2)

where syn stands for the simulated metric of interest and rec for the metric obtained from the recorded data. For the PGV, syn and rec are defined as max|v(t)|, where v(t) is the velocity time series. For the duration, syn and rec are the frequency-dependent durations of strong motion for the simulated and the recorded data, respectively, calculated using the method of Novikova and Trifunac (1995).
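A minimal Python sketch of the GOF calculation is given below. It uses the absolute value of the misfit, as in Olsen and Mayhew (2010), so that the score stays within the stated 0–100 range regardless of whether the simulation over- or underestimates the recorded value. The function names and the sample PGV values are hypothetical, and the class boundaries follow the ranges quoted above.

```python
import math


def gof(syn, rec):
    """GOF score (0-100) for one positive metric, e.g. PGV or duration."""
    return 100.0 * math.erfc(2.0 * abs(syn - rec) / (syn + rec))


def classify(score):
    """Map a GOF score to the qualitative classes used in the text."""
    if score >= 80.0:
        return "excellent"
    if score >= 65.0:
        return "good"
    if score >= 45.0:
        return "fair"
    return "poor"


def mean_gof(syn_values, rec_values):
    """Mean GOF over the seismogram components of one station and event."""
    scores = [gof(s, r) for s, r in zip(syn_values, rec_values)]
    return sum(scores) / len(scores)


# Hypothetical PGV values (cm/s) for the E, N and Z components.
score = mean_gof([0.11, 0.09, 0.05], [0.10, 0.10, 0.06])
print(round(score, 1), classify(score))
```

The score is symmetric in syn and rec and equals 100 only for a perfect match; a factor-of-two mismatch already falls into the poor class.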

Vertical component of the broadband waveforms (0.01–10 Hz) and the corresponding Fourier spectra of the simulated and the recorded data for the stations and events of interest are shown in Figs. 9, 10, 11, 12 and 13 (plots for all three components for each event are shown in Figs. S5–S9 in the Supplementary Information). The PGV and duration GOF scores are shown in Tables S5 and S6 in the Supplementary Information.

Fig. 9

Vertical components of the broadband seismograms (bandpass filtered 0.01–10 Hz) and the corresponding Fourier spectra for the Zagreb, 23-04-2020, MW = 3.3 event and stations CRET, KALN, KASN, LOBO, PTJ and ZAG

Fig. 10

Vertical components of the broadband seismograms (bandpass filtered 0.01–10 Hz) and the corresponding Fourier spectra for the Zagreb, 22-03-2020, MW = 3.3 event and stations KALN, LOBO, PTJ and ZAG

Fig. 11

Vertical components of the broadband seismograms (bandpass filtered 0.01–10 Hz) and the corresponding Fourier spectra for the Marija Bistrica, 28-01-2020, MW = 3.5 event and stations KALN, LOBO, PTJ and ZAG

Fig. 12

Vertical components of the broadband seismograms (bandpass filtered 0.01–10 Hz) and the corresponding Fourier spectra for the Zagreb, 22-03-2020, MW = 4.7 event and stations KALN and LOBO

Fig. 13

Vertical components of the broadband seismograms (bandpass filtered 0.01–10 Hz) and the corresponding Fourier spectra for the Zagreb, 22-03-2020, MW = 5.3 mainshock event recorded at stations KALN and LOBO

At this stage of the work, we cannot expect to fit every wiggle of the recorded seismograms. However, it is clear that the records of different events exhibit distinctive characteristics which are fairly well matched by the simulated signals (Figs. 9, 10, 11, 12 and 13).

In general, the Fourier spectra of the simulated high-frequency part match the spectra of the recorded data well, decreasing in amplitude as the frequency increases. Differences in the high-frequency part of the spectra are partly caused by the stochastic nature of the high-frequency simulation. In the low-frequency part of the Fourier spectrum, larger differences between the simulated and recorded datasets seem to occur at frequencies below 0.1 Hz, indicating that the simulated waveforms are representative up to periods of 10 s, which is sufficient for applications in earthquake engineering, construction and other sub-disciplines of civil engineering.

In some cases, the simulated waveforms slightly underestimate the recorded amplitudes (e.g. LOBO and KALN for the Zagreb 22-03-2020, MW = 5.3 event), while in other cases (e.g. LOBO and KALN for the Zagreb 23-04-2020, MW = 3.3 event) they overestimate them by roughly a factor of 2–3. Accordingly, the PGV GOF scores take on a variety of values for all stations, ranging from poor to good, depending on the magnitude of the event as well as the source mechanism and station location. For instance, PTJ, ZAG and LOBO have higher PGV GOF scores for the Marija Bistrica (MW = 3.5) event than for the two Zagreb (MW = 3.3) events. This suggests that smaller-magnitude events result in a worse amplitude fit between the simulated and recorded data. On the other hand, the score for the KALN station is the highest for the 23-04-2020 event, and a comparison of the scores for different stations between the two MW = 3.3 earthquakes shows that source directivity and other effects of the source mechanism also contribute to the final result. Besides the description of the source, the mismatch between the amplitudes of the recorded and simulated waveforms could result from the velocity model as well as the source-station distance. In general, scores are higher for stations with greater epicentral distances for all earthquakes except the Marija Bistrica (MW = 3.5) event. The duration GOF scores span from poor to excellent, once again depending on both the event magnitude and the position of the station relative to the source. For both the PGV and the duration GOF scores, it is worth noting that even the smallest values are relatively close to the fair range of fit (see Tables S5 and S6 in the Supplementary Information).
In conclusion, despite a relatively small data sample (especially for stations CRET and KASN, which have only one GOF value per event) and misfits stemming from the simulation input parameters, such as the description of the source and/or the velocity model, the GOF scores are overall acceptable and promising for all of the stations and events used in this research.

7 Application to historic Zagreb earthquakes

To better understand the ground motion characteristics in the wider Zagreb area in case a stronger earthquake were to occur, we simulated the historic 9 November 1880, MW = 6.2 event. This event, also known as the Great Zagreb earthquake, is one of the most important earthquakes that occurred in the continental part of Croatia, as it de facto governs the hazard assessment for this area. Despite the abundance of information collected, including numerous damage reports, observed effects such as liquefaction, the sequence of aftershocks that followed the main event and several seismic reports from that time (e.g. Torbar 1882; Hantken von Prudnik 1882), several questions remain open, mainly concerning the location and nature (mechanism) of the fault on which this earthquake occurred. The initially proposed location on the Kašina fault (based on macroseismic and other reported observations (Cvijanović 1982); see Fig. 1a for location) became even more debated after the 22 March 2020 earthquake, as the damage reports from the two events seem to have the same spatial distribution. Therefore, to provide insight into this scientific debate and to test the idea that the Great Zagreb earthquake occurred on the same fault as the March 2020 event, we simulate two ground shaking scenarios with two different hypocenter locations, one on the Kašina fault and one on the North Medvednica fault (Fig. 14). For the purposes of this paper we focus primarily on the low-frequency part of the simulation because, to the best of our knowledge, there is currently not enough data to independently specify the parameters needed for an accurate high-frequency simulation in the whole studied area. The procedure, 3D model, mesh and attenuation model used to obtain the low-frequency scenarios are the same as in the section Low-frequency simulation.
The two sources are represented by a simple finite-fault model, in the same way as for the MW = 5.3 event (described in the section Data). The expected fault-plane solutions were provided by M. Herak (personal communication). Details about the sources are given in Table S7 in the Supplementary Information. Our representation of the Kašina and North Medvednica faults (adapted from Tomljenović and Csontos 2001), the corresponding focal mechanisms and the epicentre locations are shown in Fig. 14a, b.

Fig. 14

Simplified representation of the fault, corresponding focal mechanism and the epicentre location for (a) the Kašina fault and (b) the North Medvednica fault. Black line marks the boundary of the Zagreb metropolitan area

Of the two proposed source locations for the 1880 event, the preferable one could be determined by comparing the observed and simulated macroseismic intensities. Macroseismic intensities can be estimated from the linear relationship between the intensity and the logarithm of a ground motion parameter, such as peak ground acceleration, velocity or displacement, Arias intensity etc., but since we are working with only the low-frequency part of the simulation, we do not estimate them. Instead, we carried out a preliminary study to test the hypothesis that the source of the 1880 event has been misplaced. We compare the observed intensities with two measures obtained from the low-frequency (f < 1 Hz) simulations: (1) peak ground velocity (abbreviated PGVLP) (Fig. 15) and (2) Arias intensity (Arias 1970) (Fig. 16). For both measures, we take total horizontal values defined as: (1) the geometric mean of the horizontal components for the PGVLP value; (2) the sum of the individual Arias intensity values obtained from the N–S and E–W components (following the work of Bozorgnia and Campbell 2016).
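The two total horizontal measures defined above can be illustrated with a short Python sketch. It uses the standard definition of the Arias intensity, Ia = π/(2g) ∫ a(t)² dt, evaluated here with the trapezoidal rule. The function names, the value g = 9.81 m/s² and the choice of integration rule are assumptions made for illustration and do not describe the actual implementation used in this study.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def peak_abs(series):
    """Peak absolute value of a time series, e.g. PGV for a velocity trace."""
    return max(abs(x) for x in series)


def arias_intensity(acceleration, dt):
    """Arias intensity (m/s) of one acceleration component sampled every dt seconds.

    The integral of a(t)^2 is approximated with the trapezoidal rule.
    """
    integral = sum(
        0.5 * (a0 * a0 + a1 * a1) * dt
        for a0, a1 in zip(acceleration, acceleration[1:])
    )
    return math.pi / (2.0 * G) * integral


def horizontal_pgv(v_ns, v_ew):
    """Total horizontal PGV: geometric mean of the N-S and E-W peak velocities."""
    return math.sqrt(peak_abs(v_ns) * peak_abs(v_ew))


def horizontal_arias(a_ns, a_ew, dt):
    """Total horizontal Arias intensity: sum of the N-S and E-W values."""
    return arias_intensity(a_ns, dt) + arias_intensity(a_ew, dt)


if __name__ == "__main__":
    # Toy traces for illustration only, not simulated waveforms.
    v_ns, v_ew = [0.0, 2.0, -1.0], [0.0, -8.0, 3.0]
    a_ns = a_ew = [1.0] * 11
    print(math.log10(horizontal_pgv(v_ns, v_ew)))
    print(math.log10(horizontal_arias(a_ns, a_ew, dt=0.1)))
```

The base-10 logarithms printed at the end correspond to the quantities that are compared with the observed intensities.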

Fig. 15

(a) Map and (b) comparison of the observed intensity (MSK scale) for the 1880, MW = 6.2 event (Herak et al. 2021) and the logarithm with base 10 of the horizontal peak ground velocity (PGVLP) calculated from the low-frequency (f < 1 Hz) simulated waveforms for the Kašina fault and North Medvednica fault. Black line marks the boundary of the Zagreb metropolitan area

Fig. 16

(a) Map and (b) comparison of the observed intensity (MSK scale) for the 1880, MW = 6.2 event (Herak et al. 2021) and the logarithm with base 10 of the Arias intensity calculated from the low-frequency (f < 1 Hz) simulated waveforms for the Kašina fault and North Medvednica fault. Black line marks the boundary of the Zagreb metropolitan area

For the two scenarios of interest, the spatial distributions of PGVLP and Arias intensity differ significantly and seem to be directly correlated with the source description (focal mechanism, fault geometry and definition of the onset times along the fault). In the Kašina fault scenario, the spatial distributions of both the logarithm of PGVLP (Fig. 15a) and the Arias intensity (Fig. 16a) are extremely elongated in the NW–SE direction. Most of the energy seems to be radiated along the fault orientation, indicating that intensities estimated from such a source could also deviate from the observed data. This is not the case in the North Medvednica fault scenario, where the majority of the energy seems to be radiated uniformly around the vicinity of the mountain, resulting in a considerably better fit with the distribution of the observed intensities for both the PGVLP and Arias intensity maps. The comparison of the observed intensities with the logarithms of PGVLP and Arias intensity (shown in Figs. 15b and 16b, respectively) further supports this premise. For both measures and faults, there is a noticeable scatter in the data, which is expected when working with subjective measures such as the macroseismic intensity (e.g. Ardeleanu et al. 2020 and references therein). In addition, since we calculated PGVLP and Arias intensity using only the low-frequency waveforms, we expected a certain amount of dispersion in the data as well as outliers. However, when looking at the medians and interquartile ranges, the correlation between the values is rather apparent in the case of the North Medvednica fault scenario. To verify this observation quantitatively, we calculate Spearman's rank correlation coefficient between the observed intensities and the logarithm of the two ground motion parameters.
Spearman's rank correlation coefficient (rs) assesses the monotonic relationship between two variables, with a coefficient of ± 1 indicating perfect positive/negative correlation. For the North Medvednica fault scenario we obtain rs = 0.60 for PGVLP and rs = 0.61 for the Arias intensity. For the Kašina fault scenario, rs = 0.43 for PGVLP and rs = 0.28 for the Arias intensity. Both the PGVLP and the Arias intensity rs values favour the North Medvednica fault scenario, indicating moderate correlation with the observed intensities, unlike the Kašina fault scenario, which yields considerably weaker correlation. This suggests that the 1880 earthquake could indeed have occurred on the same fault as the MW = 5.3, 2020 earthquake, the North Medvednica fault. However, supporting this claim requires more evidence, e.g. a comparison of the observed intensities with ground shaking parameters estimated from complete broadband waveforms.
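For readers who wish to reproduce this type of comparison, the following Python sketch computes Spearman's rank correlation coefficient as the Pearson correlation of the ranks, with tied values receiving their average rank. Tie handling matters here because macroseismic intensities take only a few discrete values. The function names and the sample numbers are hypothetical and serve only as an illustration; they are not data from this study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation coefficient, with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical observed intensities and log10 ground-motion values.
    intensity = [6, 6, 7, 8]
    log_measure = [0.1, 0.3, 0.2, 0.9]
    print(round(spearman(intensity, log_measure), 2))
```

Because the coefficient depends only on ranks, it gives the same value for a ground motion parameter and for its logarithm, so it tests the monotonic trend without assuming a particular linear intensity relation.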

8 Discussion and conclusion

Using publicly available geological data (for details see the section Development of the 3D seismic model), we created a 3D seismic model for the wider Zagreb area. The model was designed for numerical simulation of seismic wave propagation. It describes in detail the main structures observed in the uppermost part of the crust and is embedded within the regional EPcrust crustal model (Molinari and Morelli 2011). The studied area encompasses rather complex geological structures, such as sedimentary basins and high-velocity structures, that are known to affect ground motions. Therefore, including this information is necessary if we want to accurately simulate ground motion in the studied area. To test this hypothesis, we compared the low-frequency simulation results obtained with the 3D model with those obtained with a simple 1D model. We concluded that, unlike the 1D model, our 3D model is able to reproduce the main characteristics of the ground motion, primarily the shaking duration and amplification effects. To further test the performance of the model, we then computed the full broadband waveforms for five events and six stations using a hybrid technique. The results were compared quantitatively with the recorded data using the goodness-of-fit scores for the peak ground velocity and duration. For all events and stations, both the PGV and the duration GOF scores were encouraging, indicating that our 3D model is suitable for simulating shaking scenarios in the wider Zagreb area. Further refinement of the model and other input parameters, as well as the implementation of the 3D model in the high-frequency part of the simulation, would likely yield even better results. Nevertheless, the results presented in this work are a necessary starting point for future research that will contribute to an improved understanding of ground motion in the studied area. This is of particular interest for possible larger events, which have been documented in the past.
Hence, in this work we also examined the Great Zagreb earthquake of 1880, the strongest known event to have occurred in this region. We simulated low-frequency seismograms of such an earthquake (MW = 6.2) on two sources, the North Medvednica fault and the Kašina fault. We then plotted shakemaps to: (1) determine the expected ground-motion features if such an event were to occur today; (2) address the ongoing question about the source location of the 1880 event. We observed that the two sources result in clearly distinguishable spatial distributions of PGVLP and Arias intensity, implying that the corresponding damage distributions would also differ significantly. To substantiate this claim, and to explore the idea that the source of the 1880 event has been misplaced, we then directly compared the observed intensity data with the logarithms of PGVLP and Arias intensity. From the obtained results, we concluded that the more probable source of the 1880 event could indeed be the North Medvednica fault and not the Kašina fault, as previously assumed. Lastly, we are aware of the limitations of our modeling, primarily those of the 3D model and the description of the sources. However, we believe that we conducted an important and relevant study for the wider Zagreb area using the currently available data and provided a much-needed basis for future research.