1 Introduction

Chile is located on the Pacific Ring of Fire, where, in the northern part of the country, the Nazca Plate subducts under the South American Plate. This configuration generates subduction earthquakes with a wide range of magnitudes. In the north of Chile, a large earthquake of magnitude \({M}_w\,\sim 8.8\) occurred in 1877 (Comte and Pardo 1991), and although earthquakes of magnitudes up to 7.8 have since occurred in this region (NEIC catalog http://earthquake.usgs.gov/earthquakes/search/), a mega-earthquake was still expected (Klotz et al. 2001).

On April 1, 2014, a major earthquake struck the region, rupturing part of the 1877 seismic gap (Hayes et al. 2014). Its magnitude of 8.1 was not sufficient to release all of the interseismic strain accumulated since 1877 (Schurr et al. 2014); however, it did produce a moderate tsunami. The tsunami reached nearby coastal areas (<300 km from the epicenter) tens of minutes after the event origin time. Waves of moderate height (between 2 and 4 m) were recorded at nearby locations such as Arica, Iquique and Tocopilla, whereas more distant locations such as Talcahuano registered smaller, but still measurable, sea-level perturbations.

After an earthquake, several studies typically attempt to reconstruct the slip distribution. The simplest model of the rupture assumes a homogeneous slip over an area delimited by the aftershocks; more sophisticated models invert different datasets, including teleseismic records, geodetic data and tsunami data, among others. For example, An et al. (2014) used tsunami waveforms, Lay et al. (2014) inverted tsunami and seismological data, Schurr et al. (2014) used GPS and seismological data and Gusman et al. (2014) performed a joint inversion of tsunami and GPS data.

The propagation of the tsunami to the coastal areas is strongly affected by the sea floor bathymetry, which acts as a local effect in the different bays along the coast. The tsunami arrival was recorded by tide gauges along the coast of Chile. We compared these observed signals with synthetic ones obtained from the models mentioned above. To do this, we generated and propagated the tsunami waves over grids of bathymetry and topography and extracted a tsunami signal at the grid nodes closest to the positions of the tide gauges that recorded the actual tsunami. We then performed a qualitative and statistical comparison between the observed and modeled tsunami signals.

2 Tsunami modeling

To model the generation and propagation of the tsunami, we use the COrnell Multi-grid Coupled Tsunami Model (COMCOT), version 1.7, which uses explicit staggered leapfrog finite difference schemes to solve the shallow water equations (Liu et al. 1998). Using a nested grid system, in which a region of one grid size contains one or more regions with smaller grid sizes, the model can simultaneously calculate tsunami propagation in the deep ocean and inundation in coastal zones. In this study we use four levels of nested grids for four different locations. COMCOTv1.7 utilizes a uniform grid size (\(\Delta x = \Delta y\)) and assumes that the water surface displacement is the same as the deformation of the sea floor; in other words, the uplift motion is assumed to be much faster than the wave propagation. For an earthquake, the sea floor displacement is computed using the improved elastic finite fault plane theory of Okada (1985). In the deep ocean, the tsunami is modeled using the linear form of the shallow water equations. As the waves approach the coast, the bathymetry and, therefore, the wavelength change, and the nonlinear form of the shallow water equations becomes necessary. For each fault plane model, 6 h of tsunami propagation was simulated using a time step of 0.3 s, which satisfies the Courant stability condition. In addition, we used a bottom friction coefficient of 0.025, which is widely used in tsunami simulation. This coefficient is equivalent to a coarse sand with a diameter of 2 cm (Masamura et al. 2000).
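As an illustration of the stability statement above, the following minimal sketch evaluates the Courant number \(\sqrt{gh}\,\Delta t/\Delta x\) for the 0.3 s time step on each grid level. Only the time step and the grid resolutions come from the text; the maximum depths per level, the conversion of arc-minutes to metres along a meridian and the function names are assumptions made for this example, and the exact stability limit applied inside COMCOT is not reproduced here.

```python
import math

G = 9.81           # gravitational acceleration (m/s^2)
ARCMIN_M = 1852.0  # approximate length of one arc-minute of latitude (m)
DT = 0.3           # time step quoted in the text (s)


def courant_number(depth_m, dx_m, dt_s):
    """Courant number for long waves: sqrt(g*h) * dt / dx."""
    return math.sqrt(G * depth_m) * dt_s / dx_m


def max_stable_depth(dx_m, dt_s, limit=1.0):
    """Largest depth for which the Courant number stays at or below `limit`."""
    return (limit * dx_m / dt_s) ** 2 / G


# Resolution in arc-minutes (from the text) and an ASSUMED maximum depth (m).
grids = {
    "level 1": (2.0, 8000.0),
    "level 2": (0.5, 8000.0),
    "level 3": (0.1, 1000.0),
    "level 4": (0.016, 100.0),
}

courant = {}
for name, (res_arcmin, depth) in grids.items():
    dx = res_arcmin * ARCMIN_M
    courant[name] = courant_number(depth, dx, DT)
    print(f"{name}: dx = {dx:8.1f} m, Courant number = {courant[name]:.3f}")
```

Under these assumed depths every level stays well below unity; the finest grid is the most restrictive, so the check is most sensitive to the maximum depth assumed there.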

3 Grid generation

Since tsunami propagation is greatly affected by the bathymetry and topography, we used the nested grid system to improve the resolution while saving computer resources. We combined bathymetric and topographic data to generate 11 grids. The first-level grid is common to all locations, covers an area from Arica to Concepcion Bay and has a resolution of 2 min. There are two second-level grids with a resolution of 0.5 min, one located in the north of Chile (from Arica to Tocopilla) and the other around Concepcion Bay. Third-level grids with 0.1-min resolution and fourth-level grids with 0.016-min resolution were constructed for the Arica, Iquique, Tocopilla and Talcahuano coastal areas.
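The nesting described above can be summarized numerically. The sketch below is only a bookkeeping aid: it counts the grids per level, computes the refinement ratio between successive levels and converts each resolution to an approximate spacing in metres. The resolutions and grid counts are taken from the text; the arc-minute length and the example latitude are assumptions.

```python
import math

ARCMIN_M = 1852.0  # approximate length of one arc-minute of latitude (m)

LEVEL_RES_ARCMIN = [2.0, 0.5, 0.1, 0.016]  # resolutions quoted in the text
GRIDS_PER_LEVEL = [1, 2, 4, 4]             # one common, two regional, four + four local


def spacing_m(res_arcmin, lat_deg):
    """Approximate east-west and north-south node spacing (m) at a latitude."""
    dy = res_arcmin * ARCMIN_M
    dx = dy * math.cos(math.radians(lat_deg))
    return dx, dy


# Refinement ratio between each parent level and its child level.
ratios = [coarse / fine
          for coarse, fine in zip(LEVEL_RES_ARCMIN, LEVEL_RES_ARCMIN[1:])]
total_grids = sum(GRIDS_PER_LEVEL)

print("total grids:", total_grids)
print("refinement ratios:", [round(r, 2) for r in ratios])
for level, res in enumerate(LEVEL_RES_ARCMIN, start=1):
    dx, dy = spacing_m(res, lat_deg=-20.0)  # ASSUMED latitude, near the rupture
    print(f"level {level}: about {dx:.0f} m x {dy:.0f} m")
```

The finest level works out to roughly 30 m, which is comparable to the best resolution of the bathymetric and topographic datasets used for the local grids.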

For the level 1 and level 2 grids, we resampled the NASA Shuttle Radar Topography Mission dataset SRTM30 plus (Becker et al. 2009). This dataset, which has a resolution of 0.5 min, merges the SRTM topography with sea floor bathymetry estimated from satellite altimetry and ship depth soundings (Smith and Sandwell 1997). For the data manipulation, we used the Generic Mapping Tools (Wessel et al. 2013). For the third- and fourth-level nested grids, we used bathymetry from the Chilean Navy Hydrographic and Oceanographic Service (SHOA, by its Spanish acronym) with up to 30-m resolution (http://www.shoa.cl/tramites/tramite.php), together with satellite topography from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). The ASTER dataset has a maximum resolution of 30 m and is a product of METI (http://gdem.ersdac.jspacesystems.or.jp/) and NASA (http://asterweb.jpl.nasa.gov).

4 Tsunami signals

4.1 Observed data

The observed data were acquired from SHOA tide gauges located in the cities of Arica, Iquique, Tocopilla and Talcahuano. For each time series, the tide was subtracted in order to obtain the tsunami waveforms. The oceanic tide was estimated by classical harmonic analysis using T_TIDE, developed by Pawlowicz et al. (2002). Table 1 summarizes the locations of the tide gauges.
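The idea behind the tide removal can be sketched with a simplified least-squares harmonic fit. This is not T_TIDE (a MATLAB package with many more constituents and corrections); it is a stand-in that fits only four major constituents to a synthetic record. The choice of constituents, the synthetic amplitudes and the masking of the tsunami window during the fit are all assumptions made for the example.

```python
import numpy as np

# Periods (hours) of four major tidal constituents; the selection is an assumption.
PERIODS_H = {"M2": 12.4206012, "S2": 12.0, "K1": 23.93447213, "O1": 25.81933871}


def design_matrix(t_hours):
    """Columns: constant offset, then cosine and sine terms per constituent."""
    cols = [np.ones_like(t_hours)]
    for period in PERIODS_H.values():
        omega = 2.0 * np.pi / period
        cols += [np.cos(omega * t_hours), np.sin(omega * t_hours)]
    return np.column_stack(cols)


def detide(t_hours, sea_level, fit_mask=None):
    """Fit the harmonic model on the masked samples and subtract it everywhere."""
    a = design_matrix(t_hours)
    if fit_mask is None:
        fit_mask = np.ones(t_hours.shape, dtype=bool)
    coeffs, *_ = np.linalg.lstsq(a[fit_mask], sea_level[fit_mask], rcond=None)
    tide = a @ coeffs
    return sea_level - tide, tide


# Synthetic 30-day record sampled every 10 min (all values are illustrative).
t = np.arange(0.0, 30 * 24, 1.0 / 6.0)
tide_true = (0.8 * np.cos(2 * np.pi * t / PERIODS_H["M2"] - 0.3)
             + 0.3 * np.cos(2 * np.pi * t / PERIODS_H["S2"] + 1.0)
             + 0.2 * np.cos(2 * np.pi * t / PERIODS_H["K1"] + 0.5))
pulse = 0.5 * np.exp(-((t - 600.0) / 0.5) ** 2)  # stand-in for a tsunami signal
mask = np.abs(t - 600.0) > 6.0                   # exclude the event from the fit

residual, tide_fit = detide(t, tide_true + pulse, mask)
print("largest residual (m):", float(residual.max()))
```

Excluding the event window from the fit keeps the tsunami energy from leaking into the estimated tide; whether the original processing did this is not stated in the text.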

Table 1 Summary of tide gauge locations

4.2 Homogeneous and heterogeneous fault models

Three different source models are used in this study, and they are shown in Fig. 1. The simplest is a homogeneous rectangular model in which the slip occurs simultaneously. The fault parameters used here were obtained from the Global Centroid Moment Tensor catalog (http://www.globalcmt.org/) (Dziewonski et al. 1981; Ekström et al. 2012), which indicates an earthquake moment magnitude of 8.1, a fault geometry with strike \(355^{\circ }\), dip \(15^{\circ }\) and rake \(106^{\circ }\) at a centroid depth of 21.6 km and centroid location south of the hypocenter (19.70 S, 70.81 W). With these parameters, we used the relations given by Papazachos et al. (2004) to compute the area of the fault and that of Kanamori and Anderson (1975) to compute the slip, obtaining \(184\,\times \,76\,\hbox {km}^2\) and 4.5 m, respectively. These values are consistent with the parameters calculated based on Blaser et al. (2010), Strasser et al. (2010), Kamigaichi (2011) and Murotani et al. (2013). The result is shown in Fig. 1a.
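The arithmetic behind the homogeneous source can be sketched as follows. The scaling coefficients are those commonly quoted for dip-slip faults in subduction regions from Papazachos et al. (2004); they are reproduced here from memory and should be checked against that paper, although they do return the fault dimensions given above. The slip follows from the seismic moment definition \(M_0 = \mu A D\); the moment is derived from \(M_w\) with the standard moment magnitude relation, and the rigidity of 30 GPa is an assumption, since the text does not state the value used.

```python
import math


def fault_dimensions_km(mw):
    """Fault length and width (km) from magnitude.

    Coefficients attributed to Papazachos et al. (2004), dip-slip faults in
    subduction regions; quoted from memory, verify before reuse.
    """
    length = 10 ** (0.55 * mw - 2.19)
    width = 10 ** (0.31 * mw - 0.63)
    return length, width


def seismic_moment_nm(mw):
    """Seismic moment (N m) from Mw = (2/3) * (log10(M0) - 9.1)."""
    return 10 ** (1.5 * mw + 9.1)


def mean_slip_m(m0_nm, length_km, width_km, rigidity_pa=3.0e10):
    """Average slip D = M0 / (mu * A); the rigidity is an ASSUMED value."""
    area_m2 = (length_km * 1e3) * (width_km * 1e3)
    return m0_nm / (rigidity_pa * area_m2)


MW = 8.1  # moment magnitude from the text
length_km, width_km = fault_dimensions_km(MW)
slip_m = mean_slip_m(seismic_moment_nm(MW), length_km, width_km)
print(f"L = {length_km:.0f} km, W = {width_km:.0f} km, slip = {slip_m:.1f} m")
```

With these assumptions the slip comes out somewhat below the 4.5 m used in the study; the result scales directly with the seismic moment and inversely with the rigidity, so a slightly larger catalog moment or a lower rigidity would close the gap.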

Fig. 1 Initial deformation of the tsunami model using: a homogeneous fault model, b Lay et al. (2014) and c Schurr et al. (2014)

More complicated, observation-based models have been calculated by several authors by inverting teleseismic, geodetic and deep water tsunami data, among others. The heterogeneous models of Lay et al. (2014) and Schurr et al. (2014) are used in this study. Lay et al. (2014) use 54 subfaults, each with an area of \(14.44 \times 11.48\,\hbox {km}^2\), with parameters of longitude, latitude, slip and rake. Dip, strike and depth were calculated using the slab model of Klotz et al. (2001). The initial deformation for this model is shown in Fig. 1b.

On the other hand, the model of Schurr et al. (2014) corresponds to a grid with \(0.05^{\circ }\) resolution which was resampled to \(0.15^{\circ }\), obtaining 69 subfaults, each one with parameters of longitude, latitude and slip. The rake used was \(102.96^{\circ }\), from the plate motion model NUVEL 1A (DeMets et al. 1994). To compute strike, dip and depth of each subfault, we used the slab model Slab1.0 of Hayes et al. (2012) which is consistent with the model of Schurr et al. (2014). The size of each subfault was \(16.485 \times 16.610\,\hbox {km}^2\). The initial deformation is shown in Fig. 1c.

5 Results

We modeled 6 h of tsunami propagation and compared modeled and observed data at four locations. Three of them are near the rupture zone (Iquique, Arica and Tocopilla), while the fourth (Talcahuano) is around 1800 km farther south. The synthetic tide gauges are positioned as close as possible to their real equivalents. The comparison between the observed and modeled tsunami signals at the four locations, during 6 h of propagation, is shown in Fig. 2, where the amplitude and waveform differences between the signals can be seen.

Fig. 2 Comparison of observed tsunami signal (blue line) on April 1, 2014, and synthetic one using a homogeneous fault model (red line), the Lay et al. (2014) heterogeneous model (green line) and the Schurr et al. (2014) heterogeneous model (black line) for the locations of a Arica, b Iquique, c Tocopilla and d Talcahuano. The highlighted parts of the signals are used in the correlation analysis for the four locations

In order to quantitatively analyze the shape of the modeled and observed signals, we used the correlation coefficient, calculated for different lags and window lengths in order to assess the stability of the virtual tide gauge signals throughout time (using algorithms from Harris 1991). The correlation coefficient between the observed and virtual signals was calculated for up to the first 150 min after the tsunami arrival time, and the windows used are highlighted in Fig. 2. A progressively larger correlation window length was used, from 20 to 150 min, and lags of between \(-\)20 and 20 min were applied to the virtual signals in order to see how the correlation coefficient varies if the time series are displaced with respect to each other. The introduction of a variable lag and correlation window length permits an analysis of how accurately tsunami arrival times are predicted, and the duration of the tsunami signal that can be accurately reproduced by the virtual simulations. In this part of the analysis, it was necessary to interpolate the observed tsunami signal at Tocopilla between 54 and 67 min after the event origin time using a cubic spline in order to best estimate the tide gauge signal for the minutes in which the equipment did not record. The resulting correlation values are displayed in Fig. 3, which demonstrates the window lengths and lags for which the four locations have a good positive correlation between the observed and the three virtual signals (correlation values >0.5 are displayed).
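The scan over lags and window lengths can be sketched as below. This is a minimal illustration and not the Harris (1991) algorithms used in the study. It assumes signals sampled once per minute, a window that starts at the tsunami arrival, and the convention that a lag \(l\) compares the observed sample \(o_i\) with the virtual sample \(v_{(i-l)}\), so a positive lag delays the virtual signal. The synthetic waveform and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(obs, virt, start, window, lag):
    """Correlate obs[start:start+window] with virt delayed by `lag` samples."""
    lo = start - lag
    if lo < 0 or lo + window > len(virt) or start + window > len(obs):
        return float("nan")
    return pearson(obs[start:start + window], virt[lo:lo + window])


def correlation_map(obs, virt, start, windows, lags):
    """Correlation for every (window length, lag) pair."""
    return {(w, l): lagged_correlation(obs, virt, start, w, l)
            for w in windows for l in lags}


def wave(i, arrival=60):
    """Synthetic decaying oscillation that starts at `arrival` (minutes)."""
    if i < arrival:
        return 0.0
    return math.exp(-(i - arrival) / 200.0) * math.sin(2.0 * math.pi * (i - arrival) / 30.0)


observed = [wave(i) for i in range(300)]
virtual = [wave(i + 5) for i in range(300)]  # same wave arriving 5 min early

windows = range(20, 151, 10)  # window lengths from 20 to 150 min
lags = range(-20, 21)         # lags from -20 to 20 min
cmap = correlation_map(observed, virtual, 60, windows, lags)
best_lag = max(lags, key=lambda l: cmap[(150, l)])
print("best lag for the 150-min window:", best_lag)
```

In this synthetic case the virtual signal arrives too early, so the best correlation is found at a positive lag, mirroring the situation described for a model that underestimates the arrival time.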

Fig. 3 The correlation coefficient between the observed tsunami signal and the three virtual signals for different lags and window lengths after the tsunami arrival, plotted for the locations of: a Arica, b Iquique, c Tocopilla and d Talcahuano

The correlation coefficient represents the relationship between the waveforms rather than the relative amplitudes of the waves; hence, two additional parameters were calculated to quantitatively estimate the difference and goodness of fit between the amplitudes of the observed and virtual signals. Firstly, the normalized root-mean-square (NRMS) value was calculated for window lengths containing the first wavelength of the signal; therefore, it represents the amplitude fit of approximately the first tsunami wave. We used the first wavelength because, after it, the local geomorphology and self-oscillation have a large influence and the relative importance of the initial slip model is reduced (Yamazaki and Cheung 2011; Bellotti et al. 2012). The specific window lengths used for the four locations for the NRMS analysis were: Arica 47 min; Iquique 20 min; Tocopilla 27 min; and Talcahuano 80 min.

The calculation for the NRMS is:

$$ {\text {NRMS}} = \left[ \frac{1}{o_{\text {max}}-o_{\text {min}}}\right] \sqrt{\frac{\sum\nolimits_{i=1}^L(o_{i}-v_{(i-l)})^2}{L}} $$

for a window length L, lag l and observed and virtual signals \(o_i\) and \(v_i\), respectively. The normalization term is given by the inverse of the difference between the maximum and minimum observed values within the window (\(o_{\text {max}}\) and \(o_{\text {min}}\)) such that the calculated NRMS represents the error between the observed and virtual signals as a proportion of the observed signal height. This allows for better comparison between the four different locations. The NRMS was calculated both with zero lag (NRMS zero lag) and with a lag applied which gives the maximum correlation between the virtual and observed signals (NRMS optimal lag). The results of this statistical analysis are summarized in Table 2.
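A direct transcription of the NRMS expression is given below as a minimal sketch. It assumes evenly sampled signals stored as lists, a window that begins at index `start`, and the same lag convention as the equation, where \(o_i\) is compared with \(v_{(i-l)}\). The function name and the short test signals are hypothetical.

```python
import math


def nrms(obs, virt, start, window, lag=0):
    """Normalized RMS misfit between obs and virt over one window.

    obs[start + k] is compared with virt[start + k - lag] for k in 0..window-1,
    and the RMS difference is divided by the observed range in the window.
    """
    lo = start - lag
    if lo < 0 or lo + window > len(virt) or start + window > len(obs):
        raise ValueError("window and lag fall outside the signals")
    o = obs[start:start + window]
    v = virt[lo:lo + window]
    obs_range = max(o) - min(o)
    if obs_range == 0.0:
        raise ValueError("observed signal is constant within the window")
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(o, v)) / window)
    return rms / obs_range


# Illustrative signals: a constant 0.2 offset on a wave with a range of 2.
obs_demo = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
virt_offset = [x + 0.2 for x in obs_demo]
virt_early = obs_demo[2:] + [0.0, 0.0]  # same wave arriving two samples early

print(nrms(obs_demo, virt_offset, start=0, window=8))
print(nrms(obs_demo, virt_early, start=2, window=6, lag=2))
```

Because the misfit is divided by the observed range, a value of 0.1 means that the typical error is one tenth of the observed wave height in that window, which is what makes the values comparable between stations.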

The second parameter used, which measures the goodness of the amplitude fit for the first peak, is the Spga parameter from Anderson (2004), who introduced it as a measure of fit for the amplitude of the acceleration peak. The Spga is defined as:

$$\begin{aligned} S(p_1,p_2)=10 \exp \left\{ -\left[ \frac{(p_1-p_2)}{ \min (p_1,p_2)}\right] ^2\right\} \end{aligned}$$

where \(p_1\) and \(p_2\) are the amplitudes of the two peaks to be compared. This function monotonically decreases as the difference between the parameters increases, which means that values closer to 10 represent a better fit. We used this parameter as a measurement of how well the first peak amplitude is obtained for the simulated signal compared with the observed tsunami. Results from the calculations are presented in Table 2.
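The score is simple enough to evaluate directly, as in the minimal sketch below. The function name and the example amplitudes are hypothetical, and the restriction to positive amplitudes is an assumption added so that the normalization by \(\min (p_1,p_2)\) is well defined.

```python
import math


def peak_fit_score(p1, p2):
    """Goodness of fit between two peak amplitudes, from 0 (poor) to 10 (perfect)."""
    if p1 <= 0.0 or p2 <= 0.0:
        raise ValueError("peak amplitudes are assumed to be positive")
    return 10.0 * math.exp(-(((p1 - p2) / min(p1, p2)) ** 2))


# Illustrative amplitudes in metres (not values from Table 2).
for observed_peak, modeled_peak in [(2.0, 2.0), (2.0, 2.2), (2.0, 3.0), (2.0, 4.0)]:
    score = peak_fit_score(observed_peak, modeled_peak)
    print(f"observed {observed_peak} m, modeled {modeled_peak} m: score {score:.2f}")
```

The score is symmetric in its two arguments, and because the difference is normalized by the smaller peak, it falls off quickly once one amplitude exceeds the other by more than about half.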

Table 2 Statistical analysis of the tsunami signal comparison for the four locations, calculations are applied to the first wavelength of the arrivals

We observed that in general, the heterogeneous models better reproduced the shape and amplitude of the tsunami wave, compared with the homogeneous model. The models of Lay et al. (2014) and Schurr et al. (2014) are similar, while the homogeneous model is noticeably different.

In Arica, Schurr et al. (2014) and Lay et al. (2014) reproduce the first peak, with the former estimating its arrival time better, as shown in Table 2. The homogeneous model produces more waves than the tide gauge recorded, and it underestimates the arrival time of the first peak. Here the cross-correlation is performed with the very first peak of the virtual signal, as shown by the positive lag in Fig. 3 and Table 2, meaning the virtual signal has to be delayed (shifted later in time) to correlate well with the observed one. The heterogeneous models reproduce the amplitude well, while the homogeneous one overestimates the initial amplitude. This is confirmed by the goodness of fit for the first peak, where the maximum value is obtained by Schurr et al. (2014). After the first peak, the signal is not fully reproduced and the correlation coefficients for the first wave are relatively low compared to the other locations, although some of the subsequent peaks coincide with the actual tsunami amplitude. In this case, the models of Lay et al. (2014) and Schurr et al. (2014) are noticeably similar for the first 50 min, after which their correlations with the observed signal drop off.

In Iquique, while all three models reproduce the shape and the arrival time of the first wave, the model of Schurr et al. (2014) produces a tsunami that agrees with the observation in all aspects (shape, amplitude and arrival time). This is confirmed by the high correlation values, zero lag time, low NRMS and high goodness of fit for the first peak. The subsequent oscillation for the three models has a dominant frequency similar to the observed one; however, the second and third waves are not well reproduced by the heterogeneous models. All models have excellent correlation, as seen in Fig. 3; however, it should be noted that while the homogeneous model maintains its correlation for a long time window, its NRMS value is larger than those of the heterogeneous models as it overestimates the signal amplitude. The lowest NRMS and the highest goodness-of-fit values are obtained for the model of Schurr et al. (2014).

For Tocopilla, the heterogeneous models are very similar to each other and cannot sufficiently reproduce the arrival time, shape and amplitude of the first peak of the observed data. Table 2 shows that the signal needs to be shifted in time to obtain better correlations. The first peak amplitude is better estimated by the homogeneous model, which has the highest goodness-of-fit value; however, the first wavelength has a lower NRMS for the heterogeneous models, both with and without the optimal lag applied, as they better reproduce the trough which follows the first peak. Beyond approximately 40 min after the tsunami arrival time, all modeled signals show a high-frequency component that makes comparison between them difficult, as seen by the drastic reduction in the correlation coefficient in Fig. 3. This high frequency could be due to the poorer resolution of the bathymetry data used to construct the finest grid in this bay, which would produce computational resonance. This interpretation is supported by the observed signal, which shows no high-frequency component, so the high frequencies are not a consequence of the geomorphology of the bay.

Talcahuano is the farthest location, and the arrival was around 3 h after the event origin time. In this case, we can observe that the signals are well correlated. Due to the larger wavelength recorded at this distance, the correlation coefficient is still reasonable when small lags are applied to the virtual signals; however, Fig. 3 shows that the heterogeneous models center around zero lag for large correlation window lengths, especially that of Schurr et al. (2014). In terms of amplitude, the first wavelength is accurately reproduced for the model of Schurr et al. (2014), as shown by its low NRMS coefficient for the optimal lag time and a value of 10.0 for the goodness of fit for the first peak.

6 Discussion and conclusion

Despite the fact that the three models can reproduce the tsunami signal, the heterogeneous models provide a better visual and mathematical fit in general. The parameters of the homogeneous source, such as the size and slip, were computed from formulae to correspond to the actual event magnitude, and so the homogeneous slip distribution will overestimate the slip at the rupture edges. This can give a much bigger error in the tsunami models, in terms of overestimating the amplitude and underestimating the arrival time. The overestimation of the slip at shallow depths will produce greater initial sea floor displacements when using the elastic finite fault plane theory of Okada (1985), and furthermore the overestimation of the slip at the northern and southern limits of the rupture will underestimate the arrival time of the tsunami to the north and south. This is shown by the virtual tsunami signal for the homogeneous case, which overestimates the amplitudes, especially in front of the rupture at Iquique, and underestimates the arrival times for the tide gauges to the north and south of the rupture.

Overall, the heterogeneous models, in this study Lay et al. (2014) and Schurr et al. (2014), better reproduce the tsunami shape, amplitude and arrival time. These models are based on a wide range of geophysical observations and so can realistically reproduce the tsunami. Between the heterogeneous models, we can observe that the model of Schurr et al. (2014), in general, reaches higher correlations and goodness-of-fit values for the first peak. Lay et al. (2014) used seismological recordings, accompanied by three deep water tsunami wave records, to perform the inversion, while Schurr et al. (2014) used seismological and GPS data. We suggest that the additional constraints introduced by the joint inversion of the GPS deformation field with the seismological data permit a more accurate slip distribution model. The differences between the virtual tsunamis in this study can be mainly attributed to the slip at shallow depths in the models, as this deforms the seabed more (Okada 1985), and the incorporation of GPS data into the slip model is therefore highly desirable for accurate tsunami simulation. For even more precision, offshore measurements would be helpful to constrain this shallow slip in the heterogeneous models (Shinohara et al. 2014).

The absence of a sea-level decrease before the initial wave was accurately modeled only by the heterogeneous models. This is important for hazard mitigation since a sea-level retraction cannot be used as a reliable indicator of an impending tsunami arrival, and this study shows that its absence can result from slip models more complicated than the simple homogeneous case.

The differences between the modeled and observed tsunamis after the first few waves can be attributed mainly to the bathymetry resolution. Self-oscillation, reflection and refraction in closed bays and on the continental shelf can have a large effect, and the error in the tsunami models propagates forward through the successive time steps so that once a simulation deviates from the observation, the difference is likely to get progressively larger. However, it should be noted that the models are useful for estimating the approximate amplitude and duration of the successive waves, even if the individual peaks of the time series do not all coincide. Given that tsunami modeling is highly dependent on any bathymetric or topographic obstacle that the waves encounter, in order to be able to compare the simulations with the tide gauges after the first few waves, and attribute any differences to the source models, a better resolution bathymetry would be required. Furthermore, more tide gauges along the coast of Chile, situated in places where high-resolution bathymetry is available, would help this study to further differentiate between the source models.

This study shows that while a homogeneous source is useful for modeling tsunamis, the resulting simulation differs noticeably from one based on the heterogeneous slip that will eventually occur in the subduction zone. This study uses the heterogeneous slip distribution of a past event; the future challenge is to obtain sufficient data to estimate the degree of locking in the subduction zone, and hence the slip distribution, prior to the event in order to accurately model the tsunami. Recent advances in this area come from GPS-based models that estimate the degree of locking in a subduction zone, which can correlate with slip when the event finally occurs (Moreno et al. 2010).

The April 1, 2014, earthquake used in this study did not release all of the strain in this area, and a future large-magnitude earthquake remains possible (Schurr et al. 2014). The challenge remains to model the tsunami that such an earthquake will produce and its effects on the Chilean coastline.