1 Introduction

Tsunami are a potential hazard to coastal communities all over the world. Several recent large events have increased community and scientific awareness of the need for effective detection, forecasting and emergency preparedness. Probabilistic, geophysical and hydrodynamic models are required to predict the location and likelihood of an event, the initial seafloor deformation and subsequent propagation and inundation caused by the tsunami. Engineering, economic and social vulnerability models can then be used to estimate the impact of the event as well as the effectiveness of hazard mitigation procedures (Arnold and Carlin 2003; Usha et al. 2009). In this paper, we focus on modelling of the physical processes only.

Various approaches are currently used to assess the potential tsunami inundation of coastal communities. These methods differ in both the formulation used to describe the evolution of the tsunami and the numerical methods used to solve the governing equations. The structure of these models ranges from data-driven neural networks (Romano et al. 2009) to non-linear three-dimensional physics-based models (Zhang and Baptista 2008). These models are typically used to predict quantities such as arrival times, wave speeds and heights, as well as inundation extents and heights for developing efficient hazard mitigation plans. Physics-based models combine observed seismic, geodetic and sometimes tide gauge data to provide estimates of initial seafloor and ocean surface deformation. The shallow water wave equations (George and LeVeque 2006), linearised shallow water wave equations (Liu et al. 2009) and Boussinesq-type equations (Weiss et al. 2006) are frequently used to simulate tsunami propagation and inundation.

Inaccuracies in model prediction can result in inappropriate evacuation plans, town zoning and land use planning, which ultimately may result in loss of life and infrastructure damage. Such inaccuracies are caused by the unknown distribution of surface roughness, uncertainties in the parameterisation of the source model, discretisation errors, the effect of human structures on flow, as well as uncertainties in the elevation data including effects of erosion and deposition by the tsunami event (Dao and Tkalich 2007; Gica et al. 2007). Consequently, tsunami models must undergo sufficient end-to-end testing to increase scientific and community confidence in the model predictions.

Complete confidence in a model of such a physical system cannot be established. One can only hope to state under what conditions and to what extent the model hypothesis holds true. Specifically, the utility of a model can be assessed through a process of verification and validation. Verification assesses the accuracy of the numerical method used to solve the governing equations and validation is used to investigate whether the model adequately represents the physical system (Bates and Anderson 2001). Together these processes can be used to establish the likelihood that a model represents a legitimate hypothesis.

The sources of data used to verify and validate a model can be separated into three main categories: analytical solutions, scale experiments and field measurements. Analytical solutions of the governing equations of a model, if available, provide the best means of verifying any numerical model. However, analytical solutions are frequently limited to a small set of idealised examples that do not completely capture the more complex behaviour of ‘real’ events. Scale experiments, typically in the form of wave-tank experiments, provide a much more realistic source of data that better captures the complex dynamics of flows such as those generated by a tsunami, whilst allowing control of the event and much easier and more accurate measurement of the hydrodynamic properties. Comparison of numerical predictions with field data provides the most stringent test. The use of field data increases the generality and significance of conclusions made regarding model utility (Bates and Anderson 2001).

Currently, the extent of tsunami-related field data is limited. The cost of tsunami monitoring programs and bathymetry and topography surveys as well as the rarity of events prohibit the collection of data in many of the regions in which tsunamis pose the greatest threat. Even if the location and time of a tsunami event could be predicted ahead of time, the hugely destructive force of the tsunami and ephemeral nature of relevant field data limit the amount and accuracy of data that can be collected. The resulting lack of data has limited the number of field data sets available to validate tsunami models.

Synolakis et al. (2008) have developed a set of standards, criteria and procedures for evaluating numerical models of tsunamis. They propose a number of analytical solutions to assess model veracity, five scale comparisons (wave-tank benchmarks) and two field events to assess the validity of a model.

The first field data benchmark introduced in Synolakis et al. (2008) compares model results against observed data from the Hokkaido-Nansei-Oki tsunami that occurred around Okushiri Island, Japan on 12 July 1993. This tsunami provides an example of extreme run-up generated from reflections and constructive interference resulting from local topography and bathymetry. The benchmark consists of two tide gauge records and numerous spatially distributed point sites at which modelled maximum run-up elevations can be compared. The second benchmark is based upon the Rat Islands tsunami that occurred off the coast of Alaska on 17 November 2003. The Rat Islands tsunami provides a good test for real-time forecasting models since the tsunami was recorded at three tsunameters. The test requires matching the tsunami propagation model output with the tsunameter recordings to constrain the tsunami source model and then using it to reproduce the tide gauge record at Hilo, Hawaii.

In this paper, we develop a field test to be used in conjunction with the aforementioned tests to validate tsunami models. The benchmark proposed here focuses on validating the inundation component of tsunami models. A detailed inundation survey of Patong City, Thailand in conjunction with eyewitness accounts of onshore flow depths and velocities, the arrival times and number of tsunami crests are used to compare model and observed inundation. A description of the data required to construct the test is given in Section 2.

Previous model field evaluations (Ioualalen et al. 2007; Watts et al. 2005) and benchmarks (Synolakis et al. 2008) have focused on reproducing inundation at point sites, which are often sparsely distributed. The stakeholders in any tsunami study of inundation, such as emergency planners, are generally more interested in more detailed localised studies of tsunami impacts on populated areas. Informed and defensible decisions must be based upon detailed simulations that predict local inundation extents, onshore flow velocities and depths. Ideally validation studies should be tailored accordingly.

Unlike the existing field benchmarks, the proposed test facilitates localised and highly detailed spatially distributed assessment of modelled inundation. To the authors’ knowledge, it is also the first benchmark to assess model inundation influenced by numerous human-made structures. Additionally, eyewitness videos have been considered to allow the qualitative assessment of onshore flow patterns.

Although the benchmark proposed here focuses on validating inundation models, we note that the generation and propagation components of tsunami models are also important. Without accurate models of generation and propagation, defensible predictions of inundation would be impossible. As with any validation test, the following must be used in conjunction with other appropriate tests to ensure conclusions drawn about the model utility are defensible.

An associated aim of this paper is to illustrate the use of this new field test to validate the three-step modelling methodology employed by Geoscience Australia to model tsunami inundation. A description of the model components is provided in Section 3, and the validation results are given in Section 4. In Section 4.5, the effect of the presence of buildings on inundation and onshore flow patterns is investigated.

The numerical models used to simulate tsunami impact are computationally intensive, and high-resolution models of the entire evolution process will often require a number of days to complete. Consequently, the uncertainty in model predictions is difficult to quantify as it would require a very large number of time-consuming simulations. However, model uncertainty should not be ignored. Section 4.6 provides a discussion of the main sources of uncertainty that affect model predictions.

2 Data

The sheer magnitude of the 2004 Sumatra–Andaman earthquake and the devastation caused by the subsequent tsunami have generated much scientific interest. As a result, an unusually large amount of post-seismic data has been collected and documented. Data sets from seismometers, tide gauges, GPS surveys, satellite overpasses, subsequent coastal field surveys of run-up and flooding and measurements of coseismic displacements as well as bathymetry from ship-based expeditions and high-quality topographic data have now been made available.

The evolution of a tsunami consists of three stages: generation, propagation and inundation. Although this benchmark focuses on validating model inundation, consideration of the other two stages is also important. Consequently, in this section, we present data not only to facilitate validation of inundation extent but to also aid the assessment of tsunami generation and propagation. In an attempt to provide higher visibility and easier accessibility for tsunami benchmark problems, the data used to implement the proposed validation are freely available at http://tinyurl.com/patong2004-data. The published components are:

  • Bathymetric data covering the Andaman Sea and Patong Bay

  • Elevation data covering Patong Bay and its immediate surroundings as well as Patong City including the majority of buildings in the inundation zone

  • Survey of the maximum run-up in Patong City

  • JASON satellite sea level anomaly data

Estimates of onshore wave speeds and depths extracted from eyewitness videos and geodetic measurements of vertical displacement at sites surrounding the rupture zone are presented below.

2.1 Bathymetry data

The bathymetry data used in this study were derived from the following sources: the 2-min DBDB2 grid, a 3-arc sec grid from NOAA and Thai Navy bathymetry chart no. 358.

These data sets were used to produce the four nested grids used in this study. These grids are shown in Fig. 1. The nested approach was chosen to match model resolution requirements according to the principle that shallow water flows are more sensitive to variations in elevation data than deep water flows. Consequently, the elevation data in shallow waters and onshore need to be better resolved than elevation data further offshore. The four nested grids used here were derived as follows:

  • Twenty-seven-arc second grid obtained by interpolating the 2-min DBDB2 grid. This is the coarsest grid used in the simulations

  • Nine-arc second grid generated by sub-sampling the 3-arc sec grid from NOAA

  • Three-arc second grid formed as a subset of the 3-arc sec grid from NOAA

  • One-arc second grid created by digitising Thai Navy bathymetry chart no. 358 followed by a gridding procedure as described below. This grid is the smallest and covers the Patong Bay area and immediately adjacent regions. The digitised points and contour lines from this chart are shown in Fig. 2
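The sub-sampling step behind the 9-arc sec grid can be illustrated with a minimal sketch. It assumes that the grid is held as a regularly spaced NumPy array and that sub-sampling means keeping every third node in each direction; the function name `subsample_grid` and the synthetic values are hypothetical and are not part of the workflow described in this paper.

```python
import numpy as np


def subsample_grid(elevation, factor):
    """Keep every `factor`-th node of a regularly spaced grid in both directions."""
    return elevation[::factor, ::factor]


# Synthetic stand-in for a 3-arc sec grid (arbitrary values, not real bathymetry).
fine = np.arange(7 * 10, dtype=float).reshape(7, 10)

# Going from 3 arc sec to 9 arc sec corresponds to a sub-sampling factor of 3.
coarse = subsample_grid(fine, 3)
print(fine.shape, "->", coarse.shape)
```

Plain decimation of this kind does not filter the data first, so features smaller than the coarse spacing may alias; a production workflow would normally rely on a dedicated gridding tool.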

Fig. 1

Nested elevation grids of the Andaman Sea with the highest resolution at and around Patong Bay

Fig. 2

3D view of the elevation data set used for the nearshore propagation and inundation in Patong City showing digitised data points and contours as well as rivers and roads draped over the data model

The gridding process for the finest grid was performed using Intrepid, a commercial geophysical processing package developed by Intrepid Geophysics (Footnote 1). Any points that deviated from the general trend near the boundary were deleted through a quality control process. The sub-sampling of larger grids was performed by using resample, a Generic Mapping Tools program (Wessel and Smith 1998).

To the authors’ knowledge, the bathymetry data documented here are the best available. It is extremely difficult to estimate uncertainty in bathymetric data, and consequently, we cannot include a discussion of the errors in these data. Here, we direct the reader to a discussion of the consistency of the DBDB2 data set, one of the sources used in this paper, with several other publicly available data sets (Marks and Smith 2006).

2.2 Topography data

A 1-s grid comprising the onshore topography and the nearshore bathymetry for Patong Beach was created from the Thai Navy charts (described in Section 2.1) and from 1- to 10-m elevation contours provided by the Coordinating Committee for Geoscience Programmes in East and Southeast Asia (CCOP). The 1-s terrain model for the community is shown in Fig. 2.

To provide increased resolution for the surveyed area, two one-third-second grids were created: one for the saddle point covering Merlin and Tri Trang Beaches (separate survey patch to the left in Fig. 3) and one for Patong City and its immediate shore area (main surveyed area in Fig. 3). These grids were based on the same data used for the 1-s data grid. The Patong City grid was further modified based on satellite imagery to include the river and lakes towards the south of Patong City, which were not part of the provided elevation data. In the absence of data, the depth of the river and lake system was set uniformly to 1 m.

Fig. 3

Tsunami survey mapping the maximum observed inundation at Patong City, courtesy of the CCOP (Szczucinski et al. 2006)

Again, we are unable to estimate the uncertainty in these topography data. The topography data presented here are the best available.

2.2.1 Buildings and other structures

Human-made buildings and structures can significantly affect tsunami inundation. The footprint and number of floors of the buildings in Patong City were extracted from the data provided by CCOP. The heights of these buildings were estimated assuming that each floor has a height of 3 m and the resulting profiles were added to the topographic data set within the inundation model.
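The way building profiles enter the elevation model can be sketched as follows. This is a minimal illustration that assumes the ground elevation and the number of floors are available on the same regular grid; the function name `add_buildings` and the small arrays are hypothetical, and only the 3 m storey height comes from the text.

```python
import numpy as np

FLOOR_HEIGHT_M = 3.0  # storey height assumed in the text


def add_buildings(ground, floors):
    """Return the elevation with building heights (floors * 3 m) added to the ground."""
    return ground + FLOOR_HEIGHT_M * floors


# Hypothetical 3 x 4 patch of ground elevation in metres.
ground = np.full((3, 4), 2.0)

# Hypothetical number of floors per cell; 0 means that the cell holds no building.
floors = np.array([[0, 0, 2, 2],
                   [0, 0, 2, 2],
                   [0, 1, 0, 0]])

elevation = add_buildings(ground, floors)
print(elevation)
```

Representing buildings as raised terrain means that water must flow around them unless the flow depth exceeds the building height.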

2.3 Inundation survey

Tsunami run-up in built-up areas can be the cause of large financial and human losses, yet run-up data that can be used to validate model run-up predictions is scarce because such events are relatively infrequent. Of the two field benchmarks proposed in Synolakis et al. (2008), only the Okushiri benchmark facilitates comparison between modelled and observed run-up. One of the major strengths of the benchmark proposed here is that modelled run-up can be compared to an inundation survey which maps the maximum run-up along an entire coastline rather than at a series of discrete sites. The survey map was obtained from the CCOP and is shown in Fig. 3.

The survey plots the maximum run-up of the 2004 Indian Ocean tsunami in Patong City. Here, we define the maximum run-up as the difference between the elevation of the maximum tsunami penetration and the elevation of the shoreline during the period between the arrival of the first and last of the three main waves. The run-up ranged between 2 m in the river valley in the south and 6.7 m on parts of the steep coastline. According to Szczucinski et al. (2006), the average run-up was approximately 3 m in Patong town and approximately 4.5 m on the southern shore of Patong Bay. The maximum horizontal intrusion of the tsunami waters from the shoreline ranged between 340 and 560 m.

Traces left by the tsunami, such as debris lines, water marks or damage to vegetation caused by salt water were used to estimate the maximum run-up. Due to the nature of the survey, the authors believe the errors in the measurements to be at least ±30 cm.

The elevation of Patong Bay is reasonably varied, possessing some regions with sharp slopes and other flatter coastal plain areas. However, the topography in the inundation region and the adjacent areas is generally quite flat. Szczucinski et al. (2006) investigated the relationship between the slope of the local topography and the maximum run-up and the maximum horizontal intrusion in Patong Bay. They found that there is no direct correlation between maximum run-up heights and the slope of the local topography where the measurements were taken. Moreover, the greatest distances of horizontal intrusion were achieved on the smaller inclines. The maximum horizontal intrusion distances reached almost 1,500 m in some locations.

2.4 Eyewitness accounts

Eyewitness accounts detailed in Papadopoulos et al. (2006) report that many people at Patong Beach observed an initial retreat (trough or draw down) of the shoreline of more than 100 m followed a few minutes later by a strong wave (crest). Eyewitness statements place the arrival time of the first wave between 9:55 am and 10:05 am local time or about 2 h after the source rupture. Two waves of smaller amplitudes arrived 12–15 min after the preceding wave.

A 10-min window on the arrival time of the first wave is not a stringent test of model performance. However, matching the arrival time within the estimated uncertainty can be considered a necessary condition which must be met. If the predicted arrival time is outside the bounds given, it would suggest a serious deficiency in the modelling process.

Two videos were sourced (Footnote 2), both of which include footage of the tsunami in Patong City on the day of the 2004 Indian Ocean tsunami. Both videos show an already inundated street. They also show what are assumed to be the second and third waves approaching and further flooding of the buildings and street. The first video was filmed in the far north of the city, from what is believed to be the roof of the Novotel Hotel (7.906192° N, 98.29562° E).

The second video was filmed in the far south, from the second story of a building next door to the Comfort Resort near the corner of Ruam Chai St and Thaweewong Road (7.888291° N, 98.292440° E). Figure 4 shows stills from this video. Both videos were used to estimate flow speeds and inundation depths over time.

Fig. 4

Four frames from a video where flow rate could be estimated; circle indicates tracked debris, from top left: 0.0, 5.0, 7.1 and 7.6 s

Flow rates were estimated using landmarks found in both videos and were found to be in the range of 6 m/s (±3 m/s) in the north and 2 m/s (±1.5 m/s) in the south (Footnote 3). Water depths could also be estimated from the videos by the level at which water rose up the sides of buildings such as shops. Our estimates are on the order of 1 to 2.5 m. Without an on-site survey of the locations at which the videos were taken, it is difficult to reduce the large estimates of uncertainty in the flow depths and speeds.
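The principle behind such a video-based estimate is a distance travelled divided by the elapsed time between frames. The sketch below is a simplified illustration and not the procedure actually applied to the videos: the function name `flow_speed`, the 15 m distance and its ±5 m uncertainty are hypothetical, and only the frame times (0.0 and 7.6 s) are taken from the caption of Fig. 4.

```python
def flow_speed(distance_m, distance_err_m, t_start_s, t_end_s):
    """Mean speed and its uncertainty for debris tracked between two frames.

    Only the distance uncertainty is propagated; frame times are treated as exact.
    """
    dt = t_end_s - t_start_s
    if dt <= 0:
        raise ValueError("t_end_s must be later than t_start_s")
    return distance_m / dt, distance_err_m / dt


# Hypothetical example: debris travels 15 m (+/- 5 m) between the frames at 0.0 s and 7.6 s.
speed, err = flow_speed(15.0, 5.0, 0.0, 7.6)
print(f"{speed:.1f} m/s (+/- {err:.1f} m/s)")
```

Because the distance between landmarks is known only roughly without an on-site survey, the distance term dominates the uncertainty in this kind of estimate.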

2.5 Geodetic data

The 2004 Sumatra–Andaman tsunami was generated by a coseismic displacement of the seafloor resulting from one of the largest earthquakes on record. The mega-thrust earthquake started on 26 December 2004 at 00:58:53 UTC (or just before 8 am local time) approximately 70 km offshore of North Sumatra (http://earthquake.usgs.gov/eqcenter/eqinthenews/2004/usslav). The rupture propagated 1,000–1,300 km along the Sumatra–Andaman trench to the north at a rate of 2.5–3 km/s and lasted approximately 8–10 min (Ammon et al. 2005). Estimates of the moment magnitude of this event range from about $M_\mathrm{w}$ 9.1 to 9.3 (Chlieh et al. 2007; Stein and Okal 2007).

The unusually large surface deformation caused by this earthquake means that a range of different geodetic measurements of the surface deformation is available. Here, we use the near-field estimates of vertical deformation in Northern Sumatra (Subarya et al. 2006) and the Nicobar–Andaman Islands (Gahalaut et al. 2006), coral measurements from Simeulue Island (Subarya et al. 2006), field observations in the Nicobar–Andaman Islands (Bilham et al. 2005) and position of the pivot line determined from satellite imagery (Meltzner et al. 2006) collated by Chlieh et al. (2007), to assess whether our crustal deformation model of the 2004 Sumatra–Andaman earthquake is producing reasonable results. These data are displayed in Tables 3, 4, 5, 6 and 7 in Appendix 1, respectively.

Note that the geodetic data used here combine the vertical deformation that occurred during the ∼10 min of the earthquake with the deformation that accumulated before each particular measurement was made (typically of the order of days after the earthquake). Therefore, some of the observations may not contain purely co-seismic deformation but could include some post-seismic deformation as well (Chlieh et al. 2007). The final column of each of the geodetic data tables displays the estimated measurement uncertainty in the deformation. It is impossible to estimate what percentage of the deformation was caused by the initial earthquake and what proportion was caused by subsequent aftershocks and shifts.

2.6 JASON satellite altimetry

During the 2004 Sumatra–Andaman event, the JASON satellite tracked from north to south and over the equator at 02:55 UTC nearly 2 h after the earthquake (Gower 2005). The satellite recorded the sea level anomaly compared to the average sea level from its previous five passes over the same region in the 20–30 days prior.
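The anomaly described above is the difference between the sea surface height on the tsunami pass and the mean height over the preceding passes. A minimal sketch follows, assuming that all passes have already been interpolated to common along-track points; the function name `sea_level_anomaly` and all height values are hypothetical.

```python
import numpy as np


def sea_level_anomaly(current_pass, previous_passes):
    """Current sea surface height minus the mean height of the earlier passes."""
    return current_pass - np.mean(previous_passes, axis=0)


# Hypothetical heights (m) at three along-track points for five earlier passes.
previous = np.array([[0.10, 0.20, 0.30],
                     [0.12, 0.18, 0.30],
                     [0.08, 0.22, 0.30],
                     [0.10, 0.20, 0.28],
                     [0.10, 0.20, 0.32]])

# Hypothetical heights (m) on the pass that crossed the tsunami.
current = np.array([0.60, 0.20, -0.10])

anomaly = sea_level_anomaly(current, previous)
print(anomaly)
```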

2.7 Validation check-list

In this section, we have presented the corresponding data necessary to implement the proposed field validation test. The main focus of the study is to assess the ability to predict inundation extent. Attention is also given to reproducing onshore flow velocities and depths; however, obtaining field data measuring these quantities is often very difficult and subject to very large uncertainties, as is the case here. Consequently, the ‘fit’ of observed and modelled run-up, in the proposed benchmark, has the greatest influence on conclusions regarding model validity. With this in mind, we propose that a legitimate tsunami model should reproduce the following behaviour:

  • The inundation survey map in Patong City (Fig. 3)

  • A leading depression followed by two distinct crests of decreasing magnitude at the beach

  • The arrival times of each wave within 5 min of the estimated values

  • Water depths and flow speeds, at the locations of the eyewitness videos, that fall within the bounds obtained from the videos

  • The JASON satellite altimetry sea surface anomalies (see Section 2.6)

  • The vertical deformation observed in northwestern Sumatra and along the Nicobar–Andaman Islands (see Section 2.5)
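The point-wise items of this check-list lend themselves to a simple automated test. The sketch below is illustrative only: the bounds are taken from Section 2.4, the arrival criterion is interpreted as the 9:55 am to 10:05 am window (that is, 10:00 am ± 5 min), the depth range of 1 to 2.5 m is assumed to apply at both video sites, and the function names and model values are hypothetical.

```python
from datetime import datetime

# Bounds taken from Section 2.4; their application to both sites is an assumption.
ARRIVAL_WINDOW = (datetime(2004, 12, 26, 9, 55), datetime(2004, 12, 26, 10, 5))
SPEED_BOUNDS = {"north": (3.0, 9.0), "south": (0.5, 3.5)}  # m/s
DEPTH_BOUNDS = (1.0, 2.5)  # m


def within(value, bounds):
    """True if value lies inside the closed interval given by bounds."""
    lower, upper = bounds
    return lower <= value <= upper


def check_model(first_arrival, speeds, depths):
    """Return pass/fail flags for the arrival time, flow speeds and water depths."""
    results = {"first_arrival": within(first_arrival, ARRIVAL_WINDOW)}
    for site, bounds in SPEED_BOUNDS.items():
        results[f"speed_{site}"] = within(speeds[site], bounds)
        results[f"depth_{site}"] = within(depths[site], DEPTH_BOUNDS)
    return results


# Hypothetical model output, for illustration only.
report = check_model(
    first_arrival=datetime(2004, 12, 26, 10, 2),
    speeds={"north": 5.0, "south": 4.0},
    depths={"north": 1.5, "south": 2.0},
)
print(report)
```

The spatial items, such as agreement with the inundation survey map, require area-based measures and are not covered by this sketch.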

Ideally, the model should also be compared to measured time series of offshore wave heights and flow speeds, but the authors are not aware of the availability of such data near Patong Bay. Here, we note that the JASON satellite altimetry sea surface anomalies and the vertical deformation data have been used in previous inversion studies (Chlieh et al. 2007; Gahalaut et al. 2006; Grilli et al. 2006) to estimate the characteristics of the Sumatra–Andaman earthquake. If a model uses a description of the source obtained from such an inversion, the comparison of predicted deformation and anomalies with the observed data is a circular argument and thus should be neglected.

3 Modelling the event

Numerous models are currently used to simulate tsunami generation, propagation and inundation (Titov and Gonzalez 1997; Satake 1995; Zhang and Baptista 2008). Here, we introduce the modelling methodology employed by Geoscience Australia to illustrate the utility of the proposed validation test.

The methodology used by Geoscience Australia has three distinct components. Firstly, an appropriate model is used to approximate the initial sea surface deformation. This model is chosen according to the cause of the initial disturbance. The second step involves propagating the resulting wave in the deep ocean using the hydrodynamic tool URSGA which is well suited to modelling tsunamis over large distances. The third and final step consists of using ANUGA (Nielsen et al. 2005) to model the complex process of inundation. This three-part methodology roughly follows the three stages of tsunami evolution: generation, propagation and inundation.

The use of two hydrodynamic models is intended to enhance the modelling process by using models which are individually well suited to one of the two distinct phases of tsunami propagation and inundation. URSGA is well suited to modelling tsunamis over large distances, but its utility decreases as the water becomes shallower unless an intricate sequence of nested grids is employed. In comparison, ANUGA is designed to accurately model the complex wetting and drying process of inundation. A description of the URSGA and ANUGA models is provided in Appendix 2. This section presents numeric and computational considerations needed to validate the modelling practice of Geoscience Australia against the newly proposed validation test (Footnote 4).

3.1 Modelling deformation

We use the method of Wang et al. (2003) to model seafloor deformation. The chosen method predicts crustal deformation with a model that describes the variation in elastic properties with depth. A slip model of the earthquake is required to describe the dislocation. The elastic parameters used for this study are the same as those in Table 2 of Burbidge et al. (2008). The source parameters used here to simulate the 2004 Indian Ocean tsunami were taken from the slip model G-M9.15 of Chlieh et al. (2007). This model was created by inversion of a wide range of geodetic and seismic data. The slip model consists of 686 sub-segments of 20 × 20 km, each with a different slip, strike and dip angle. The dip of the sub-faults ranges from 17.5° in the north to 12° in the south. Refer to Chlieh et al. (2007) for a detailed discussion of this model and its derivation.

3.2 Modelling propagation and inundation

The deformation model described here was used to provide a profile of the initial ocean surface displacement. This wave was used as an initial condition for URSGA and was propagated throughout the Bay of Bengal. The rectangular computational domain of the largest grid extended from 90° to 100° E and 0° to 15° N and contained 1,335 × 1,996 finite difference points. Inside this grid, a nested sequence of grids was used. The grid resolution of the nested grids went from 27 arc sec in the coarsest grid down to 9 arc sec in the second grid and 3 arc sec in the third grid. The computational domain is shown in Fig. 5.

Fig. 5

Computational domain of the URSGA simulation (inset: white and black squares and main: black square) and the ANUGA simulation (main and inset: red polygon)

After propagating the tsunami in the open ocean using URSGA, the approximated ocean surface elevation and horizontal flow speeds were extracted and used to construct a boundary condition for the ANUGA model. The interface between the URSGA and ANUGA models was chosen to roughly follow the 100-m-depth contour along the west coast of Phuket Island. Data from the 3-s grid, whose points are approximately 90 m apart, were decimated to match the resolution chosen in ANUGA. The ANUGA computational domain is also shown in Fig. 5.

The westernmost boundary of the ANUGA domain (red line) was used as the interface between the URSGA and ANUGA models. Transmissive boundary conditions were used on all other boundary segments. We note that the domain utilised for the ANUGA simulation only captures the arrival of wave fronts that propagate directly from the source. Given the local shape of the coastline, the assumption was made that the effects of any reflections on the study region are negligible.

The domain was discretised into 386,338 triangles. The resolution of the grid was increased in regions inside the bay and onshore to efficiently increase the simulation accuracy for the impact area. The grid resolution ranged from a maximum triangle area of 1 × 10⁵ m² (corresponding to approximately 440 m between mesh nodes) near the western ocean boundary (roughly following the 100-m-depth contour) to 20 m² (corresponding to approximately 6 m between mesh nodes) in the small regions surrounding the inundation region in Patong City and the intertidal zone. The coarse resolution was chosen to balance accuracy with computational costs whilst the fine resolution was chosen to match the available resolution of topographic data and building data in Patong City. Figure 6 shows a section of the mesh covering the southern part of the City.
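The quoted node spacings can be related to the maximum triangle areas with a simple geometric conversion. The sketch below assumes right-angled isosceles triangles, for which the leg length is the square root of twice the area; this conversion is an assumption made here for illustration and is not stated in the text, and the function name `node_spacing` is hypothetical.

```python
import math


def node_spacing(max_triangle_area_m2):
    """Leg length of a right-angled isosceles triangle with the given area."""
    return math.sqrt(2.0 * max_triangle_area_m2)


# The two maximum triangle areas quoted in the text, in square metres.
for area in (1e5, 20.0):
    print(f"area {area:g} m^2 -> spacing of roughly {node_spacing(area):.1f} m")
```

Under this assumption, the two areas give spacings close to the 440 m and 6 m figures quoted above.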

Fig. 6

Section of the mesh used by ANUGA to simulate the tsunami inundation. The finest mesh resolution is approximately 6 m between nodes which is sufficient to resolve individual buildings affecting the flows

Due to a lack of available roughness data, Manning friction was set to a constant throughout the computational domain. For the reference simulation, a Manning’s coefficient of 0.01 was chosen to represent a small resistance to the water flow (Dao and Tkalich 2007; Linsley and Franzini 1979).
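To give a feel for how small the resistance implied by this coefficient is, the standard Manning friction term for shallow flow, g n² u|u| / h^(4/3), can be evaluated directly. The sketch below uses this textbook form with the hydraulic radius approximated by the flow depth; whether it matches the ANUGA implementation term by term is not confirmed here, and the function name, the flow state and the comparison coefficient of 0.03 are hypothetical.

```python
G = 9.8  # gravitational acceleration, m/s^2


def manning_friction(n, speed, depth):
    """Manning friction term g * n^2 * u * |u| / h^(4/3), in m/s^2.

    The result carries the sign of the velocity and is subtracted in the momentum balance.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return G * n**2 * speed * abs(speed) / depth ** (4.0 / 3.0)


# Flow of 2 m/s in 1 m of water with the reference coefficient n = 0.01 from the text.
a_ref = manning_friction(0.01, 2.0, 1.0)

# A rougher, hypothetical coefficient for comparison (not the reference simulation).
a_rough = manning_friction(0.03, 2.0, 1.0)

print(f"n = 0.01: {a_ref:.5f} m/s^2, n = 0.03: {a_rough:.5f} m/s^2")
```

Because the term scales with the square of the coefficient, tripling the coefficient increases the frictional deceleration ninefold for the same flow state.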

The URSGA model in this study was used to compute the incident wave along the 100-m contour line only. There is no such information available at each side of the ANUGA domain towards the south and the north. Consequently, a transmissive boundary condition was chosen for these segments, effectively replicating the time-dependent wave height present just inside the computational domain. The velocity field on these boundaries was kept at zero during the simulation. Other choices include applying the mean tide value as a Dirichlet boundary condition. Experiments as well as the result of the verification reported here showed that this approach tends to underestimate the tsunami impact due to the tempering of the wave near the side boundaries, whereas the transmissive boundary condition robustly preserves the wave.

During the ANUGA simulation, the tide was kept constant in the offshore region at 0.80 m. This value was chosen to correspond to the tidal height specified by the Thai Navy tide charts (http://www.navy.mi.th/hydro/) at the time the tsunami arrived at Patong Bay. Although the tsunami propagated for approximately 3 h before it reached Patong Bay, the period of time during which the wave propagated through the ANUGA domain is much smaller, by the order of 2 h. The tide varied ±15 cm from the fixed tide level used over the duration of the simulation. ANUGA does not have the ability to simulate tidal fluctuations. Although not ideal, we believe that this change in water level would have little effect on the inundation map. The initial water level for the river was set to 0 m.

4 Results

This section presents a validation of the modelling practice of Geoscience Australia against the new proposed validation test. The criteria outlined in Section 2.7 are addressed.

4.1 Inundation

The ANUGA simulation described in the previous section and used to model shallow water propagation also predicts inundation. Maximum onshore inundation depth was computed from the inundation model and used to generate a measure of the inundated area. Figure 7 (left) shows the observed and modelled inundation. In this figure, a point in the computational domain was deemed inundated if at some point in time it was covered by at least 1 cm of water. The blue regions indicate the areas where the predicted inundation matched the inundation survey. The red regions indicate areas where inundation was predicted but not observed, and the yellow regions correspond to areas of observed inundation that were not predicted. The yellow regions are false positives of the survey map rather than of the model: the survey map only depicts inundation extent and assumes that all the land between the line of maximum inundation and the ocean was inundated. Typically, the yellow regions are higher than their surroundings and were not inundated.

Fig. 7 Simulated inundation versus observed inundation using an inundation threshold of 1 cm (left) and 10 cm (right). Blue regions indicate the areas where the predicted inundation matched the inundation survey. Red regions indicate areas where inundation was predicted but not observed, and yellow regions correspond to areas of observed inundation that were not predicted

Figure 7 (left) shows very good agreement between the measured and simulated inundation. However, these results depend on the classification used to determine whether a region in the numerical simulation was inundated. The inundation boundary generated by the on-site survey is most likely less precise than the 1-cm tolerance used in Fig. 7 (left), as it was determined by observing watermarks and other signs left by the receding waters. Consequently, the measurement error along the inundation boundary of the survey is likely to vary significantly and somewhat unpredictably. An inundation threshold of 10 cm was therefore selected for the inundation extents reported in this paper to reflect the more likely accuracy of the survey and so facilitate a more appropriate comparison between the modelled and observed inundation area. Figure 7 (right) shows the simulated inundation using this larger threshold of 10 cm. A comparison with Fig. 7 (left) shows that the model predictions are not overly sensitive to the threshold used. An animation of this simulation is available on the ANUGA website at https://datamining.anu.edu.au/anuga or directly from http://tinyurl.com/patong2004.
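The threshold classification can be sketched as follows. The depth time series are invented for illustration and the function name is hypothetical; the only rule taken from the text is that a point counts as inundated if its depth reaches the threshold at some time during the simulation.

```python
def inundated_cells(depth_series, threshold):
    """Return the indices of cells whose maximum depth reaches the threshold (m)."""
    return {i for i, depths in enumerate(depth_series) if max(depths) >= threshold}


# Hypothetical depth time series (m) for four onshore cells.
depth_series = [
    [0.0, 0.004, 0.0],  # never reaches 1 cm
    [0.0, 0.03, 0.02],  # wet at the 1 cm threshold, dry at 10 cm
    [0.0, 0.45, 0.20],  # wet under both thresholds
    [0.0, 1.60, 0.90],  # wet under both thresholds
]

wet_1cm = inundated_cells(depth_series, 0.01)
wet_10cm = inundated_cells(depth_series, 0.10)
```

Only cells with very shallow peak depths change class between the two thresholds, so the extent is insensitive to the threshold wherever such cells are rare.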

4.1.1 Comparison to survey

To quantify the agreement between the observed and simulated inundation, we introduce the measure

$$ \rho_\mathrm{in}=\frac{A(I_\mathrm{o})-A(I_\mathrm{o}\cap I_\mathrm{m})}{A(I_\mathrm{o})} $$

which represents the proportion of the area of the observed inundation region $I_\mathrm{o}$ not captured by the inundation region $I_\mathrm{m}$ predicted by the model. Another useful measure is the fraction of the modelled inundation area that falls outside the observed inundation area given by the formula

$$ \rho_\mathrm{out}=\frac{A(I_\mathrm{m})-A(I_\mathrm{m}\cap I_\mathrm{o})}{A(I_\mathrm{o})} $$

These values, for the two aforementioned simulations, are given in Table 1. High values of $\rho_\mathrm{in}$ and $\rho_\mathrm{out}$ indicate that the model is performing poorly, whereas lower values of both quantities would indicate stronger agreement between the model and the inundation survey. Values of zero for both these quantities correspond to a perfect agreement.
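As an illustration, both measures can be evaluated on a mesh by treating each inundation region as a set of cell identifiers. This is a sketch under that assumption, not the procedure used for Table 1. The cell areas and sets are invented, and both measures are normalised by the observed area, following the formulas as written above.

```python
def area(cells, cell_area):
    """Total area (m^2) of a set of cell identifiers."""
    return sum(cell_area[c] for c in cells)


def inundation_misfit(observed, modelled, cell_area):
    """Return (rho_in, rho_out) for observed and modelled sets of inundated cells."""
    a_obs = area(observed, cell_area)
    a_mod = area(modelled, cell_area)
    a_both = area(observed & modelled, cell_area)
    rho_in = (a_obs - a_both) / a_obs   # observed area missed by the model
    rho_out = (a_mod - a_both) / a_obs  # modelled area outside the survey
    return rho_in, rho_out


# Hypothetical mesh of five cells with areas in m^2.
cell_area = {0: 100.0, 1: 100.0, 2: 50.0, 3: 50.0, 4: 200.0}
observed = {0, 1, 2}
modelled = {1, 2, 3}

rho_in, rho_out = inundation_misfit(observed, modelled, cell_area)
```

For these toy sets the model misses 100 of the 250 m² observed and adds 50 m² outside the survey, so the two measures come out at 0.4 and 0.2.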

Table 1 $\rho_\mathrm{in}$ and $\rho_\mathrm{out}$ of the reference simulation and all sensitivity studies

4.2 Eyewitness accounts

The first wave arrived between 9:55 and 10:05, as described in Section 2.4. The modelled arrival time at the beach is around 10:02, as can be verified from the animation provided in Section 4.1 or from Fig. 8 below. Subsequent waves of variable magnitude appear over the next 2 h at approximately 10:20 and 10:47. The predicted arrival times of the first and second wave are consistent with the eyewitness accounts. However, the modelled arrival of the third wave is approximately 10 min behind the observed arrival time. Chlieh et al. (2007) were also unable to obtain an accurate representation of the magnitude and timing of the third wave.

Fig. 8 Time series obtained from the two onshore locations, north and south. Time is given in hours since the earthquake event (7:59)
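Because Fig. 8 uses hours since the earthquake while the eyewitness accounts use clock time, a small conversion helper can make the comparison easier. The sketch below assumes the 7:59 origin from the caption and the event date of 26 December 2004. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Time origin of Fig. 8: the earthquake at 7:59 on 26 December 2004.
EARTHQUAKE = datetime(2004, 12, 26, 7, 59)


def hours_since_quake(hour, minute):
    """Convert a clock time on the day of the event to hours since the earthquake."""
    elapsed = datetime(2004, 12, 26, hour, minute) - EARTHQUAKE
    return elapsed.total_seconds() / 3600.0


def clock_time(hours):
    """Convert hours since the earthquake back to a clock time."""
    return EARTHQUAKE + timedelta(hours=hours)


# Modelled arrival times of the three waves quoted in the text.
modelled_arrivals = [(10, 2), (10, 20), (10, 47)]
offsets = [hours_since_quake(h, m) for h, m in modelled_arrivals]
```

The three modelled arrivals correspond to roughly 2.05, 2.35 and 2.8 h on the time axis of Fig. 8.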

ANUGA can also be used to estimate onshore depths and velocities. The time series of depth and speed at the two sites at which the eyewitness videos were taken are shown in Fig. 8. The depths and flow rates estimated in Section 2.4 are shown together with the modelled values in Table 2. The predicted maximum depths and speeds are all of the same order as those observed, as is the approximate arrival time at the two locations. The velocities observed at the northern site are higher because of its close proximity to the coast. Furthermore, unlike the southern site, the path of the wave from the ocean to the observation site is largely unimpeded.

Table 2 Observed depth and flows from the video footage compared to values extracted from the inundation model

Note that, unlike the real event, the model estimates complete withdrawal of the water between waves at the chosen locations, which shows that the model must be used with caution at this level of detail. Nonetheless, this comparison serves to check that the peak depths and speeds predicted are within the range of what is expected. The two eyewitness videos were taken in streets surrounded by buildings. The width of these streets was commensurate with the smallest triangles in the computational mesh. It is likely that a higher resolution mesh is needed to more accurately capture flow speeds, depths and the persistence of water between waves. However, a refinement of the mesh should be accompanied by higher resolution estimates of topography which are not available.

4.3 JASON satellite altimetry

Figure 9 compares the sea surface level anomaly observed by the JASON satellite (data points) with that predicted by the URSGA model (solid line) at a number of locations, each uniquely determined by its latitude. The graph is not a snapshot of the ocean surface anomalies at one instant in time but rather a series of values at the times at which the satellite’s recording device was positioned at the latitudes given on the horizontal axis of the figure. The recordings were taken over a short period between 112 and 124 min after the initial disturbance.

Fig. 9 Comparison of the URSGA-predicted surface elevation with the JASON satellite altimetry data. The URSGA wave heights have been corrected for the time the satellite passed overhead compared to JASON sea level anomaly
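The text does not say how the model heights were corrected for the satellite pass time. One plausible approach, shown here purely as an assumption, is to interpolate the model time series at each track point linearly to the time the satellite was overhead. The output times and elevations below are invented.

```python
from bisect import bisect_right


def interpolate_in_time(times, values, t):
    """Linearly interpolate a model time series at time t (same units as times)."""
    if len(times) < 2 or not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the model output window")
    j = min(bisect_right(times, t), len(times) - 1)  # index of the right bracket
    t0, t1 = times[j - 1], times[j]
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * values[j - 1] + w * values[j]


# Hypothetical model output at one track point: minutes after the earthquake
# and sea surface elevation (m) at those times.
times = [110.0, 115.0, 120.0, 125.0]
values = [0.0, 0.2, 0.6, 0.4]

# Elevation at the moment the satellite is assumed to pass this latitude.
elevation_at_pass = interpolate_in_time(times, values, 117.0)
```

Repeating this for every latitude along the track would give a model curve that, like the altimetry, mixes space and time rather than showing a single snapshot.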

The URSGA model replicates the amplitude and timing of the wave observed at 2.5° S but underestimates the amplitude of the wave further to the south at 4° S. In the model, the southernmost of these two waves appears only as a small bump in the cross section of the model (shown in Fig. 9) instead of being a distinct peak as can be seen in the satellite data. Also note that the URSGA model prediction of the ocean surface elevation becomes out of phase with the JASON data at 3° to 7° N latitude. Chlieh et al. (2007) also observed these misfits and suggested that they are caused by a reflected wave from the Aceh Peninsula that is not resolved in the model due to insufficient resolution of the computational mesh and bathymetry data. This is also a limitation of the model presented here which could be improved by nesting grids near Aceh. Given the local shape of the coastline, the assumption was made that the effects of any reflections on the inundation in the study region are negligible.

4.4 Initial disturbance

The location and magnitude of the seafloor displacement associated with the 2004 Sumatra–Andaman tsunami were calculated using the slip parameters of the G-M9.15 model of Chlieh et al. (2007); refer to Section 3.1. The slip parameters of this model were derived through an inversion process based upon the vertical deformation data provided in Section 2. Consequently, a comparison of predicted deformation and anomalies with the observed data would be a circular argument. The comparison provided here is intended only for illustrative purposes.

To quantify the difference between the predicted and observed displacement, we use the reduced chi-squared fit

$$ \chi^2=\frac{1}{N} \sum\limits_{i=1}^N \left(\frac{d_{\mathrm{obs},i}-d_{\mathrm{pred},i}} {\sigma_i} \right)^2 $$

where $d_{\mathrm{obs},i}$ is the observed vertical displacement, $d_{\mathrm{pred},i}$ is the predicted vertical displacement, $\sigma_i$ is the associated 1-σ uncertainty and $N$ is the number of observation sites. Using all the data in Tables 3, 4, 5, 6 and 7 in Appendix 1, we obtain $\chi^2 = 57.8$. At first, this fit seems poor; however, Chlieh et al. (2007) note that the uncertainties presented in Table 4, assigned by Gahalaut et al. (2006), do not attempt to account for post-seismic deformation and are thus too small. To adjust for post-seismic deformation in Table 4, Chlieh et al. (2007) suggest rescaling the uncertainties by a factor of 15. Adopting this approach, the $\chi^2$ fit can be reduced to 14.5. This value is consistent with the fit reported by Chlieh et al. (2007).
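The fit statistic is simple to evaluate, and a short sketch makes clear why enlarging the uncertainties of a subset of sites lowers it. The displacements and uncertainties below are invented and are not taken from the tables in Appendix 1.

```python
def reduced_chi_squared(observed, predicted, sigma):
    """Mean squared misfit, with each residual scaled by its 1-sigma uncertainty."""
    n = len(observed)
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma)) / n


# Hypothetical vertical displacements (m) and 1-sigma uncertainties (m).
observed = [1.0, -0.5, 0.2]
predicted = [0.8, -0.1, 0.2]
sigma = [0.1, 0.2, 0.1]

chi2 = reduced_chi_squared(observed, predicted, sigma)

# Doubling the uncertainty of the second site only, in the spirit of the
# rescaling applied to one table of observations in the text.
sigma_rescaled = [0.1, 0.4, 0.1]
chi2_rescaled = reduced_chi_squared(observed, predicted, sigma_rescaled)
```

Each rescaled site contributes less by the square of the scaling factor, so the statistic drops without any change to the model predictions.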

Table 3 Estimated vertical displacements in Northern Sumatra (Subarya et al. 2006)
Table 4 Estimated vertical displacements and uncertainties (centimetres) of campaign GPS station in the Nicobar–Andaman Islands from Gahalaut et al. (2006)
Table 5 Estimated vertical displacements and uncertainties (centimetres) derived from coral measurements on Simeulue Island (Subarya et al. 2006)
Table 6 Estimated vertical displacements and uncertainties (centimetres) derived from field observations in the Nicobar–Andaman Islands (Bilham et al. 2005)
Table 7 Position of the pivot line and uncertainties (centimetres) determined from satellite imagery (Meltzner et al. 2006)

Figure 10 shows the predicted vertical component of the coseismic crustal deformation calculated for the earthquake compared to the observed vertical deformation. Many of these measurements were taken up to a month after the earthquake. Consequently, the measurements are a combination of the vertical motion caused by the earthquake plus any vertical motion caused by the post-seismic deformation that occurred after the earthquake. The observations also include any vertical motion that happened between the time of the initial GPS measurement and the earthquake itself (up to 9 months in some cases). Such pre- and post-seismic motion has the potential to be a significant proportion of the total deformation for some sites (Chlieh et al. 2007). The deformation model used here only calculates the coseismic ‘instantaneous’ (i.e. first few minutes) deformation since this is what causes the tsunami. It does not include the pre- or post-seismic motion in the weeks or months before or after the earthquake. Therefore, we can expect some potentially large differences between the calculated instantaneous vertical motion and observed total vertical motion for some sites, particularly those located in regions which had large amounts of post-seismic deformation.

Fig. 10 Magnitude of the modelled vertical component of the seafloor displacement (crosses) compared with observed deformation (circles). The error bars indicate the 1-σ measurement uncertainty. The uncertainties given do not include post-seismic deformation uncertainties

Despite this limitation, the average difference between the observed motion and the predicted motion (including the pivot line points) was only 0.06 m. However, specific points in the model could differ from the total vertical displacement observed by 1 m or more (see Fig. 10). This is consistent with other studies, such as Chlieh et al. (2007). The most likely explanation is that these specific sites had a large amount of post-seismic deformation, but other factors (e.g. tectonic deformation before or after the earthquake, model or measurement error) may also play a part for specific points. Overall, however, we would argue that the fit of the model to the observations is still satisfactory since the average misfit is small when compared to the large uncertainties surrounding how much of the observed motion in the data sets is coseismic. Note that the sites displayed in rows 1, 3, 9, 10, 12 and 13 of Table 3 of Appendix 1 were not used to calculate the $\chi^2$ error, nor are they shown in Fig. 10, as they fell outside the computational domain.

4.5 Assessing the impacts of buildings on inundation

There have been numerous studies into the effects of tsunamis on human structures (Lukkunaprasit et al. 2009; Ramsden 1996). However, less attention has been given to the effect of buildings on inundation extent. Run-up is often predicted without consideration of the effect of structures and using coarse topography data (Grilli et al. 2006; Ioualalen et al. 2007; Watts et al. 2005).

Some attempts have been made to capture the effect of structures on inundation by simply increasing the bed roughness (Imamura 2009). Another approach is to include the structures directly in the topographic model. This section investigates the effect of the presence of buildings in the elevation data set on model maximum inundation as computed by ANUGA. The reference model is the one reported in Fig. 7 (right) with a friction coefficient of 0.01, buildings included and the boundary condition produced by the URSGA model. Buildings were included by increasing the elevation of all the computational cells containing structures by the associated height of the structure. ANUGA currently cannot model discontinuous topography. Consequently, once all the cells are raised, a smoothing algorithm is applied to ensure any discontinuities are removed. Around buildings the computational mesh was made as small as possible to limit the number of triangles with small slopes.
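The raise-then-smooth procedure can be sketched on a one-dimensional row of cells. The text does not name the smoothing algorithm, so the three-point moving average below is only an illustrative stand-in, and the elevations and building height are invented.

```python
def add_buildings(elevation, building_height):
    """Raise each cell by the height of the structure it contains (0 if none)."""
    return [z + h for z, h in zip(elevation, building_height)]


def smooth(values, passes=1):
    """Three-point moving average with fixed endpoints (illustrative only)."""
    out = list(values)
    if len(out) < 3:
        return out
    for _ in range(passes):
        prev = out
        interior = [(prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
                    for i in range(1, len(prev) - 1)]
        out = [prev[0]] + interior + [prev[-1]]
    return out


# Hypothetical flat ground at 1 m with a 6 m building in the middle cell.
ground = [1.0, 1.0, 1.0, 1.0, 1.0]
heights = [0.0, 0.0, 6.0, 0.0, 0.0]

raised = add_buildings(ground, heights)  # sharp 6 m step at the building
smoothed = smooth(raised)                # step spread over neighbouring cells
```

In this toy case the 6 m jump between neighbouring cells is reduced to 2 m, at the cost of lowering the building and widening its footprint, which is one reason a fine mesh around buildings is desirable.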

Figure 11 shows the maximum inundation extent with and without buildings included in the elevation data. From Table 1, it is apparent that densely built-up areas act as dissipators, greatly reducing the inundated area. Figure 12 shows the associated flow speeds in the presence and absence of buildings (bare earth). It is evident that flow speeds tend to increase in passages between buildings but slow down in areas behind them as compared to the bare earth scenario. Figure 13 shows the associated flow depths in the presence and absence of buildings. The total volume of water onshore at the time of maximum inundation increases if buildings are removed from the elevation data set. Without buildings, the total volume of water onshore is 217,546 m³. With buildings, the total onshore volume is 168,509 m³. Note that the total area covered by the buildings in the observed inundation region is 408,523 m². The buildings appear to act as a barrier that deflects some of the incoming volume of water back offshore.
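A simple way to obtain an onshore volume from model output is to sum depth times area over the onshore cells. The sketch below assumes that approach, which the text does not confirm, and then uses the two reported totals to express the effect of the buildings as a fraction. The toy depths and areas are invented.

```python
def onshore_volume(depths, areas):
    """Total water volume (m^3) as the sum of depth (m) times cell area (m^2)."""
    return sum(d * a for d, a in zip(depths, areas))


# Toy example: three onshore cells, one of them dry.
toy_volume = onshore_volume([0.5, 1.0, 0.0], [10.0, 20.0, 30.0])

# Totals reported in the text (m^3).
volume_bare_earth = 217_546.0
volume_with_buildings = 168_509.0

volume_difference = volume_bare_earth - volume_with_buildings
fraction_removed = volume_difference / volume_bare_earth
```

On the reported figures, including buildings lowers the onshore volume by 49,037 m³, or a little over a fifth of the bare-earth total.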

Fig. 11 Model results show the effect of buildings in the elevation data set. The left-hand image shows the inundation extent as modelled in the reference model which includes buildings in the elevation data. The right-hand image shows the result for a bare earth model, i.e. entirely without buildings

Fig. 12 The maximal flow speeds for the same model parameterisations found in Fig. 11. As expected, the presence of buildings reduces the flow speeds behind them but tends to increase speeds in passages between buildings

Fig. 13 The maximal flow depths for the same model parameterisations found in Fig. 11

These results suggest that, when possible, human-made structures should be included in the model topography. Simply matching point sites using meshes of much lower resolution than those used here or, indeed, using areas of artificially high friction is an oversimplification. Such simulations cannot capture the fine detail that so clearly affects inundation depth, flow speeds and extent.

4.6 Additional influences on inundation

Discrepancies between the survey data and the modelled inundation arise from errors and uncertainties in both the field surveys and the models. The former include measurement errors in the GPS survey recordings and missing data in the field survey data itself. The latter include the unknown distribution of surface roughness, uncertainties in the parameterisation of the source model, discretisation errors, the effect of human structures on flow, as well as uncertainties in the elevation data including effects of erosion and deposition by the tsunami event.

The results we have documented so far are dependent on the value of the following model parameters: Manning’s friction coefficient, the elevation at which the tide is fixed and the slip parameters that characterise the source. The model results are also dependent on the model structure employed, i.e. on the assumption that the effects of Coriolis force and dispersion can be neglected and on the location and type of boundary conditions. These assumptions and parameters have varying effects on the predicted inundation extent and onshore flow patterns. We note that the choices made here and specified in Section 3 are based upon the best available information and are representative of the authors’ current modelling purposes and practices. A comprehensive documentation of model uncertainties and sensitivities demands an extensive investigation which is beyond the scope of this paper. Our main aim is to provide a spatially distributed validation test which can be used to analyse the performance of tsunami inundation models. Nevertheless, we provide a brief discussion of the factors which may influence the accuracy of predicted inundation.

The estimation of the parameters characterising the source is often regarded as the most difficult task of tsunami modelling and is frequently subject to large errors (Synolakis et al. 2008). This is particularly true of real-time modelling (Gica et al. 2007). Epicentre location, rake, dip and strike angles, fault length and width, slip displacement and focal depth all have varying influence on predictions. The inference of these parameters requires large amounts of information.

The choice of Manning’s friction can also have a large effect on inundation. According to Linsley and Franzini (1979), appropriate values of Manning’s coefficient range from 0.01 to 0.06 for tsunami propagation over a sandy seafloor; the reference model uses a value of 0.01. The smaller value is representative of flow over surfaces like neat cement and smooth metal, whilst the larger value corresponds to flow in very rough channels. To investigate sensitivity to this parameter, we simulated the maximum onshore inundation using Manning’s coefficients of 0.0003 and 0.03. The higher friction value is associated with flow in areas with stones and weeds. A similar range was explored by Myers and Baptista (2001) for the Hokkaido Nansei-Oki tsunami. The resulting inundation maps are shown in Fig. 14 and the maximum flow speeds in Fig. 15. Figure 14, along with Table 1, shows, as expected, that the onshore inundation extent decreases with increasing friction. Furthermore, despite the large change in friction, only small changes in the inundation extent are seen. This is consistent with the conclusions of Synolakis et al. (2005), who state that the long wavelength of tsunami tends to mean that friction is less important in comparison to the motion of the wave. A spatially variable representation of surface roughness may improve the agreement between the modelled and observed inundation extent, but such an approach requires a detailed survey of Patong City which we were unable to obtain.
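To give a feel for the range of friction explored, the sketch below evaluates the standard Manning friction slope, $S_f = n^2 u|u| / h^{4/3}$, for the three coefficients mentioned above. The depth and speed are hypothetical, and the exact form of the friction term in ANUGA may differ from this textbook expression.

```python
def friction_slope(n, depth, speed):
    """Manning friction slope S_f = n^2 * u * |u| / h^(4/3) (dimensionless).

    n is Manning's coefficient (s/m^(1/3)), depth is h (m), speed is u (m/s).
    """
    if depth <= 0.0:
        raise ValueError("depth must be positive")
    return n ** 2 * speed * abs(speed) / depth ** (4.0 / 3.0)


# Hypothetical onshore flow: 1 m deep, moving at 2 m/s.
depth, speed = 1.0, 2.0

slopes = {n: friction_slope(n, depth, speed) for n in (0.0003, 0.01, 0.03)}
```

Because the slope scales with the square of the coefficient, moving from 0.01 to 0.03 increases the friction term ninefold, while 0.0003 is close to frictionless. That such a wide range produces only small changes in extent is what supports the conclusion that friction is of secondary importance here.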

Fig. 14 Model results for different values of Manning’s friction coefficient shown to assess sensitivities. The left and right images show the inundation results for friction values of 0.0003 and 0.03, respectively

Fig. 15 The maximal modelled flow speeds for the same model parameterisations found in Fig. 14

The Coriolis force and dispersion have also been shown to influence tsunami propagation results (Dao and Tkalich 2007; Shuto 1991). Shuto (1991) investigated the effect of Coriolis terms when simulating the 1960 Chilean Tsunami. Coriolis was shown to affect wave amplitude but had negligible effect on arrival time. In this study, the Coriolis force was modelled in the large URSGA domain but neglected on the smaller ANUGA domain. Unlike the shallow water wave equations used here, Boussinesq equations can be applied to model the dispersive effects of tsunamis. Studies have shown (Dao and Tkalich 2007; Shuto 1991) that dispersive effects can influence tsunami propagation. The effects are limited in shallow water and dependent on the resolution of the computational mesh and the length of the simulation (Grilli et al. 2007). The wavelength of the tsunami also determines the importance of the dispersive terms. Dispersive terms have notable influence on short wavelength tsunamis, arising from submarine mass failure, and have less impact on earthquake-induced tsunamis with long wavelengths.
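The dependence of dispersion on wavelength can be illustrated with linear wave theory, in which the phase speed is $c = \sqrt{(g/k)\tanh(kh)}$ with $k = 2\pi/\lambda$, compared with the non-dispersive shallow water speed $\sqrt{gh}$. The depth and the two wavelengths below are hypothetical values chosen only to contrast a long and a short wave.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def shallow_water_speed(depth):
    """Non-dispersive long-wave speed sqrt(g h)."""
    return math.sqrt(G * depth)


def linear_phase_speed(depth, wavelength):
    """Phase speed from linear wave theory, sqrt((g / k) tanh(k h))."""
    k = 2.0 * math.pi / wavelength
    return math.sqrt(G / k * math.tanh(k * depth))


depth = 1000.0  # hypothetical ocean depth (m)

# Ratio of the dispersive to the shallow water speed for two wavelengths.
ratio_long = linear_phase_speed(depth, 200_000.0) / shallow_water_speed(depth)
ratio_short = linear_phase_speed(depth, 10_000.0) / shallow_water_speed(depth)
```

For the 200 km wave the two speeds agree to within a small fraction of a percent, whereas the 10 km wave travels several percent slower than the shallow water estimate, which is why dispersion matters more for short wavelength sources.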

During the ANUGA simulation, the tide was kept constant in the offshore region at 0.80 m. In reality, the tide varied ±15 cm from the fixed tide level used over the duration of the simulation. Although we believe that this change in water level would have little effect on the inundation map, a future sensitivity study should investigate the validity of this assumption.

Here, we reiterate that the aim of this paper is to provide a spatially distributed validation of tsunami inundation, which can be adopted and applied to any physically based tsunami model. Consequently, a numerical investigation of the influence of these assumptions and parameters is beyond the scope of this paper. Instead, the above discussion is given to illustrate how the data presented here can not only be used for validation and model comparison but also for a spatially distributed investigation of model sensitivities and uncertainties.

5 Conclusion

This paper proposes a new framework and field data test for the assessment of tsunami inundation models. Currently, there is a scarcity of appropriate validation data sets due to a lack of well-documented historical tsunami impacts. The test proposed here utilises the uniquely large amount of observational data for model comparison obtained during and immediately following the Sumatra–Andaman tsunami of 26 December 2004. The proposed benchmark is intended to aid validation of tsunami inundation. However, additional tests are presented to facilitate model evaluation for the generation and propagation phases as well. In an attempt to provide higher visibility and easier accessibility for tsunami benchmark problems, the data used to construct the proposed benchmark are documented and freely available at http://tinyurl.com/patong2004-data.

An associated aim of this paper was to further validate the URSGA–ANUGA tsunami modelling methodology employed by Geoscience Australia which is used to simulate tsunami inundation. This study shows that the tsunami modelling methodology adopted is credible and able to predict detailed inundation extents and dynamics with reasonable accuracy. Model predictions agreed well with a detailed inundation survey of Patong City, altimetry data from the JASON satellite, and eyewitness accounts of wave front arrival times and onshore flow speeds.

Inundation was modelled with and without buildings included in the topography data set. The presence of buildings was shown to have a significant influence on the simulated inundation extent. The presence of buildings also increased the onshore flow speeds and depths whilst decreasing the total volume of water onshore. This result indicates that the influence of human-made structures should be included, where possible, in any future studies.

It is hoped that the data and tests documented here will be applied to other tsunami models for validation purposes. The data could also be used in a novel spatially distributed sensitivity and uncertainty analysis of tsunami inundation. To help achieve this goal, we provide a discussion of the factors that influence predicted inundation.