1 Introduction

Performance-based seismic design requires seismic ground motions, normally either recorded or artificial accelerograms, as inputs for structural dynamic time history analysis. However, recorded motions are known to be unevenly distributed in space, non-repeatable and strongly dependent on soil conditions. Consequently, the recorded ground motions available for specified design scenarios are still insufficient. In these cases, engineers normally apply amplitude scaling and spectral matching to typical recordings (e.g., records from the Loma Prieta, Northridge, and Chi–Chi earthquakes) to generate a sufficient number of ground motions. However, these operations inevitably modify the characteristics of the original recordings, and the resulting motions may have features that differ from those of real recordings (Luco and Bazzurro 2007). Methods for generating acceptable and reliable accelerograms have recently improved. It is now widely accepted that the new synthetic ground motion models, particularly those that combine engineering and seismological methods (Boore 2003; Motazedian and Atkinson 2005; Pousse et al. 2006), can generate large quantities of more realistic ground motions that meet the requirements of engineering seismic design.

Generally speaking, current ground motion simulation methods can be classified into three categories: deterministic, stochastic and hybrid methods. Deterministic methods, which include the 3-D finite difference method, the discrete wavenumber method and the finite fault method (Olsen et al. 1997; Mai and Beroza 2002; Guatteri et al. 2003), require very precise information about seismic faults, propagation paths and soil conditions, and using them to simulate high-frequency motions is both difficult and computationally expensive. Stochastic process-based simulations (Pousse et al. 2006; Rezaeian and Der Kiureghian 2010) and stochastic point-source models (Boore 1983, 2003; Beresnev and Atkinson 1997) are empirical approaches that usually simulate ground motion by shaping Gaussian white noise processes in the frequency domain. Current stochastic methods are regarded as equally applicable at high and low frequencies (Motazedian and Atkinson 2005; Yamamoto and Baker 2013). Hybrid methods combine the two approaches above to simulate broadband seismic ground motions (Hartzell et al. 1999; Graves and Pitarka 2004; Frankel 2009; Ameri et al. 2009); however, they require very detailed source information and are computationally expensive, so they may not be suitable for practicing engineers. Moreover, they may not provide Green's functions that reproduce both the amplitude and the phase information in the important intermediate frequency range from 0.5 to 2 Hz (Motazedian and Atkinson 2005).

The stochastic method has long been used because it matches observed ground motion characteristics well. For example, the point-source method and the finite fault method (an extension of the point-source method) have been used successfully to simulate ground motions for many years (Boore 1983, 2000, 2003, 2009; Ou and Herrmann 1990; Rovelli et al. 1994; Beresnev and Atkinson 1997, 1998; Berardi et al. 1999; Atkinson and Silva 2000; Atkinson et al. 2009; Edwards and Fäh 2013). These methods are popular because of their simplicity, clear physical meaning and good results over a relatively wide frequency range. Motazedian and Atkinson (2005) even concluded that the stochastic model performs better than the hybrid models used by Hartzell et al. (1999). Ground motions exhibit spatial variability caused by, e.g., source patterns and path and site effects, which generally cannot be described deterministically (Cacciola and Deodatis 2011). Moreover, it is well known that the dynamic response of nonlinear structures is highly influenced by the nonstationarity of the input in both the time and frequency domains (Yeh and Wen 1990; Spanos et al. 2007). The point-source and finite fault methods neglect the evolution of frequency content over time because the Fourier transform cannot conveniently represent frequency nonstationarity over time, even though the phase spectrum does contain information on frequency evolution.

For a more reliable representation of seismic ground motions, both the amplitude and the frequency variation of ground motion time histories must be accounted for. This can be achieved by properly separating the temporal and frequency nonstationary characteristics, as suggested by Rezaeian and Der Kiureghian (2010). In their model, nonstationarity is achieved by modulating the intensity and varying the filter properties in the time domain: the modulating function characterizes the variation of intensity over time, and the time-varying filter parameters define the evolving frequency content of the process. The wavelet technique is also a powerful tool for analyzing the temporal and frequency content of a time history simultaneously. Yamamoto and Baker (2013) proposed a stochastic model (the Yamamoto model hereafter) based on the wavelet packet method, which uses 13 parameters to capture the characteristics of real earthquake ground motions. The wavelet packets were separated into two groups: a major group and a minor group. The former was treated as randomly distributed, and the latter followed the logarithmic normal distribution density (LNDD) function. Huang and Wang (2015) followed this method and further investigated the spatial cross-correlations of the wavelet packet parameters based on geostatistical analysis. The main advantage of the wavelet method over other methods is that it makes it convenient to inspect, analyze and modify the frequency distribution over time. A closer examination of how the frequency content changes over time is therefore necessary.

In this paper, a new stochastic model for synthetic ground motions that captures nonstationary characteristics using the wavelet packet method is developed. In the Yamamoto model, a single LNDD function was used to scale the wavelet packet coefficients in both the time and frequency domains, which may inherently make the frequency content uniform over time. Moreover, the randomly assigned major-group wavelet packets did not clearly account for the frequency characteristics of the observed ground motions. In the present model, we separate the procedure into two steps, similar to the stochastic point-source method (Boore 2003). In the first step, randomly generated Gaussian white noise is windowed to capture the nonstationarity in the time domain. In the second step, a different LNDD function is applied to the wavelet packets at each decomposition time point to characterize the frequency distribution of the simulated ground motions, based on observations of thousands of records. Only 6 parameters are found to be required to capture the characteristics of real recordings.

Compared with the previous models of Yamamoto and Baker (2013) and Rezaeian and Der Kiureghian (2010), the proposed model has the following advantages: (a) it uses a small number of parameters, which reduces model uncertainty; (b) the temporal and frequency nonstationary characteristics are separable, and the evolving frequency content of the ground motion process is characterized; (c) similar to the point-source method (Boore 2003), the model provides physical insight, and its parameters can be related to the characteristics of the earthquake and the site considered; (d) it predicts the peak ground acceleration (PGA) rather than the energy of ground motions, so it is compatible with existing seismic ground motion prediction equations (SGMPEs); and (e) the model simulates the frequency nonstationarity over the whole time–frequency plot directly from global information on observed ground motions, which makes the model easy to understand and use.

We begin by describing the complete procedure of the proposed method, which uses 6 parameters to separate the temporal and frequency characteristics. Next, the model parameters are identified and regressed using an earthquake database that contains thousands of records. Finally, the model is validated by comparing the intensity, duration, bandwidth and peak value of the simulated ground motions with those of real recordings, both within and outside the present database, and with those of motions generated by previous models.

2 New Stochastic Model

2.1 Method

In this paper, the wavelet packet technique is employed because it accounts separately for the temporal and spectral nonstationarity of ground motions. Several key parameters are used to control and identify the duration, shape and time-varying frequency content of each accelerogram. Figure 1 illustrates the simulation procedure of the proposed model. First, white Gaussian noise is windowed and transformed into wavelet packets with a given frequency resolution (Fig. 1a–c). In this step, T d controls the duration, and ε and η control the temporal nonstationarity of the motion. Then, the frequency vectors at each time point are scaled to account for the frequency variation over time, using the parameters μ wm and σ wm, which are the mean and standard deviation (of ln f) of the LNDD function, respectively (Fig. 1d). Finally, the result is transformed back to the time domain by wavelet packet reconstruction and scaled by the predicted PGA to obtain the simulated ground motion (Fig. 1e).

Fig. 1 Procedures for simulating ground motions using the wavelet packet method

As shown in Fig. 1, a total of 6 parameters, chosen carefully by trial and error, control and identify the temporal and frequency variations of the accelerogram; they are described in detail below. Figures 1c and 1d show the wavelet packet coefficient (WPC) matrix, which describes the amplitude and frequency variations of a time history. For a given decomposition depth, i.e., a given frequency resolution, the WPC of process x(t) for the ith wavelet packet at scale j and position k can be computed as

$$\begin{aligned} c_{j,k}^{i} &= \int_{-\infty}^{\infty} x(t)\,w(t)\,\psi_{j,k}^{i}(t)\,{\text{d}}t \\ &= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} A(\omega,t)\,e^{i\omega t}\,{\text{d}}\overline{Z}(\omega) \right\} \psi_{j,k}^{i}(t)\,{\text{d}}t \\ &= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} A(\omega,t)\,e^{i\omega t}\,\psi_{j,k}^{i}(t)\,{\text{d}}t \right\} {\text{d}}\overline{Z}(\omega) \end{aligned}$$
(1)

where x(t) is the time series, w(t) is the window function, \(c_{j,k}^{i}\) is the coefficient of the ith wavelet packet at scale j and translation (position) k, and \(\psi_{j,k}^{i}(t)\) is the wavelet packet function localized around central time \(t_k\) and frequency \(f_i\). A(ω,t) is the time- and frequency-dependent modulating function, and \(\bar{Z}(\omega)\) is a complex random process with orthogonal increments such that

$$E\left[{\text{d}}\overline{Z}(\omega)\,{\text{d}}\overline{Z}^{*}(\omega^{\prime})\right] = \begin{cases} S_{\overline{ff}}(\omega)\,{\text{d}}\omega, & \omega = \omega^{\prime} \\ 0, & \text{otherwise} \end{cases}$$
(2)

where E[·] denotes expectation, and \(S_{{\overline{ff} }} (\omega )\) is the two-sided power spectral density (PSD) of the zero-mean stationary process

$$\overline{x} (t) = \int_{ - \infty }^{\infty } {e^{i\omega t} } {\text{d}}\overline{Z} (\omega )$$
(3)

Once the WPC matrix has been obtained, as shown in Fig. 1c, the column vectors that represent the frequency distribution at each time point are scaled by LNDD functions as follows:

$$\bar{c}_{j,k} = c_{j,k}^{*} \times L_{k} (f;\mu_{{{\text{w}}k}} ,\sigma_{{{\text{w}}k}} )\begin{array}{*{20}c} {} & {\left( {k = 1,2 \ldots \frac{{2^{N} }}{{2^{j} }}} \right)} \\ \end{array}$$
(4)

where \(\bar{c}_{j,k}\) and \(c_{j,k}^{*}\) are the column vectors at \(t_k\) after and before scaling, respectively, L k (f; μ wk, σ wk) is the kth LNDD function, and \(2^{N}\) is the number of points in the time series.

Note that a different LNDD function is used for each column vector in the present model. This is important because, as shown below, real earthquake recordings have a predominant frequency that changes over time.
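The column-wise scaling of Eq. (4) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the WPC matrix is assumed to be stored as rows (frequency bands) by columns (time positions), each band is represented by a single positive frequency (for example its centre frequency), and the helper names `lndd` and `scale_columns` are hypothetical.

```python
import math


def lndd(f, mu, sigma):
    """LNDD of Eq. (10): lognormal density in f (Hz), where mu and sigma
    are the mean and standard deviation of ln f."""
    return math.exp(-(math.log(f) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        f * sigma * math.sqrt(2.0 * math.pi)
    )


def scale_columns(wpc, band_freqs, mus, sigmas):
    """Scale time column k of the WPC matrix by its own LNDD (Eq. 4).

    wpc[i][k]         -- coefficient of frequency band i at time position k
    band_freqs[i]     -- representative frequency of band i (must be > 0)
    mus[k], sigmas[k] -- LNDD parameters assigned to column k
    """
    n_bands, n_times = len(wpc), len(wpc[0])
    scaled = [[0.0] * n_times for _ in range(n_bands)]
    for k in range(n_times):
        for i in range(n_bands):
            scaled[i][k] = wpc[i][k] * lndd(band_freqs[i], mus[k], sigmas[k])
    return scaled


# Toy example: a flat 4-band x 2-time matrix. Column 0 gets a median of 3 Hz,
# column 1 a median of 1 Hz, so the dominant band moves down between the columns.
wpc = [[1.0, 1.0] for _ in range(4)]
band_freqs = [0.5, 1.5, 2.5, 3.5]
out = scale_columns(wpc, band_freqs, mus=[math.log(3.0), 0.0], sigmas=[0.3, 0.3])
```

In the model itself, the unscaled coefficients come from windowed white noise rather than a flat matrix, so the same scaling imposes the LNDD shape on random coefficients.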

Finally, the simulated acceleration time history can be generated by wavelet packet reconstruction using the following equation:

$$x(t) = \sum\limits_{i = 1}^{2^{j}} \sum\limits_{k = 1}^{2^{N - j}} c_{j,k}^{i}\, \psi_{j,k}^{i} (t)$$
(5)

The simulated time history is then scaled by the predicted PGA.
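The decomposition in Eq. (1) and the reconstruction in Eq. (5) can be illustrated with a dependency-free sketch. The paper uses the Meyer wavelet; here the orthonormal Haar wavelet is swapped in only because its filters are two taps long, so this is an assumption for illustration rather than the method used in the paper. The packets are kept in frequency (Gray-code) order, so row i corresponds to the ith frequency band and each row holds \(2^{N-j}\) coefficients. All function names are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One orthonormal Haar analysis step: (approximation, detail)."""
    a = [(x[2 * n] + x[2 * n + 1]) / SQRT2 for n in range(len(x) // 2)]
    d = [(x[2 * n] - x[2 * n + 1]) / SQRT2 for n in range(len(x) // 2)]
    return a, d


def haar_merge(a, d):
    """Inverse of haar_split."""
    x = []
    for an, dn in zip(a, d):
        x.extend([(an + dn) / SQRT2, (an - dn) / SQRT2])
    return x


def wpt(x, depth):
    """Full wavelet packet decomposition (len(x) divisible by 2**depth).
    Returns 2**depth rows of coefficients in frequency order."""
    nodes = [x]
    for _ in range(depth):
        nxt = [None] * (2 * len(nodes))
        for p, node in enumerate(nodes):
            a, d = haar_split(node)
            # Frequency ordering: children of odd-indexed bands are swapped
            if p % 2 == 0:
                nxt[2 * p], nxt[2 * p + 1] = a, d
            else:
                nxt[2 * p], nxt[2 * p + 1] = d, a
        nodes = nxt
    return nodes


def iwpt(nodes):
    """Inverse of wpt: merge frequency-ordered rows back into a signal."""
    while len(nodes) > 1:
        parents = []
        for p in range(len(nodes) // 2):
            lo, hi = nodes[2 * p], nodes[2 * p + 1]
            a, d = (lo, hi) if p % 2 == 0 else (hi, lo)
            parents.append(haar_merge(a, d))
        nodes = parents
    return nodes[0]


random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(64)]   # 2**6 samples (N = 6)
coeffs = wpt(signal, depth=3)                          # 8 bands x 8 positions
rebuilt = iwpt(coeffs)
```

Because the Haar packets are orthonormal, the sum of squared coefficients equals the sum of squared samples, which is the discrete counterpart of the energy identity used later in Eq. (14).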

2.2 Ground Motion Database

The ground motion recordings used in this study are from the Pacific Earthquake Engineering Research (PEER) Center database compiled in the PEER–Next Generation Attenuation (NGA) project (Power et al. 2006). All accelerograms in this database were recorded under free-field conditions, and no aftershock recordings are included. We use only recordings for which the following four basic parameters, available for most earthquakes, are known: F t, M w, R JB and V S30. F t represents the type of faulting (reverse slip, strike slip or normal slip); M w is the moment magnitude of the earthquake; R JB, the Joyner–Boore distance, is the shortest horizontal distance to the surface projection of the rupture and is available for most stations in the NGA database; and V S30 represents the soil conditions as the time-averaged shear-wave velocity over the top 30 m. The database selection criteria are as follows: (1) each recording must have complete information on F t, M w, R JB and V S30; (2) the moment magnitude M w must be larger than 4.5; (3) the record must be from the main shock of an earthquake; and (4) accelerograms of very low quality, such as extremely short motions and white-noise-like motions, are excluded. In total, 2444 horizontal components from 1222 records of 23 earthquakes were collected; they are listed in Table 1. The number of records varies among earthquakes, and most of the earthquakes have strike-slip or reverse fault mechanisms.

Table 1 Selected earthquakes from NGA database and the basic information
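The four selection criteria can be expressed as a simple filter. The sketch below is hypothetical: the `Record` fields are illustrative names, not the column names of the NGA flatfile, and criterion (4) is represented by a manually assigned quality flag because the text gives no numerical rule for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical fields for illustration only
    event: str
    fault_type: Optional[str]   # F_t
    mw: Optional[float]         # M_w
    rjb_km: Optional[float]     # R_JB (km)
    vs30: Optional[float]       # V_S30 (m/s)
    is_mainshock: bool
    low_quality: bool           # manual flag, e.g. very short or noise-like


def select(records):
    """Keep records that satisfy criteria (1)-(4)."""
    keep = []
    for r in records:
        complete = None not in (r.fault_type, r.mw, r.rjb_km, r.vs30)  # (1)
        # 'and' short-circuits, so r.mw is only compared when it is present
        if complete and r.mw > 4.5 and r.is_mainshock and not r.low_quality:
            keep.append(r)
    return keep


recs = [
    Record("A", "strike slip", 6.5, 10.0, 300.0, True, False),
    Record("B", "reverse", 4.3, 20.0, 400.0, True, False),   # M_w too small
    Record("C", "reverse", 6.0, None, 500.0, True, False),   # missing R_JB
    Record("D", "normal", 5.5, 30.0, 250.0, False, False),   # not a main shock
]
print([r.event for r in select(recs)])  # ['A']
```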

2.3 Model Parameterization and Parameter Identification

As shown above, the first step of the simulation is to window the Gaussian white noise with a parameterized window function. The parameter T d, which represents the seismic duration, is discussed first because it determines the length of the generated white noise. In the Yamamoto model, the duration is defined as the time interval between the first and last instants at which the absolute acceleration exceeds 1 % of PGA. However, this definition is not ideal because it strongly influences the parameters that define the rise and decay of ground motions in the time domain. Figure 2 shows two records from the 1992 Big Bear earthquake (hereafter, the Big Bear 1 and Big Bear 2 recordings) that have similar S-wave durations. The first has a long P-wave portion, and the second has a long tail. Under this definition, the parameters that control the time-domain nonstationarity differ considerably between the two records, although their S-wave portions, which are what the simulation targets, are similar. Another widely used measure is the energy-based significant duration (Trifunac and Brady 1975), defined as the time interval between specified percentages of the cumulative Arias intensity. For example, T 5–95 is the interval between the instants at which 5 and 95 % of the total Arias intensity are reached. However, this duration also cannot properly isolate the S-wave portion for the simulation method, as shown in Fig. 2: T 5–95 and T 15–85 are shorter than the observed duration, whereas T 1–99 includes either a long P wave (Fig. 2a) or a long tail (Fig. 2b). Most ground motion simulation methods focus primarily on simulating the S wave. Boore's point-source method considers T 15–85 to capture the most important part of the S waves in an accelerogram, but Boore then uses a duration factor to prolong the whole simulated time history in post-processing.

Based on these considerations, we define the duration in the present study as the time interval between the start of the S wave and the last instant at which the absolute acceleration exceeds 1 % of PGA. The S-wave start time is picked manually because it is subjective and difficult to locate automatically.
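The duration definition above, and the Arias-intensity-based significant duration it is compared with, can be sketched as follows. Assumptions: the record starts at t = 0 with a uniform time step `dt`, the S-wave onset is supplied by hand (as in the text), and the function names are hypothetical.

```python
def duration_td(acc, dt, s_onset):
    """T_d: from the manually picked S-wave onset (s) to the last sample
    whose absolute value reaches 1 % of PGA."""
    pga = max(abs(a) for a in acc)
    last = max(n for n, a in enumerate(acc) if abs(a) >= 0.01 * pga)
    return last * dt - s_onset


def significant_duration(acc, dt, lo=0.05, hi=0.95):
    """Interval between the instants at which the cumulative Arias intensity
    reaches fractions lo and hi of its total. The factor pi/(2g) cancels in
    the ratio, so only the running integral of a^2 is needed."""
    cum, total = [], 0.0
    for a in acc:
        total += a * a * dt
        cum.append(total)
    t_lo = next(n for n, c in enumerate(cum) if c >= lo * total) * dt
    t_hi = next(n for n, c in enumerate(cum) if c >= hi * total) * dt
    return t_hi - t_lo


# Toy record: 10 s of unit-amplitude motion followed by 10 s of silence
dt = 0.01
acc = [1.0] * 1000 + [0.0] * 1000
td = duration_td(acc, dt, s_onset=2.0)   # about 8 s
t595 = significant_duration(acc, dt)     # about 9 s
```

Calling `significant_duration(acc, dt, 0.15, 0.85)` or `(acc, dt, 0.01, 0.99)` gives the T 15–85 and T 1–99 variants discussed above.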

Fig. 2 Acceleration time histories from the 1992 Big Bear earthquake recorded at the Seal Beach - Office Bldg. station (a) and the Silent Valley - Poppet Flat station (b). Different energy durations are denoted by the time intervals between lines of the same color

To capture the amplitude and distribution of the content in the time domain, the following Saragoni window function (Saragoni and Hart 1974) is used:

$$w(t) = a(t/T_{\text{d}} )^{b} \exp ( - c \times t/T_{\text{d}} )$$
(6)
$$b = - \varepsilon \ln \eta /\left( {1 + \varepsilon (\ln \varepsilon - 1)} \right)$$
(7)
$$c = b/\varepsilon$$
(8)
$$a = (\exp (1)/\varepsilon )^{b}$$
(9)

where w(t) is the Saragoni window that shapes the acceleration in the time domain; the window envelope peaks at fraction ε of the specified duration T d, and its amplitude at time T d is reduced to fraction η of the maximum amplitude; a, b and c are intermediate parameters. With a given by Eq. (9), the peak value of w(t) is normalized to 1.
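A minimal sketch of the window in Eqs. (6)–(9), applied to Gaussian white noise. The exponent b is written with ln ε in the denominator; with that form the window equals exactly 1 at t = εT d and η at t = T d, which the checks below confirm. The values ε = 0.2, η = 0.05 and T d = 30 s are arbitrary illustrations, not identified parameters, and the function name is hypothetical.

```python
import math
import random


def saragoni_window(t, td, eps, eta):
    """Saragoni-Hart window, Eqs. (6)-(9): w = 1 at t = eps*td and w = eta at t = td."""
    b = -eps * math.log(eta) / (1.0 + eps * (math.log(eps) - 1.0))
    c = b / eps
    a = (math.e / eps) ** b
    x = t / td
    return a * x ** b * math.exp(-c * x)


# Window a white-noise record (dt = 0.005 s as in Sect. 4; other values illustrative)
random.seed(1)
dt, td, eps, eta = 0.005, 30.0, 0.2, 0.05
noise = [random.gauss(0.0, 1.0) for _ in range(round(td / dt))]
windowed = [z * saragoni_window(n * dt, td, eps, eta) for n, z in enumerate(noise)]
```

Setting the derivative of b ln x − c x to zero gives the peak at x = b/c = ε, and substituting x = 1 gives a·exp(−c) = η; solving these two conditions is what yields Eqs. (7)–(9).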

Normalized window functions with different ε and η are shown in Fig. 3a, and Fig. 3b illustrates how ε and η are identified for each recording.

Fig. 3 Saragoni window function with different parameters (left) and a Husid plot of the Big Bear 2 record against its simulation (right)

The window function conveniently captures the nonstationarity of ground motions in the time domain. Nonstationarity also refers to the variation of frequency content over time. In most recordings, the frequency content is relatively high while the acceleration amplitude increases, and the frequency then gradually decreases as the amplitude decays. This characteristic is clearly shown in Fig. 4, in which the Big Bear 1 record is decomposed into WPCs with a decomposition depth of 9, giving a frequency resolution of 0.194 Hz and a time resolution of 2.56 s. The WPCs are squared to show the peaks clearly. Each curve represents one column vector of WPCs, i.e., the frequency distribution at the corresponding time, and a circle marks the frequency at which that distribution peaks (Fig. 4a). The black filled circle represents the total predominant frequency obtained when all column vectors are summed. It is therefore necessary to use a different LNDD function for each WPC frequency vector (Fig. 4b) rather than a single LNDD function for all WPCs.
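The quoted resolutions follow from simple arithmetic if one assumes, as an idealization, that a depth-d decomposition splits the band from 0 Hz to the Nyquist frequency into 2^d equal sub-bands and that each coefficient in a sub-band spans 2^d samples. With dt = 0.005 s (the time step given in Sect. 4), this gives a band width of about 0.195 Hz, close to the 0.194 Hz quoted, and exactly 2.56 s in time. The function name is hypothetical.

```python
def wpt_resolution(dt, depth):
    """Idealized frequency and time resolution of a depth-`depth` wavelet
    packet decomposition of a signal sampled every `dt` seconds."""
    nyquist = 1.0 / (2.0 * dt)            # Hz
    band_width = nyquist / 2 ** depth     # width of each of the 2**depth bands (Hz)
    time_step = dt * 2 ** depth           # spacing of coefficients within a band (s)
    return band_width, time_step


df, dt_col = wpt_resolution(0.005, 9)
print(f"{df:.4f} Hz, {dt_col:.2f} s")  # 0.1953 Hz, 2.56 s
```

The trade-off is explicit here: each extra level of depth halves the band width but doubles the time step.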

Fig. 4 Time–frequency plot of the Big Bear 1 ground motion (a) and its simulation (b)

The frequency distribution follows the LNDD function, i.e.,

$$L_{k} (f;\mu_{\text{wk}} ,\sigma_{\text{wk}} ) = \frac{1}{{f\sigma_{\text{wk}} \sqrt {2\pi } }}e^{{ - (\ln f - \mu_{\text{wk}} )^{2} /2\sigma_{\text{wk}}^{2} }} \begin{array}{*{20}c} \quad {\left( {k = 1,2 \ldots \frac{{2^{N} }}{{2^{j} }}} \right)} &\\ \end{array}$$
(10)

where L k (·) is the function that shapes the kth frequency vector, f is the frequency, and μ wk and σ wk are the mean and standard deviation of ln f for the kth LNDD function.
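Because L k is a lognormal density in f, its peak lies at exp(μ wk − σ wk²), the lognormal mode, rather than at exp(μ wk). The sketch below evaluates Eq. (10) and checks this numerically on a frequency grid; the parameter values and function names are arbitrary illustrations.

```python
import math


def lndd(f, mu, sigma):
    """Eq. (10): lognormal density in frequency f (Hz); mu and sigma are the
    mean and standard deviation of ln f."""
    return math.exp(-(math.log(f) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        f * sigma * math.sqrt(2.0 * math.pi)
    )


def lndd_peak(mu, sigma):
    """Frequency at which the LNDD is largest (mode of the lognormal)."""
    return math.exp(mu - sigma ** 2)


mu, sigma = math.log(3.0), 0.5
grid = [0.001 * n for n in range(1, 20001)]            # 0.001 Hz to 20 Hz
f_best = max(grid, key=lambda f: lndd(f, mu, sigma))   # numerical peak
```

Increasing μ wk moves this peak to higher frequencies, whereas increasing σ wk both widens the distribution and lowers its peak frequency.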

The next problem is how to define μ wk and σ wk. Most recordings share two typical characteristics: (1) the total predominant frequency (the black circle in Fig. 4) is a critical frequency that can significantly affect the acceleration response spectra of the ground motion; and (2) this total predominant frequency occurs around the midpoint of the duration of each accelerogram, as shown in Fig. 2a, in which the time history has a clear dividing point that separates high and low frequencies. A statistical analysis of the present earthquake database shows that 79 % of the recordings have the above characteristics. The algorithm for μ wk and σ wk can thus be defined as follows: for k < m,

$$\left\{ {\begin{array}{*{20}c} {\mu_{\text{wk}} = \mu_{\text{wm}} \times (1 + 1 \times p)^{m - k} } \\ {\sigma_{\text{wk}} = \sigma_{\text{wm}} \times (1 - 2 \times p)^{m - k} } \\ \end{array} } \right.$$
(11)

for k ≥ m

$$\left\{ {\begin{array}{*{20}l} {\mu_{\text{wk}} = \mu_{\text{wm}} \times (1 - 0.5 \times p)^{k - m} } \\ {\sigma_{\text{wk}} = \sigma_{\text{wm}} \times (1 + 2 \times p)^{k - m} } \\ \end{array} } \right.$$
(12)

where μ wm and σ wm are the mean and standard deviation (of ln f) of the mth LNDD function, which describes the total predominant frequency distribution, and p is the rate of variation. Here, m = int(2N/2j), where int is a rounding function.
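Eqs. (11) and (12) can be tabulated column by column as in the sketch below. Two assumptions are made for illustration: m is taken as the middle column, following the statement that the total predominant frequency occurs near the midpoint of the duration, and μ wm is positive (a median frequency above 1 Hz in the mth column). Under the second assumption, μ wk decreases and σ wk increases monotonically with k, so the content moves from narrower, higher-frequency distributions to broader, lower-frequency ones. The function name is hypothetical.

```python
def lndd_schedule(n_cols, m, mu_wm, sigma_wm, p):
    """Per-column LNDD parameters (mu_wk, sigma_wk), k = 1..n_cols,
    following Eq. (11) for k < m and Eq. (12) for k >= m."""
    mus, sigmas = [], []
    for k in range(1, n_cols + 1):
        if k < m:
            mus.append(mu_wm * (1 + 1 * p) ** (m - k))
            sigmas.append(sigma_wm * (1 - 2 * p) ** (m - k))
        else:
            mus.append(mu_wm * (1 - 0.5 * p) ** (k - m))
            sigmas.append(sigma_wm * (1 + 2 * p) ** (k - m))
    return mus, sigmas


# Illustrative values only: 12 columns, m assumed to be the middle column
n_cols = 12
m = (n_cols + 1) // 2
mus, sigmas = lndd_schedule(n_cols, m, mu_wm=1.0, sigma_wm=0.6, p=0.08)
```

The resulting lists can be passed directly as the per-column parameters when scaling a WPC matrix column by column as in Eq. (4).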

The parameters μ wm and σ wm are identified by selecting values whose simulations match the acceleration response spectra of the real ground motions, as Fig. 5 shows. Automating this identification is computationally expensive because it is not a standard optimization problem that a computer can solve quickly: in the stochastic method, the white Gaussian noise differs from one realization to the next, so the acceleration response spectrum varies slightly even when the same μ wm and σ wm are used. These parameters nevertheless define a trend, or a range, of response spectra. The identification can therefore also be conducted manually by comparing simulated and observed response spectra. In principle, p is also a parameter to be identified, but it is fixed here because its value strongly affects μ wm and σ wm. In other words, if p were also treated as a parameter to be predicted, it would be difficult, and possibly impossible, to simulate ground motions whose acceleration response spectra match those of the real ground motion. Therefore, p is determined by the following equation, obtained by trial and error on hundreds of recordings:

$$p = \begin{cases} 0.08, & T_{\text{d}} \le 90\,\text{s} \\ 0.04, & 90\,\text{s} < T_{\text{d}} < 150\,\text{s} \\ 0.02, & T_{\text{d}} \ge 150\,\text{s} \end{cases}$$
(13)
Fig. 5 Acceleration response spectra of the real ground motion recorded at the Anaverde Valley - City R station in the 1994 Northridge earthquake and of its simulation, used for the identification of μ wm and σ wm

This choice of p is reasonable because a longer ground motion has more time points whose frequency distributions must be scaled. Because the factors in Eqs. (11) and (12) compound from one time point to the next, a large p would then produce a very high frequency at the beginning and a very low frequency at the end of the simulated ground motion. For time histories shorter than 90 s, p = 0.08 was found to be optimal.
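The rule in Eq. (13) and the reasoning above can be checked with a short calculation. If m is taken as the middle column (an assumption, since the text places the total predominant frequency near the midpoint) and columns are 2.56 s apart, Eq. (11) gives μ w1/μ wm = (1 + p)^(m − 1). The helper names are hypothetical.

```python
import math


def variation_rate(td):
    """Eq. (13): rate p for a duration td in seconds; boundary values are
    assigned to the first matching case."""
    if td <= 90.0:
        return 0.08
    if td < 150.0:
        return 0.04
    return 0.02


def first_to_mid_ratio(td, p, col_dt=2.56):
    """mu_w1 / mu_wm implied by Eq. (11) when m is the middle column."""
    n_cols = max(1, math.ceil(td / col_dt))
    m = (n_cols + 1) // 2
    return (1 + p) ** (m - 1)


short_ratio = first_to_mid_ratio(60.0, variation_rate(60.0))     # p = 0.08
long_ratio = first_to_mid_ratio(200.0, variation_rate(200.0))    # p = 0.02
long_if_008 = first_to_mid_ratio(200.0, 0.08)                    # same record, p = 0.08
```

With p = 0.02 the 200 s record keeps a start-to-midpoint ratio of roughly 2, comparable to the 60 s record with p = 0.08, whereas p = 0.08 would inflate it to roughly 20.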

The last parameter is A pga, which determines the amplitude of the ground motions. In the Yamamoto model, parameter E acc is defined to represent the energy of the acceleration time history, which is as follows:

$$E_{\text{acc}} = \int_{-\infty}^{\infty} \left| x(t) \right|^{2} {\text{d}}t = \sum\limits_{i} \sum\limits_{k} \left| c_{j,k}^{i} \right|^{2}$$
(14)

Furthermore, the Rezaeian and Der Kiureghian model uses the Arias intensity to define the energy of the simulated motions. In most cases, PGA is an important parameter in both engineering and seismology, although it is only one of many parameters that characterize a ground motion. Energy, however, depends on both amplitude and duration. As illustrated in Fig. 6, the correlation coefficient between E acc and PGA × T d is larger than that between E acc and PGA, indicating that E acc is determined by both PGA and T d. Because the model parameters are generated from regression equations (described in the next section), predicting both E acc and T d would compound their regression residuals and increase uncertainty. It is therefore better to use PGA directly as the model parameter that determines the amplitude of the simulated acceleration time history and then to investigate the correlation of the residuals among all model parameters. Thus, a good regression model is critical for the accuracy of the presented simulation model.
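A toy calculation illustrates why E acc mixes amplitude and duration. Using the discrete form E acc ≈ Σ x_n² Δt of the first equality in Eq. (14), two records with the same PGA but different lengths have energies in proportion to their durations. The 2 Hz sinusoids are purely illustrative stand-ins for accelerograms, and the function name is hypothetical.

```python
import math


def energy_and_pga(acc, dt):
    """Discrete E_acc (sum of x_n^2 * dt, cf. Eq. 14) and PGA of a record."""
    e_acc = sum(a * a for a in acc) * dt
    pga = max(abs(a) for a in acc)
    return e_acc, pga


dt = 0.005
short = [math.sin(2 * math.pi * 2.0 * n * dt) for n in range(2000)]   # 10 s
long_ = [math.sin(2 * math.pi * 2.0 * n * dt) for n in range(4000)]   # 20 s
e_short, pga_short = energy_and_pga(short, dt)
e_long, pga_long = energy_and_pga(long_, dt)
# Same PGA, but doubling the duration doubles E_acc
```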

Fig. 6 Correlation of PGA, E acc and PGA × T d with correlation coefficients and residual standard deviations

3 Regression Analyses

3.1 Regression Model

SGMPEs are derived from strong-motion datasets by regression methods such as the ordinary least squares method and the multivariate Bayesian method (Arroyo and Ordaz 2010). For simplicity, a linear regression equation is employed for each model parameter, with explanatory functions representing the type of faulting (F t), earthquake magnitude (M w), source-to-site distance (R JB) and soil effect (V S30). To separate the residuals between earthquakes from those between records, a two-stage regression analysis is employed (Joyner and Boore 1993), and the functional form follows Boore and Atkinson (2008):

$$\log (Y) = F_{\text{M}} (M_{\text{W}} ) + F_{\text{D}} (R{}_{\text{JB}},M_{\text{W}} ) + F_{\text{S}} (V_{S30} ) + \delta + \xi$$
(15)

In this equation, Y is one of the model parameters introduced in Sect. 2.3; F M, F D and F S represent the magnitude scaling, distance function and site amplification contribution, respectively; and δ and ξ are the inter-event and intra-event residuals, with zero mean and variances τ² and σ², respectively.
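To make the roles of δ and ξ concrete, the sketch below splits total residuals into a per-event term and a within-event remainder by averaging over each event. This is a deliberately crude stand-in assumed here for illustration only; it is not the two-stage regression of Joyner and Boore (1993) used in the paper. The residual values and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_residuals(events, residuals):
    """Split total residuals into per-event terms (delta) and within-event
    remainders (xi) by simple event averaging. Illustrative only."""
    by_event = defaultdict(list)
    for ev, r in zip(events, residuals):
        by_event[ev].append(r)
    delta = {ev: mean(rs) for ev, rs in by_event.items()}
    xi = [r - delta[ev] for ev, r in zip(events, residuals)]
    return delta, xi


# Hypothetical residuals of log(Y) for five records from two earthquakes
events = ["A", "A", "A", "B", "B"]
residuals = [0.3, 0.1, 0.2, -0.4, -0.2]
delta, xi = split_residuals(events, residuals)
```

By construction, the within-event remainders average to zero inside each event, mirroring the zero-mean assumption on ξ.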

The magnitude scaling is given by:

$$F_{\text{M}} (M) = e_{1} {\text{SS}} + e_{2} {\text{NS}} + e_{3} {\text{RS}} + e_{4} {\text{CC}} + e_{5} \times (M_{\text{W}} - M_{\text{h}} ) + e_{6} (M_{\text{W}} - M_{\text{h}} )^{2} \quad {\text{if}}\;M_{\text{w}} \le M_{\text{h}}$$
(16)
$$F_{\text{M}} (M) = e_{1} {\text{SS}} + e_{2} {\text{NS}} + e_{3} {\text{RS}} + e_{4} {\text{CC}} + e_{7} (M_{\text{W}} - M_{\text{h}} )\quad {\text{if}}\;M_{\text{w}} > M_{\text{h}}$$
(17)

where e 1, e 2, e 3, e 4, e 5, e 6 and e 7 are coefficients, and SS, NS, RS and CC are dummy variables that denote strike slip, normal slip, reverse slip and the Chi–Chi earthquake, respectively, with the values given in Table 2. Although the Chi–Chi earthquake is also a reverse-slip event, it is treated as a separate fault type because it contributes 297 records, which might otherwise overly influence the results. Furthermore, it occurred in Asia, whereas most earthquakes in Table 1 occurred in the United States. Atkinson and Boore (2007) also concluded that the Chi–Chi earthquake might affect ground motion prediction equations (GMPEs), particularly for PSA at periods longer than 5 s. Therefore, CC is used here to represent the Chi–Chi earthquake.

Table 2 Values of dummy variables for different fault types

The distance function is given by:

$$F_{\text{D}} (R_{\text{JB}} ,{\text{M}}_{\text{W}} ) = \left[ {c_{1} + c_{2} (M - M_{\text{ref}} )} \right]\ln (R/R_{\text{ref}} ) + c_{3} (R - R_{\text{ref}} )$$
(18)

where

$$R = \sqrt {R_{\text{JB}}^{2} + h^{2} }$$
(19)

and c 1, c 2, c 3 and h are coefficients to be determined.

The site amplification function is simplified here in the form:

$$F_{S} (V_{S30} ) = b\ln (V_{S30} /V_{\text{ref}} )$$
(20)

where b is the parameter to be determined.

Several reference values in this model follow the Boore and Atkinson (2008) model: M h = 6.9, M ref = 4.5, R ref = 1.0 km and V ref = 760 m/s.
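Once the coefficients are known, Eqs. (15)–(20) give the median of log(Y) for a scenario (δ = ξ = 0). The sketch below assumes one-hot dummy values for the fault types (Table 2 holds the actual coding) and uses placeholder coefficients, not the estimates in Table 3; the function and dictionary names are hypothetical.

```python
import math

# Reference values from the text (following Boore and Atkinson 2008)
M_H, M_REF, R_REF, V_REF = 6.9, 4.5, 1.0, 760.0   # -, -, km, m/s

# Assumed one-hot coding of (SS, NS, RS, CC); see Table 2 for the actual values
DUMMIES = {
    "strike slip": (1, 0, 0, 0),
    "normal": (0, 1, 0, 0),
    "reverse": (0, 0, 1, 0),
    "chi-chi": (0, 0, 0, 1),
}


def median_log_y(coef, fault, mw, rjb, vs30):
    """Median log(Y) = F_M + F_D + F_S from Eqs. (15)-(20)."""
    ss, ns, rs, cc = DUMMIES[fault]
    f_m = coef["e1"] * ss + coef["e2"] * ns + coef["e3"] * rs + coef["e4"] * cc
    if mw <= M_H:                                        # Eq. (16)
        f_m += coef["e5"] * (mw - M_H) + coef["e6"] * (mw - M_H) ** 2
    else:                                                # Eq. (17)
        f_m += coef["e7"] * (mw - M_H)
    r = math.sqrt(rjb ** 2 + coef["h"] ** 2)             # Eq. (19)
    f_d = ((coef["c1"] + coef["c2"] * (mw - M_REF)) * math.log(r / R_REF)
           + coef["c3"] * (r - R_REF))                   # Eq. (18)
    f_s = coef["b"] * math.log(vs30 / V_REF)             # Eq. (20)
    return f_m + f_d + f_s


# Placeholder coefficients for illustration only (not Table 3 values)
coef = dict(e1=0.5, e2=0.0, e3=0.0, e4=0.0, e5=0.0, e6=0.0, e7=0.0,
            c1=-1.0, c2=0.0, c3=0.0, h=0.0, b=0.0)
y = median_log_y(coef, "strike slip", mw=6.0, rjb=math.e, vs30=760.0)
```

One such set of coefficients would be fitted for each of the six model parameters, and the residuals described next are added to these medians.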

3.2 Regression Results

The two-stage regression is a maximum likelihood estimation of the parameters, which is implemented in MATLAB in the present study. Table 3 provides the estimated parameters and standard error components, which offer several insights. For example, the seismic duration (T d) tends to increase with magnitude and distance (see e 7 and c 1) but to decrease as site stiffness increases (see b). This finding is consistent with the observation that more distant sites tend to experience longer motions. The acceleration attenuation parameter η and the bandwidth parameter σ wm decrease with increasing site stiffness, as expected (see b). At soft-soil sites, acceleration tends to attenuate more slowly than at hard-soil sites, so the value of η is larger. Meanwhile, high-frequency waves are filtered at soft-soil sites, so σ wm becomes larger (a larger σ wm leads to a wider bandwidth). The same effect is observed in μ wm, which tends to increase with site stiffness; because a larger μ wm produces a higher total predominant frequency, this is also consistent with observations. The coefficients for A pga indicate that normal and reverse faults can cause a larger PGA than strike-slip faults (see e 2 and e 3) and that PGA tends to increase with decreasing site stiffness (see b).

Table 3 Maximum likelihood estimates of regression coefficients and standard error components

Tables 4 and 5 show the correlations of the intra- and inter-event residuals, respectively. Several of these estimated correlations also provide interesting insights. T d has a negative correlation with both μ wm and A pga and a positive correlation with σ wm. This is as expected because ground motions recorded at distant sites usually have a long duration and a low PGA. Furthermore, such motions contain more low-frequency than high-frequency components, which leads to a small μ wm and a large σ wm. A relatively strong negative correlation is observed between η and A pga. This finding is consistent with the discussion of Table 3: distant sites often have a small PGA but more low-frequency components, which cause slower acceleration attenuation and thus a large η.

Table 4 Correlation of intra-event residuals
Table 5 Correlation of inter-event residuals

Figure 7 shows diagnostic scatter plots of the residuals versus the predictor variables. Inter-event residuals can be plotted only against M w, whereas intra-event residuals are plotted against M w, R JB and V S30. The residuals are evenly scattered around zero, which suggests that the regression model fits the data well.

Fig. 7 Scatter plots of residuals against earthquake magnitude, R JB and V S30 for each model parameter

4 Numerical Simulations

For a given set of earthquake and site characteristics (F t, M w, R JB, V S30), large numbers of synthetic ground motions can be generated from the model parameters predicted in the preceding sections, as detailed below.

The intra-event and inter-event residuals are generated as random values that follow multivariate normal distributions with covariance matrices computed from the residuals of the six parameters. The wavelet used in the present study is the "Meyer" wavelet, and the decomposition depth d is set to 9 because the time step dt is 0.005 s, so the frequency and time resolutions reach 0.194 Hz and 2.56 s, respectively. A fine frequency resolution ensures that enough frequency points are scaled by Eqs. (4) and (10), and a time resolution of 2.56 s is sufficient because seismic ground motions are generally longer than 5 s.
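The text does not say how the correlated residuals are drawn; a standard approach, assumed here, multiplies independent standard normal variates by the Cholesky factor of the covariance matrix. The 2 × 2 covariance below is a made-up example, not a value from Tables 4 or 5; in the model the matrices would be 6 × 6, one for the inter-event and one for the intra-event residuals. The function names are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T == cov (cov symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def draw_residuals(L, rng):
    """One sample from N(0, L L^T): multiply standard normals by L."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


# Hypothetical covariance of two residuals (std 0.2 and 0.3, correlation -0.2)
cov = [[0.04, -0.012], [-0.012, 0.09]]
L = cholesky(cov)
rng = random.Random(42)
draws = [draw_residuals(L, rng) for _ in range(20000)]
```

The factorization is computed once and reused for every draw; each sampled vector would then be added to the median log-parameters predicted by the regression model.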

The simulation method proposed in the present study has both deterministic and random aspects. The deterministic aspect is that the frequency content is adjusted according to F t, M w, R JB and V S30. The stochastic aspects come from the Gaussian white noise and from the residuals, which make the predicted parameters random.

To demonstrate this, Fig. 8a, b shows two sets of ground motions for values of F t, M w, R JB and V S30 taken from past earthquakes; each set contains one recorded motion and three simulated motions, with acceleration, velocity and displacement time histories. The acceleration time histories are processed by a high-pass filter with a cutoff frequency of 0.1 Hz. The table below the graphs lists the randomly generated parameters of the synthetic motions. Although the two events have very similar earthquake and site characteristics (F t = reverse; M w = 6.61, 6.93; R JB = 61.79, 58.52 km; V S30 = 235, 190.14 m/s), the two recorded motions differ somewhat. The first motion has a longer duration, a relatively high frequency and a low PGA, whereas the second has a shorter duration, a low frequency and a low PGA.

The parameters are generated randomly, and the Gaussian white noise differs for every simulation; this reflects the random aspect of the present method. At the same time, the generated parameters fall within a certain range: for example, the simulated durations are approximately 30–45 s for the first event and 20–35 s for the second, and similar ranges can be observed for PGA. This reflects the deterministic aspect of the method. Note that T d is the duration of the entire Gaussian white noise rather than an energy-based duration of the ground motion. For example, in Fig. 8b, the first simulation has a T d of 35.76 s but appears shorter than the second simulation, whose T d is 28.73 s, because the first simulation has smaller values of ε and η. In general, the simulated accelerations share the main features of the recorded motions. The velocity time histories tend to show a low frequency and a large amplitude in the latter half of the motions, which results from parameter p defined in Eqs. (11)–(13). This behavior is strongly influenced by ε, η, μ wm and σ wm; when these parameters are predicted accurately, the velocity and displacement time histories are closer to the recorded ones, as shown by Sim-2 in Fig. 8a and Sim-1 in Fig. 8b.

Fig. 8

a Recorded and synthetic motions corresponding to F t = 2 (reverse faulting), M w = 6.61, R JB = 61.79 km, and V S30 = 235 m/s. The recorded motion is from the 1971 San Fernando earthquake at the Carbon Canyon Dam station; b recorded and synthetic motions corresponding to F t = 2 (reverse faulting), M w = 6.93, R JB = 58.52 km, and V S30 = 190.14 m/s. The recorded motion is from the 1989 Loma Prieta earthquake at the SF International Airport station
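The record processing used for Fig. 8 (high-pass filtering at 0.1 Hz, then integration to velocity and displacement) can be sketched as below. The text specifies only the 0.1 Hz cutoff; the filter type and order (a 4th-order zero-phase Butterworth filter from SciPy) and the trapezoidal integration are assumptions, and `process_record` is a hypothetical name. The default time step of 0.005 s matches the one used in this study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def process_record(acc, dt=0.005, f_cut=0.1, order=4):
    """High-pass filter an acceleration history and integrate it twice.

    Assumed processing: Butterworth high-pass of the given order, applied
    forward and backward (zero phase). Returns (acc_filtered, velocity,
    displacement) in units consistent with acc and dt.
    """
    acc = np.asarray(acc, dtype=float)
    # Second-order sections are numerically safer for a low cutoff like 0.1 Hz
    sos = butter(order, f_cut, btype="highpass", fs=1.0 / dt, output="sos")
    acc_f = sosfiltfilt(sos, acc)

    def cumtrapz0(x):
        # cumulative trapezoidal integral, starting from zero
        return np.concatenate(([0.0], np.cumsum(0.5 * (x[1:] + x[:-1]) * dt)))

    vel = cumtrapz0(acc_f)
    disp = cumtrapz0(vel)
    return acc_f, vel, disp

# Example: a 2 Hz sine passes almost unchanged; a constant offset is removed
t = np.arange(0.0, 60.0, 0.005)
acc_f, vel, disp = process_record(np.sin(2 * np.pi * 2.0 * t))
```

Applying the filter forward and backward avoids phase distortion, which matters here because the velocity and displacement histories are compared visually with the recorded ones.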

In practice, a ground motion simulation method serves two basic purposes: generating ground motions that meet specific engineering requirements, and investigating the simulation parameters for particular regions. For example, the stochastic point-source and finite fault methods use a specified high-frequency decay parameter, κ, and a low-frequency cutoff parameter, f c (see Boore 2003). To serve both purposes, it must be possible to simulate ground motions with prescribed values for some model parameters, e.g., μ wm and σ wm, which control the predominant frequency of ground motion. In such cases, μ wm and σ wm are fixed, and the remaining variables are generated from the conditional mean vector and covariance matrix. Figure 9 shows the recorded motion together with three synthetic motions and compares the pseudo-spectral acceleration (PSA) of the best simulation with that of the recorded motion. The first row of bold numbers in the table gives the model parameters identified for the recorded motion, and the three rows below give the parameters generated with fixed μ wm and σ wm. Fixing these parameters yields a stable frequency content, and the simulated motions clearly show a frequency that decreases progressively along the time axis.

Fig. 9

Recorded and synthetic motions with specified μ wm and σ wm. The recorded motion is from the 1999 Chi–Chi earthquake at the CHY054 station with F t = 1 (Chi–Chi), M w = 7.62, R JB = 48.49 km, and V S30 = 172.1 m/s
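Generating the remaining parameters once μ wm and σ wm are fixed amounts to sampling from a conditional multivariate normal distribution. The sketch below uses the standard partitioned-Gaussian formulas, with conditional mean μ1 + Σ12 Σ22⁻¹ (x2 − μ2) and conditional covariance Σ11 − Σ12 Σ22⁻¹ Σ21. It assumes a single (total) mean vector and covariance matrix for all parameters; the function names and the index layout are hypothetical.

```python
import numpy as np

def conditional_normal(mean, cov, fixed_idx, fixed_values):
    """Conditional mean and covariance of the free components given fixed ones.

    mean: (k,); cov: (k, k); fixed_idx: indices of the fixed parameters
    (e.g., the positions of μ wm and σ wm); fixed_values: prescribed values.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    fixed_idx = np.asarray(fixed_idx)
    free_idx = np.setdiff1d(np.arange(mean.size), fixed_idx)
    s11 = cov[np.ix_(free_idx, free_idx)]
    s12 = cov[np.ix_(free_idx, fixed_idx)]
    s22 = cov[np.ix_(fixed_idx, fixed_idx)]
    # gain = S12 S22^-1, computed with a solve instead of an explicit inverse
    gain = np.linalg.solve(s22, s12.T).T
    delta = np.asarray(fixed_values, dtype=float) - mean[fixed_idx]
    cond_mean = mean[free_idx] + gain @ delta
    cond_cov = s11 - gain @ s12.T
    return free_idx, cond_mean, cond_cov

def sample_conditional(mean, cov, fixed_idx, fixed_values, n_sim, rng=None):
    """Draw n_sim full parameter vectors with the fixed components held constant."""
    rng = np.random.default_rng() if rng is None else rng
    free_idx, m, c = conditional_normal(mean, cov, fixed_idx, fixed_values)
    out = np.empty((n_sim, np.size(mean)))
    out[:, free_idx] = rng.multivariate_normal(m, c, size=n_sim)
    out[:, np.asarray(fixed_idx)] = np.asarray(fixed_values, dtype=float)
    return out
```

For a bivariate standard normal with correlation 0.5 and the second component fixed at 2, the conditional mean of the first is 1.0 and its conditional variance is 0.75.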

5 Model Validation

To validate the proposed model, selected recorded and synthetic ground motions are compared to demonstrate the similarity of their features. The comparisons involve elastic response spectra, existing NGA ground motion prediction equations and existing stochastic models.

5.1 Against Earthquakes Within and Outside the Database

Figure 10 displays the 5 % damped elastic response spectra of the accelerogram recorded at the Alhambra-Fremont School station during the 1994 Northridge earthquake and of 50 synthetic motions simulated with the same F t, M w, R JB and V S30 as the recorded motion, without fixing any model parameters. Figure 10a uses the regression model described in Sect. 3 (hereafter RM1). In the left graph, the spectrum of the recorded motion lies well within the range of the synthetic spectra at every period considered. The discrepancy between the simulated and recorded spectra is caused primarily by the misfit in PGA. This can be seen in the right graph of Fig. 10a, where the PGAs of all simulated motions are scaled to that of the recorded motion: most synthetic spectra then agree much better with the recorded spectrum, i.e., the variance of the synthetic motions becomes smaller. This finding is important because, in engineering practice, ground motions are often scaled by PGA to meet specific structural design requirements, e.g., in incremental dynamic analysis (IDA) of tall buildings. The simulation model would perform better if the PGA of the synthetic motions could be predicted more accurately, but this is usually difficult, and the variability observed in the synthetic spectra is representative of the variability inherent in ground motions for the given F t, M w, R JB and V S30.

Fig. 10

Elastic response spectra (5 % damped) of the 1994 Northridge earthquake recorded at the Alhambra-Fremont School station and of 50 synthetic motions. The motion corresponds to F t = 2, M w = 6.69, R JB = 35.66 km and V S30 = 550 m/s. a Unscaled and scaled simulated spectra using the regression model of Sect. 3. b Unscaled and scaled simulated spectra using a regression model fitted without the Northridge records

Figure 10b shows simulations for the Alhambra-Fremont School station in the 1994 Northridge earthquake, but computed with a different regression model. In the present database and regression model, the Chi–Chi earthquake has the largest number of records and is treated as a separate fault mechanism. The Northridge earthquake contributes 140 records, about half as many as the Chi–Chi earthquake. To determine how the Northridge records affect the simulation results, we fit another regression model to the database without the Northridge earthquake (hereafter RM2) and generate 50 synthetic motions with the same F t, M w, R JB and V S30, shown both unscaled and scaled. Figure 10b shows that, as the period increases, the PSA of motions simulated with RM2 decays more slowly than that of motions simulated with RM1, indicating that long-period components diminish faster in the Northridge earthquake. Furthermore, the parameter variance is smaller in RM2, as its simulations are more concentrated than those from RM1.
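The two operations behind Fig. 10, computing 5 % damped elastic response spectra and scaling synthetic motions to the recorded PGA, can be sketched as follows. The spectrum here is computed with the Newmark average-acceleration method applied to a linear single-degree-of-freedom oscillator of unit mass, and the pseudo-spectral acceleration is taken as ω² times the peak relative displacement. This is one standard choice, not necessarily the integrator used in the study, and the function names are hypothetical.

```python
import numpy as np

def scale_to_pga(acc, target_pga):
    """Scale an acceleration history so that its peak absolute value equals target_pga."""
    acc = np.asarray(acc, dtype=float)
    return acc * (target_pga / np.max(np.abs(acc)))

def psa_newmark(acc, dt, periods, zeta=0.05):
    """Pseudo-spectral acceleration of a linear SDOF oscillator (unit mass).

    Newmark average-acceleration method (gamma = 1/2, beta = 1/4) in
    incremental form; the result has the same units as acc.
    """
    acc = np.asarray(acc, dtype=float)
    gamma, beta = 0.5, 0.25
    p = -acc                                     # effective load -m * a_g with m = 1
    psa = np.empty(len(periods))
    for j, T in enumerate(periods):
        w = 2.0 * np.pi / T
        k, c = w * w, 2.0 * zeta * w             # stiffness and damping for m = 1
        k_hat = k + gamma / (beta * dt) * c + 1.0 / (beta * dt * dt)
        a1 = 1.0 / (beta * dt) + gamma / beta * c
        a2 = 1.0 / (2.0 * beta) + dt * (gamma / (2.0 * beta) - 1.0) * c
        u, v, a = 0.0, 0.0, p[0]                 # start at rest: a0 = p0 - c*v0 - k*u0
        u_max = 0.0
        for i in range(len(p) - 1):
            du = ((p[i + 1] - p[i]) + a1 * v + a2 * a) / k_hat
            dv = (gamma / (beta * dt) * du - gamma / beta * v
                  + dt * (1.0 - gamma / (2.0 * beta)) * a)
            da = du / (beta * dt * dt) - v / (beta * dt) - a / (2.0 * beta)
            u, v, a = u + du, v + dv, a + da
            u_max = max(u_max, abs(u))
        psa[j] = w * w * u_max
    return psa

# Example: a 1 Hz sine; a stiff oscillator (T = 0.05 s) gives PSA close to PGA,
# while T = 1 s is near resonance and is strongly amplified
t = np.arange(0.0, 10.0, 0.005)
acc = np.sin(2 * np.pi * 1.0 * t)
psa = psa_newmark(acc, 0.005, [0.05, 1.0])
```

Scaling every synthetic motion with `scale_to_pga(sim, recorded_pga)` before calling `psa_newmark` reproduces the kind of comparison shown in the right-hand graphs.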

The earthquakes in the present database occurred between 1971 and 2003. To test the predictive ability of the proposed model, a recent event, the strike-slip South Napa earthquake of 24 August 2014 (M w = 6.0), is selected as the target earthquake. The left graph in Fig. 11 compares the recorded and simulated PGA distributions against R JB. Only one simulation is conducted for each station, and V S30 is generated randomly within 250–700 m/s for stations without clear soil condition information. The simulated PGAs are slightly higher than the recorded ones, possibly because more than 80 % of the stations lack V S30 information and the recorded PGAs in the Napa earthquake are widely scattered at the same R JB. Nevertheless, the simulated PGAs reflect the general attenuation trend of the Napa earthquake. The right graph in Fig. 11 shows the simulated PSA for the ground motion recorded at the Santa Rosa Calistoga and Marit stations. The gray lines indicate 50 simulations that are not scaled by PGA. The recorded PSA generally falls in the middle of the 50 simulations, indicating that the present model can predict earthquake ground motion levels. This capability could be improved by including more parameters in the model and adding more earthquakes to the database.

Fig. 11

Simulation of PGA distribution against R JB for the Napa earthquake (left graph) and simulation of unscaled elastic response spectra (5 % damped) of the ground motion recorded at the Santa Rosa Calistoga and Marit stations (right graph). The motion corresponds to F t = 0, M w = 6.0, R JB = 30.20 km and V S30 = 300 m/s
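Two small steps from this comparison can be sketched: assigning a random V S30 between 250 and 700 m/s to stations without site information (a uniform distribution is assumed here; the text gives only the range), and quantifying where the recorded PSA falls among the 50 simulations as an empirical percentile at each period. The function names and the use of NaN to mark a missing V S30 are illustrative assumptions.

```python
import numpy as np

def fill_missing_vs30(vs30, low=250.0, high=700.0, rng=None):
    """Replace NaN entries of vs30 (m/s) with uniform random values in [low, high)."""
    rng = np.random.default_rng() if rng is None else rng
    vs30 = np.array(vs30, dtype=float)        # copy, so the input is not modified
    missing = np.isnan(vs30)
    vs30[missing] = rng.uniform(low, high, size=missing.sum())
    return vs30

def percentile_of_record(psa_sim, psa_rec):
    """Fraction of simulations whose PSA lies below the recorded PSA, per period.

    psa_sim: (n_sim, n_periods); psa_rec: (n_periods,).
    Values near 0.5 mean the record sits in the middle of the simulated set.
    """
    psa_sim = np.asarray(psa_sim, dtype=float)
    psa_rec = np.asarray(psa_rec, dtype=float)
    return np.mean(psa_sim < psa_rec, axis=0)
```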

5.2 Against NGA Attenuation Models

Because the synthetic ground motions are intended primarily for engineering practice, a reasonable validation is to examine how consistent these motions are with existing ground motion prediction equations (GMPEs). To this end, the PGA and the 5 % damped elastic response spectra S a(T) at periods of 0.2, 1.0 and 3.0 s of a set of 500 synthetic motions are plotted against R JB for given magnitude and site conditions (F t = 0 (strike slip), M w = 5, 6 and 7, V S30 = 300 m/s) and compared with the GMPEs developed by Boore and Atkinson (2008; BA08), Chiou and Youngs (2008; CY08) and Abrahamson and Silva (2008; AS08).

Figure 12 shows the median (colored boxes) and ±1 logarithmic standard deviation values (crosses above and below) of the response spectra of the simulated ground motions. The simulated median PGAs are in close agreement with the BA08 and CY08 GMPEs for all magnitudes, and with AS08 for magnitudes 6 and 7. At the other periods, the simulated medians match CY08 well and are slightly higher than BA08. The comparison with AS08 gives the poorest agreement: the simulated values are generally higher for all magnitudes, particularly for M w = 5. Furthermore, the standard deviations of the simulated response spectra tend to increase with period. An inherent limitation of stochastic methods is that low-frequency motions are difficult to simulate accurately. To address this issue, variable LNDD functions are introduced in the present model, and as a result the median response values at long periods (1.0 and 3.0 s) fit the existing GMPEs well.

Fig. 12

Median and ±1 logarithmic standard deviation values of the response spectra of 500 synthetic motions and corresponding values predicted by existing GMPEs without residuals. The selected parameters are F t = 0 (strike slip), M w = 5, 6 and 7, and V S30 = 300 m/s. Z1.0 = 0.024 km for CY08. Ztor = 0.034 km, W = 20 km and δ = 90° for AS08

The database used in the present study is a subset of the NGA database used to develop the three GMPEs above, so the simulated PSA values follow attenuation trends similar to theirs. These comparisons show that the proposed method is viable and consistent with existing GMPEs, particularly for magnitudes greater than 6.0, because approximately 70 % of the recordings in our database have magnitudes of 6.0 or higher.
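The statistics plotted in Fig. 12 can be computed from a set of simulated spectral values as sketched below. This assumes that the median is the geometric mean (the exponential of the mean natural logarithm) and that the ±1 logarithmic standard deviation bounds are exp(mean ± std) of ln S a; these are common conventions in GMPE comparisons, but the exact estimator used in the study is not stated. The function name is hypothetical.

```python
import numpy as np

def log_stats(values, axis=0):
    """Median and ±1 logarithmic standard deviation bounds of positive values.

    Returns (median, lower, upper) with median = exp(mean(ln x)) and
    lower/upper = exp(mean(ln x) -/+ std(ln x)), using the sample std (ddof=1).
    """
    ln_x = np.log(np.asarray(values, dtype=float))
    mu = ln_x.mean(axis=axis)
    sd = ln_x.std(axis=axis, ddof=1)
    return np.exp(mu), np.exp(mu - sd), np.exp(mu + sd)
```

For an array of 500 simulated spectra with shape (500, n_periods), `log_stats(psa, axis=0)` returns one median and one pair of bounds per period.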

5.3 Against the Yamamoto Model and Finite Fault Model

If structural static analysis is the primary concern, the elastic response spectrum approach is a good choice for seismic design in engineering practice. However, with advances in computer hardware and software, dynamic time history analysis is becoming an essential step in investigating the nonlinear performance of important structures during earthquakes. Simulating how the frequency content varies over time therefore becomes critical, because the natural period of a structure can lengthen as structural damage accumulates.

Figure 13 shows time–frequency plots of a recorded motion and of three simulated motions generated by the present model, the Yamamoto model and the finite fault model (Motazedian and Atkinson 2005). For a fair comparison, the synthetic motion whose elastic response spectrum and duration best fit those of the recorded motion is selected for each simulation model, and the PGAs of all simulated motions are scaled to that of the recorded motion. As mentioned in the previous section, earthquake ground motions tend to have relatively high frequencies while acceleration amplitudes increase, after which the frequency becomes gradually lower as the amplitude decreases. This is shown clearly in Fig. 13a, the time–frequency plot of the recorded motion: frequencies in the range of 1–10 Hz appear before 20 % of T d; the predominant frequency is then approximately 2 Hz from 20 to 60 % of T d; and finally the frequencies drop below 1 Hz after 60 % of T d.

Fig. 13

Time–frequency plot of the 1979 Imperial Valley earthquake recorded at the El Centro Array #12 (a) and 3 synthetic motions generated by the present model (b), Yamamoto model (c) and finite fault model (d)

Figure 13b shows the result from our model. There are essentially three frequency ranges: 2–10 Hz before 20 % of T d, approximately 2 Hz from 20 to 40 % of T d, and approximately 1.5 Hz after 50 % of T d. For the Yamamoto model and the finite fault model (Fig. 13c, d), the frequencies appear to be distributed more evenly over 0–10 Hz at each time point.

Note that the synthetic motions generated by the three models have elastic response spectra very similar to that of the recorded motion, even though their predominant frequencies differ at each time point. Response spectra and Fourier spectra thus reflect the frequency content of a time history without capturing when each frequency occurs. Although the time–frequency plot of our model is not identical to that of the recorded motion, the general features of the simulated motion are similar to those of the recorded one, indicating that our model is suitable for estimating strong ground motion.
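A rough version of the time–frequency reading in Fig. 13 can be obtained by tracking the predominant frequency of each short time window and expressing time as a fraction of T d. The sketch below uses SciPy's short-time Fourier spectrogram as a stand-in for the time–frequency transform behind Fig. 13 (the model itself is wavelet-based); the window length and the function name are assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

def predominant_frequency(acc, dt, t_d=None, nperseg=256):
    """Frequency of peak spectral power in each short time window.

    Returns (time, f_peak); time is divided by t_d when t_d is given,
    so it reads as a fraction of the duration T d.
    """
    f, t, sxx = spectrogram(np.asarray(acc, dtype=float), fs=1.0 / dt,
                            nperseg=nperseg)
    f_peak = f[np.argmax(sxx, axis=0)]      # sxx has shape (n_freqs, n_times)
    if t_d is not None:
        t = t / t_d
    return t, f_peak

# Example: 8 Hz for the first 10 s, then 2 Hz, sampled at dt = 0.005 s
dt = 0.005
time = np.arange(0.0, 20.0, dt)
acc = np.where(time < 10.0, np.sin(2 * np.pi * 8.0 * time),
               np.sin(2 * np.pi * 2.0 * time))
tt, fp = predominant_frequency(acc, dt, t_d=20.0)
```

A decreasing `f_peak` sequence over `tt` corresponds to the progressively lower frequency described for the recorded motion and for the present model.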

6 Conclusions

A model based on a wavelet method has been developed for simulating nonstationary ground motions for given earthquake and site characteristics. Six fundamental model parameters are defined, and prediction equations are developed that relate these stochastic model parameters to the earthquake fault mechanism, moment magnitude, Joyner–Boore distance and site shear-wave velocity. Because the uncertainty of the model parameters is taken into account, the simulated suite of ground motions can reasonably capture the natural variability of recorded ground motions.

The model proposed in the present study is a stochastic model based on a modulated, filtered white noise process that incorporates both temporal and spectral nonstationarity. The six model parameters characterize the duration, the evolving intensity, the predominant frequency, the bandwidth and the frequency variation of the ground acceleration process. All parameters except PGA are identified manually for each recorded ground motion in the database. These parameters could normally be identified automatically with an optimization method, but we found manual identification preferable. Because stochastic model parameters describe only the average characteristics of the simulated motions, two simulations with the same parameters are never exactly alike, which makes it difficult and slow for an optimization algorithm to arrive at precise and correct parameter values. The manual identification is not overly laborious, because records from the same earthquake have similar parameters. The model can therefore be replicated and adapted to other databases.

To predict the model parameters, a two-stage regression analysis is employed to separate inter- and intra-event residuals. The regression coefficients and error variances are estimated, and the correlations between the residuals are analyzed. The results indicate that the six model parameters depend on the specified earthquake and site characteristics. Predicting future ground motions is one function of a simulation method; another is fitting past earthquakes to investigate the model parameters for regions of interest. To serve this second function, our model can fix certain parameters to match recorded ground motions, as point-source and finite fault methods do. Well-studied model parameters can in turn be used to simulate ground motions in regions with similar fault and site conditions or to predict ground motions for the same region.
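The two-stage regression can be illustrated with a deliberately simplified functional form. In stage 1, one model parameter y is regressed on ln R JB and ln V S30 plus one free term per earthquake; in stage 2, those earthquake terms are regressed on M w. The stage 1 misfits are the intra-event residuals and the stage 2 misfits are the inter-event residuals. The linear functional forms, variable names and the function name are assumptions for illustration only; they are not the prediction equations of Sect. 3.

```python
import numpy as np

def two_stage_regression(y, event_id, ln_r, ln_vs30, mw_event):
    """Simplified two-stage regression (illustrative functional form).

    Stage 1: y = a * ln_r + b * ln_vs30 + e_event + intra-event residual
    Stage 2: e_event = c0 + c1 * Mw + inter-event residual
    event_id: integer labels 0..n_events-1 for each record
    mw_event: (n_events,) moment magnitude of each earthquake
    """
    y = np.asarray(y, dtype=float)
    event_id = np.asarray(event_id)
    n_events = len(mw_event)
    # One indicator column per earthquake; these absorb the constant term
    dummies = np.eye(n_events)[event_id]
    x1 = np.column_stack([ln_r, ln_vs30, dummies])
    coef1, *_ = np.linalg.lstsq(x1, y, rcond=None)
    a, b, e_event = coef1[0], coef1[1], coef1[2:]
    intra = y - x1 @ coef1

    mw_event = np.asarray(mw_event, dtype=float)
    x2 = np.column_stack([np.ones(n_events), mw_event])
    (c0, c1), *_ = np.linalg.lstsq(x2, e_event, rcond=None)
    inter = e_event - (c0 + c1 * mw_event)
    return {"a": a, "b": b, "c0": c0, "c1": c1, "intra": intra, "inter": inter}

# Noise-free synthetic data: the regression should recover the coefficients
rng = np.random.default_rng(0)
mw = np.array([5.5, 6.0, 6.3, 6.7, 7.0, 7.6])
event_id = np.repeat(np.arange(len(mw)), 10)
ln_r = np.log(rng.uniform(5.0, 150.0, event_id.size))
ln_v = np.log(rng.uniform(200.0, 800.0, event_id.size))
y = -1.2 * ln_r + 0.3 * ln_v + (1.0 + 0.5 * mw)[event_id]
fit = two_stage_regression(y, event_id, ln_r, ln_v, mw)
```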

To demonstrate the effectiveness of the model, the synthetic acceleration time histories are compared systematically with recorded ones. The validation consists of (1) comparing the elastic response spectra of synthetic and real ground motions, (2) comparing the median and ±1 logarithmic standard deviation of the simulated motions with existing GMPEs, and (3) comparing the time–frequency plots of a real ground motion and of synthetic motions generated by the present model, the Yamamoto model and the finite fault model. All comparisons indicate that the proposed model is effective and can be used in seismic design to estimate strong ground motions.