1 Introduction

Based on earthquake recordings, the horizontal-to-vertical spectral ratio of earthquakes (HVSRE) is defined as the ratio of the horizontal Fourier amplitude spectrum (FAS) to the vertical FAS at a point on Earth’s surface. Lermo and Chavez-Garcia (1993) were the first to calculate HVSRE, focusing on four stations located in Mexico City. Since then, HVSRE is usually used to directly represent the site effect around a given station; however, there are some detectable differences between the HVSRE value and other spectra in relation to site responses (Lermo and Chavez-Garcia 1993; Field and Jacob 1995; Satoh et al. 2001). Usually, one or several peaks, which are caused by a high impedance contrast between two adjacent soil layers, can be distinguished based on HVSRE. When a weak motion is observed at a certain site, the HVSRE value can be regarded as irrelevant to the event of the motion because the shear modulus and hysteretic damping do not vary significantly under a small level of excitation. In this case, the HVSRE averaged over several weak motions can be used to represent the empirical site characteristics. Panzera et al. (2017) provided a favorable review of HVSRE; they investigated the directional effects of tectonic fractures on site amplification corresponding to the ground motion based on the earthquake and ambient noise data in South Iceland. On the other hand, in cases of strong motions, the average HVSRE was not appropriate anymore because of the presence of nonlinear soil behaviors (namely, soil nonlinearity). From the perspective of soil mechanic properties, nonlinear soil behaviors included a decrease in soil shear modulus and an increase in soil hysteretic damping (Midorikawa 1993; Beresnev and Wen 1996). The shear modulus and soil damping can be directly calculated from the hysteresis loop associated with the stress–strain relationship of soils, and many pioneers implemented the related cyclic triaxial tests (or cyclic ring-shear tests) in their numerical models (e.g., Hardin and Drnevich 1972; Mohammadioun and Pecker 1984).

In general, three kinds of spectral ratios are used to estimate site effects: the standard spectral ratio (SSR; Borcherdt 1970), surface-to-borehole spectral ratio (SBSR; Kitagawa et al. 1988), and horizontal-to-vertical spectral ratio (HVSR). HVSR, which originated from the quasi-transfer spectrum (QTS) proposed by Nakamura (1988), can be separated into two types: HVSRE and the HVSR of microtremors (HVSRM which remains in a linear regime). Naturally, nonlinear soil behaviors can be detected using these ratios during a significant level of excitation (except for HVSRM). In general, the captured nonlinear behaviors can be summarized into two phenomena: (a) a shift in the fundamental frequency toward the lower frequency side and (b) a reduction in amplitudes at high frequencies (Dimitriu et al. 1999, 2000; Dimitriu 2002). The decline in the S-wave velocity (VS) value of surficial layers, which is attributable to the reduction of shear modulus, can be regarded as the reason for the peak shift at fundamental frequencies toward the lower frequency side. The increment in soil hysteretic damping reduces the amplitude at high frequencies. Further evidence of soil nonlinearity can be found in the previous observations of large earthquakes, such as the 1994 Northridge earthquake in the United States (Field 1997), the 1995 Kobe earthquake in Japan (Aguirre and Irikura 1997), and the 2011 Tohoku earthquake in Japan (Bonilla et al. 2011).

Over the last 15 years, several factors have been used to detect and quantify the discrepancy in the spectra between weak and strong shaking. Three factors have been proposed for measuring the level of nonlinearities over the entire frequency range: the degree of nonlinearity (DNL; Noguchi and Sasatani 2008), the percentage of nonlinearity (PNL; Regnier et al. 2013), and the absolute degree of nonlinearity (ADNL; Ren et al. 2017). As shown in Fig. 1, DNL considers the spectral difference between weak and strong motions in the logarithmic coordinate system. ADNL, which can be thought of as an improved version of DNL, considers both the difference and the standard deviation (STD) of weak shakings. Dividing the ANDL by the surrounding area of the weak motion spectra yields PNL, which can be seen as a normalization of ADNL. PNL gives a comparably higher tolerance to those weak motion spectra that have a high average amplitude. All three of DNL, ANDL, or PNL can well assess the level of soil nonlinearity presenting itself during strong motions, yet these factors do not possess the ability to reproduce (or predicting) the HVSR in nonlinear cases (HVSRN) from the HVSR in linear cases (HVSRL). This means that these factors can only capture nonlinear soil behaviors from recordings of strong shaking, but they are not able to modify the spectra in linear cases to simulate those in nonlinear cases.

Fig. 1
figure 1

Illustration of some previous factors for detecting nonlinear soil behaviors. a DNL. b ADNL. c PNL

To reproduce the HVSRN, Wang et al. (2021) proposed the nonlinear correction function (NCF). They argued that the spectral discrepancy between the weak and strong motions could be approximately regarded as a shift in frequency (horizontal axis) and amplitude (vertical axis). As shown in Fig. 2, the shift values in the frequency and amplitude axes are defined as α and β, respectively. As for the meaning of α and β, suppose that the α (or β) is equal to 0.5; the shift on the horizontal (or vertical) axis will be the unit length of 0.5 in the logarithmic coordinate system (a unit length here is the distance from one to ten). Based on the NCF and a given peak ground acceleration (PGA) value, the HVSRN associated with a certain PGA value can be derived from the HVSRL at the same location. Subsequently, the vertical amplification correction function (VACF), of which concept was first proposed by Kawase et al. (2018), was adopted to transform HVSRN into the horizontal site amplification factor (HSAF) in nonlinear cases. However, Wang et al. (2021) solely considered the recordings of the Kinki area in southwestern Japan; the area for gathering data should be extended, and the feasibility of NCF should be validated in other regions of Japan. Moreover, the site classification should also be considered for the NCF because sites with the different categories should require different NCFs.

Fig. 2
figure 2

Horizontal-to-vertical spectral ratio of earthquakes in linear cases (HVSR LE ), the horizontal-to-vertical spectral ratio of earthquakes in nonlinear cases (HVSR NE ), and the illustration of α and β; this is an example of the event that occurred on July 5, 2011 at WKYH01

Comparing our data with that of Wang et al. (2021), who gathered records only from the Kinki area, we collected recordings from all over Japan and then adjusted the factor for calculating the degree of spectra difference between linear and nonlinear cases. Herein, PGA, the peak ground velocity (PGV), and the quotient of PGV divided by the time-averaged shear wave velocity down to 30 m (PGV/VS30) were considered to have impacts on the α and β values. The station category was also considered. A series of improved NCFs was derived and used to replicate HVSR NE from HVSR LE . Subsequently, compared with the VACF systemically proposed by Ito et al. (2020), the VACF considering VS30 (VACFVS30), which involves the VS30 value of a given station, was used to transform the HVSRE into HSAF for better approximation. Details will be introduced in the following sections.

2 Data, source, and methodology

2.1 Database

As shown in Fig. 3a, we considered 1065 sites in total. All the sites belong to the Kyoshin Network (K-NET) and Kiban Kyoshin Network (KiK-net), installed by the National Research Institute for Earth Science and Disaster Resilience (NIED; Aoi et al. 2021). Among these sites, the VS30 value could be calculated for 1034 sites on the basis of information from borehole surveys. However, it could not be derived for the other 31 sites because of the lack of borehole data. The sites for which the VS30 values are unknown have been marked using solid black circles in Fig. 3a. We only considered sites where seismographs captured at least one record with a PGA value greater than 100 cm/s2 and at least five records with a PGA value in between 4 and 15 cm/s2. From recordings within PGAs greater than 100 cm/s2, we expected that it would be possible to extract the characteristics of nonlinear soil behaviors. The records in the PGA range of 4–15 cm/s2 were considered as weak motions; in other words, we believe that nonlinear soil behaviors do not exist in this PGA range. The restriction of at least five weak motions is to guarantee the spectra shape stability of HVSR LE . Figure 3b shows the epicentral distances and Japan Meteorological Agency (JMA) magnitude (MJMA) of chosen earthquakes associated with all the strong and weak motions (Please see the article by Katsumata (1996) if readers are interested in the definition of MJMA). We utilized all types of earthquakes in terms of source categories, that is, crustal earthquakes, subducting plate-boundary earthquakes, and intraplate earthquakes. These three types of earthquakes were considered together rather than analyzed individually, as the focus of this study was site effects.

Fig. 3
figure 3

Information about chosen sites and earthquakes; a site locations with the time-averaged shear wave velocity under 30 m (VS30) values; b Epicentral locations with MJMA

Besides weak (namely, 4–15 cm/s2) and strong (> 100 cm/s2) motions, moderate motions, which have a PGA value of 15–100 cm/s2, were also included in this study. Many researchers have adopted a PGA value of 100 cm/s2 as the criterion or threshold for the potential occurrence of nonlinearity (Dimitriu et al. 2000; Noguchi and Sasatani 2011; Ren et al. 2017; Zhou et al. 2020; Wang et al. 2021). However, nonlinear soil behaviors have been observed in some recordings with PGA less than 100 cm/s2 (Wen et al. 2006; Wu et al. 2010; Rubinstein 2011; Dhakal et al. 2017). Therefore, it is necessary to consider the possibility of nonlinearities for records having PGA values of less than 100 cm/s2 but higher than our set maximum PGA value for linear cases (that is, 15 cm/s2), especially in the range of 50–100 cm/s2. Figure 4 shows the relationship, which is extracted from all applied recordings, between epicentral distances and MJMA in three different PGA ranges. The number of events for each acceleration range is shown in the bottom right corner on the left panel of Fig. 4. This study considered weak motions with an epicentral distance of less than 100 km, recorded from Jan 1, 1999 to Jan 1, 2019. Because strong-motion recordings are much rarer than weak motion recordings, we did not set any restriction on the epicentral distances for strong motions; the search years for strong motions were also expanded to Jan 1, 1996 to Jan 1, 2020. As for the moderate motions (15–100 cm/s2), when a sufficient number of records could not be found at a given site in the distance range of 0–100 km from Jan 1, 1999 to Jan 1, 2019, the constraint for epicentral distances was discarded and matching recordings for this site were searched again; however, the selected years were not changed.

Fig. 4
figure 4

Relationship between epicentral distances and MJMA with different PGA ranges

2.2 HVSR calculation process

Before extracting recordings in time domains, we corrected the baseline for each record based on the average value calculated using the first 500 values. A data fragment of 24 s from each recording, including the main S-wave portion of 20 s duration, and two pieces of 2 s before or after the S-wave portion for using cosine-shape tapers, was adopted to calculate HVSRE. To derive the S-wave arrival time, we extracted the record starting time from the raw waveform.data, coupling it with the average S-wave travel-timetable for Japan (Ueno et al. 2002) and the precise earthquake location information obtained from the seismicity analysis system (SAS; Tsuruoka 1998). The recordings of two horizontal directions (i.e., EW and NS) and one vertical direction (i.e., UD) were considered for each motion. The fast Fourier transform (FFT) was implemented in the time–frequency analysis together with the Parzen window with a band of 0.1 Hz for weak motions. Herein, the HVSR LE at a given station was derived by averaging all the HVSREs calculated from each weak motion at this site. This average process can be seen as an extra step for smoothing the spectra in addition to the Parzen window. Because we aimed to explore the relationship between the nonlinear soil characteristics and the intensity factor of earthquakes such as PGA or PGV, the HVSRE for strong motions should be treated individually, indicating that there is no averaging procedure for strong motions. Due to this lack of intermediate processes, a Parzen window of 0.2 Hz was used to calculate HVSR NE for strong motions. In summary, there is only one HVSR LE calculated at a given site by averaging several HVSREs for weak motions; however, the number of HVSR NE depends on the number of strong-motion recordings. As for the methodology for directional averaging, root-mean-square (RMS) values were utilized to average the HVSRE of the NS and EW directions acquired from one event; geometric averaging was used for those weak motions at the same station to calculate the HVSR LE at this location.

2.3 Degree of difference and goodness-of-fit

We adopted the degree of difference (DOD), Pearson product-moment correlation coefficient (PPMCC), and goodness-of-fit (GOF) to calculate the coincidence degree in the frequency range of interest (i.e., 0.5–20 Hz) between HVSR NE and the HVSRE derived by moving HVSR LE by a given pair of α and β (HVSR TE ). Equations (1)–(6) describe these factors.

$$\begin{array}{*{20}c} {\overline{R} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} c\left( i \right) \cdot \log_{10} R\left( i \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} c\left( i \right)}}} \\ \end{array}$$
(1)
$$\begin{array}{*{20}c} {\sigma_{R} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} c\left( i \right) \cdot \left( {\log_{10} R\left( i \right) - \overline{R}} \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} c\left( i \right)}}} } \\ \end{array}$$
(2)
$$\begin{array}{*{20}c} {COR\left( {R_{1} ,R_{2} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} c\left( i \right) \cdot \left( {\log_{10} R_{1} \left( i \right) - \overline{{R_{1} }} } \right) \cdot \left( {\log_{10} R_{2} \left( i \right) - \overline{{R_{2} }} } \right)}}{{\sigma_{{R_{1} }} \cdot \sigma_{{R_{2} }} \cdot \mathop \sum \nolimits_{i = 1}^{n} c\left( i \right)}}} \\ \end{array}$$
(3)
$$\begin{array}{*{20}c} {HVSR_{E}^{T} \left( {f\left( i \right)} \right) = HVSR_{E}^{L} \left( {f\left( i \right) \cdot 10^{\alpha } } \right) \cdot 10^{\beta } } \\ \end{array}$$
(4)
$$\begin{array}{*{20}c} {DOD = \frac{{\mathop \sum \nolimits_{i = 1}^{n} c\left( i \right) \cdot \left| {\log_{10} \frac{{HVSR_{E}^{T} \left( {f\left( i \right)} \right)}}{{HVSR_{E}^{N} \left( {f\left( i \right)} \right)}}} \right|}}{{\mathop \sum \nolimits_{i = 1}^{n} c\left( i \right)}}} \\ \end{array}$$
(5)
$$\begin{array}{*{20}c} {GOF = \frac{{DOD_{\min } \cdot COR\left( {\alpha ,\beta } \right)}}{{DOD\left( {\alpha ,\beta } \right) \cdot COR_{\max } }}} \\ \end{array}$$
(6)

In these equations, R, R1, and R2 can be HVSR NE or HVSR TE . c(i), whose value is the reciprocal of frequencies at the i-th point, is a coefficient to balance the weight attributed to the different frequency increments on the logarithmic frequency axis (because of equal sampling in the linear frequency axis in FFT). \(\overline{R}\) is the average of R. \(\sigma_{R}\) is the standard deviation (STD) of R. COR is equivalent to the weighted PPMCC of R1 and R2. DOD can be regarded as the average distance in the logarithmic system between HVSR NE and the HVSR TE . If both α and β are zero, the HVSRE will be the same as HVSR TE . f(i) is the frequency at the i-th point. The GOF, whose maximum value is one, can be seen as the normalization process for DOD and COR, implying that the greater the GOF is, the higher the coincidence degree (or similarity) between the associated two spectra would be. Note that the weight coefficient c(i) is supplemented in all the equations.

As shown in Fig. 5a, the grid search strategy was utilized to find the best pair of α and β, namely, the values of α and β for which the GOF value is maximized. In Fig. 5a, the brighter the area is, the greater the associated GOF value would be. The dotted red line represents the best fit values of α for a certain β value. The larger red dot indicates the α and β values for which the GOF value is maximized. This happens at α = 0.075 and β = 0.168. This optimal pair of α and β will be denoted as αb and βb hereafter. In Fig. 5b, the thick black line depicts a plot of HVSR LE for IWT008 station; the red line depicts HVSR NE calculated from the record “IWT0080112022202” with PGA = 167.08 cm/s2, and the thick purple line depicts a plot of HVSRE derived via shifting HVSR LE using αb and βb, namely, HVSR BE . Herein, we showed two types of HVSR NE using the Parzen window with different bandwidths. The light red line represents HVSR NE calculated using a Parzen window with a band of 0.2 Hz. This HVSR NE was utilized in the matching calculation process for determining the optimal pair of α and β. However, checking the coincidence degree by visual inspection using this HVSR NE was difficult due to its high fluctuation. Therefore, HVSR NE calculated using a Parzen window with a band of 0.8 Hz was plotted using a thick bright red line to confirm the coincidence degree. The purple and red lines agree well with each other. Although there are still some discrepancies between these two lines, the methodology proposed here for reproducing HVSR NE based on the shift in the frequency and amplitude axes should be effective and acceptable.

Fig. 5
figure 5

Grid search for GOF and three types of HVSRE; a Values of GOF for different α and β values; the red dot indicates that the GOF value reaches the maximum value of one when α = 0.075 and β = 0.168. b Three types of HVSRE; HVSR NE and HVSR LE are denoted in red and black, respectively. When α = 0.075 and β = 0.168, HVSRT E is the HVSRE with the best coincidence (HVSRB E), exactly. “IWT0080112022202” means that this is an example of the event at 22:02, on December 2, 2001, at IWT008 site

3 Nonlinear correction function

3.1 Nonlinear correction function without considering site classification

Figure 6 depicts the αb and βb values for each record with a PGA greater than 15 cm/s2, the bin central values based on PGA, and the regression curve for αb and βb. For instance, the αb and βb for “IWT0080112022202,” which are shown in Fig. 5, are plotted as one of the blue dots in Fig. 6a and Fig. 6b, respectively. Figure 6 plots 43,149 recordings in the PGA range of 15–50 cm/s2, 7,520 recordings in the range of 50–100 cm/s2 and 5,319 recordings in the range of 100–2500 cm/s2, that is, a total of 55,988 recordings from 1,065 stations. The αb and βb for records with a PGA value in the range of 4–15 cm/s2 are not shown because their PGA is very small and it is unnecessary to show all of them in the linear cases. Note that the PGA adopted here is the resultant value of the NS and EW directions. This means that we first synthesized the waveforms of NS and EW components based on the resultant rule; then, the greatest acceleration value on the synthesized waveforms was extracted as the PGA for this event. From 15 cm/s2 to 2500 cm/s2, the horizontal axis was separated into several bins based on intervals of approximately 1.3 folds, as shown in Table 1. The average αb increases slightly and stably from the first bin (i.e., 15–20 cm/s2) to the fifth bin (i.e., 41–50 cm/s2), but it is still considerably small, even in the range of 41–50 cm/s2. This indicates that the nonlinear soil behavior was not detected in the range of 15–50 cm/s2. However, we deduced that recordings of 50–100 cm/s2 should be considered together with those of 100 cm/s2 or higher, considering that the slope of the regression curve for αb was fairly greater than zero in the PGA range of 50–100 cm/s2. As for βb, although the average βb is greater than the average αb in the range of 15–50 cm/s2, this region was not included in the curve regression for βb. The positive and ascending βb in the range of 15–50 cm/s2, which implies a positive amplitude increment on average, might be explained as a common phenomenon as PGA increases, even in linear cases. Another plausible explanation is that the proportion of noise components in the spectral ratios decreases as PGA increases, and then the average amplitude value of HVSR increases. In this case, the average amplitude of the pure noise signal should be less than that of the HVSR without any noise. The reason for the positive βb in the range of 15–50 cm/s2 should be investigated in detail in the future.

Fig. 6
figure 6

Values of a αb and b βb for each record with a PGA greater than 15 cm/s2, the bin central values for a αb and b βb based on PGA, and the regression curve for a αb and b βb

Table 1 Range of PGA, number of recordings, average PGA, αb and βb for each bin

The least squares method (LSM) was used to search for the optimal regression formula for αb and βb. As shown in Fig. 6a, a parabolic function was utilized in the regression procedure for αb. When PGA was identical to 50 cm/s2, the function value was restricted to the average αb in the range of 49–51 cm/s2, which is equal to 0.0105. This process can be seen as the baseline correction. The function slope at PGA = 50 cm/s2 was restricted to be zero. In our datasets, many recordings are stored using the sampling frequency of 100 Hz. In the case of 100 Hz, the Nyquist frequency of FFT is 50 Hz. As shown in Fig. 2 in the manuscript, the frequency range of interest in HVSR is 0.5 to 20 Hz. The shift of 0.4 for α at 20 Hz means that the data at 10(log10(20)+0.4) = 50.23 Hz, which has already been greater than the Nyquist frequency, will be utilized in the matching procedures. The shift of -0.4 for α at 0.5 Hz means that the data at 0.2 Hz will be considered. In most cases, we deem that 0.2 Hz is small enough to consider the potential nonlinear soil behaviors. Therefore, the range of ± 0.4 for α would be appropriate as the upper and lower bounds of the grid search. In the case of β, the shift of positive and negative 0.4 means that the amplitude will be enlarged or reduced by 10^0.4 = 2.51 folds. We deem that the amplification or de-amplification of 2.51 folds is enough to consider the variety on HVSR during nonlinear soil behaviors. Under these constraints, the R2 value between the regression curve and the bin central values of αb is 0.9811, indicating the goodness of the fit. Given that the βb values for a PGA larger than 1000 cm/s2 cannot fit the parabolic function under our restriction for PGA = 50 cm/s2, we applied piecewise functions to describe the relationship between PGA and βb. As shown in Fig. 6b, we used different parabolic functions, which are depicted in red and orange, to fit the βb in the PGA ranges of 15–500 cm/s2 and 500–2500 cm/s2, respectively. At PGA = 50 cm/s2, the function slope for β was also restricted to be zero, and the correction value for βb was 0.0289. The β value and the first-order derivative values of the former and latter functions should be identical at PGA = 500 cm/s2. The fitting effectiveness for βb is favorable, but the R2 value is 0.7427, lower than the R2 of 0.9811 for αb, because the bin central value of βb is relatively stable and close to its average value. One can realize that the αb value and its slope increase continuously as the PGA value increases, which can be interpreted as the shift towards a lower frequency in the entire frequency range of interest (i.e., 0.5–20 Hz), rather than only at the fundamental or predominant frequency. A non-negligible deviation from zero on βb at PGA = 50 cm/s2 occurred in Fig. 6b. The βb value remains greater than zero, increases slightly and continuously in the PGA range of 50 cm/s2 to approximately 700 m/s2, and then descends to zero at the PGA of approximately 1850 cm/s2. To some extent, the positive βb value could represent a positive increment of average amplitudes, from HVSR LE to HVSR NE . This implies that the positive influence caused by the reduction in shear soil modulus, which contributes to the reduction in VS and the rise in the average amplitude, overcomes the negative impact caused by the increase in soil damping in the PGA range of 50–1850 cm/s2. More precisely, the rise in the average amplitude can be attributed to the higher impedance contrast in the interface of two adjacent layers. This means that thick sediment is required to induce a considerable decreasing effect caused by damping in the amplitude value to offset the increasing effect led by the Vs reduction in the PGA range of 50–1850 cm/s2. On the other hand, if the PGA value is greater than 1850 cm/s2, the fitted curve for βb shows that the impact led by the Vs reduction will be countered by the impact attributed to the increase in damping.

As shown in Figs. 7 and 8, the concentration degree of αb is greater than that of βb for those bins with sufficient recordings (i.e., 50–430 cm/s2), gradually descending to the same level as that of βb in the PGA band of 430–560 cm/s2. There is no significant change in the concentration degree of βb in the range of 50–430 cm/s2 that includes eight bins. For bins of PGA larger than 560 cm/s2, both αb and βb gradually become more scattered as PGA increases due to fewer data points in this range (see Table 1). In the future, the regression curve for αb and βb will be adjusted when many new data with a PGA value greater than 560 cm/s2 can be downloaded. Note that the scatter of αb seen in Fig. 6a is misleading because many overlapping dots concentrating near the regression curves cannot be distinguished visually.

Fig. 7
figure 7

Distribution of αb in each bin

Fig. 8
figure 8

Distribution of βb in each bin

The α and β calculated using regression functions were denoted αc and βc. Based on the regression we obtained:

$$\begin{array}{*{20}c} {\alpha_{c} = 0.1077 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0105,\quad 50 \le PGA \le 2500} \\ \end{array}$$
(7)
$$\begin{array}{*{20}c} {\beta_{c} = \left\{ {\begin{array}{*{20}l} {0.0310 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0289,} \hfill & {50 \le PGA < 500} \hfill \\ { - 0.2918 \cdot \left( {\log_{10} PGA} \right)^{2} + 1.6369 \cdot \log_{10} PGA - 2.2328,} \hfill & {500 \le PGA \le 2500} \hfill \\ \end{array} } \right.} \\ \end{array}$$
(8)

Equations (7) and (8) are the regression curves for αb and βb shown in Fig. 6, representing the relationship between PGAs value and the shift in the frequency and amplitude axes. Using Eq. (9), we then obtain the pseudo HVSR NE at a given site by moving the HVSR LE at this site based on αc and βc (HVSR CE ). This means when the α and β in Eq. (4) are equal to αc and βc, HVSR TE is exactly equal to HVSR CE . Equations (7)–(9) are the improved NCFs.

$$\begin{array}{*{20}c} {HVSR_{E}^{C} \left( {f\left( i \right)} \right) = HVSR_{E}^{L} \left( {f\left( i \right) \cdot 10^{{\alpha_{c} }} } \right) \cdot 10^{{\beta_{c} }} } \\ \end{array}$$
(9)

Figure 9 shows two reproduction examples. HVSR CE is shown using a thick blue line. The cyan area in Fig. 9 is the confidence interval. The upper and lower boundaries of the cyan area depend on the maximum surrounding area of numberless spectra derived using any values in a possible range of α and β. This means that the (α, β) for the cyan area can be any value within a rectangular range surrounded by four boundary pairs of α and β. The four vertexes values of (α, β) are (αc + STD of αb, βc + STD of βb), (αc – STD of αb, βc – STD of βb), (αc + STD of αb, βc – STD of βb), and (αc – STD of αb, βc + STD of βb), respectively. As shown in Fig. 9a, the HVSR NE , HVSR BE , and HVSR CE agree well with each other, which indicates that the blue dot corresponding to this recording in Fig. 6 is very close to the regression curve; in other words, the αb and βb values based on the best fit are fairly close to those calculated by NCF. However, the agreement between HVSR BE and HVSR CE in Fig. 9b is not as close as that in Fig. 9a. This is because the blue dot corresponding to this record in Fig. 6 is not close to the regression curve, being located at the distribution periphery of the blue points. The blue point related to the record shown in Fig. 9b should be close to the sum of bin central values and one STD because the upper boundary shape of the cyan area is close to the HVSR BE shown in purple. In view of the nature of regression approaches and the fluctuation of seismological recordings, the discrepancy between HVSR BE and HVSR CE is unavoidable for those recordings whose blue dots are far from the regression curve. We also calculated the average residual between HVSR NE and HVSR CE , which is shown on the left side of each panel. From among two types of HVSR NE , the spectrum calculated using a Parzen window of 0.8 Hz, namely, the bright thick red line, was adopted to assess the average distance between HVSR NE and HVSR CE (the light thin red line is the spectrum calculated using a Parzen window of 0.2 Hz).

Fig. 9
figure 9

Four types of HVSRE, namely, the HVSR LE , HVSR NE , HVSR BE , and HVSR CE ; a Record “IWT0091404080822”, which is detected at 08:22, on April 3, 2014, at IWT009, b Record “IWT0080112022202”, which is detected at 22:02, on December 2, 2001, at IWT008 site

We performed the reproduction procedures for all the data with a PGA greater than 50 cm/s2, listing the double-average residuals between HVSR NE and HVSR CE for each bin and related STD in Table 2. The average residual value increases smoothly as PGA increases (The “double-average” means first averaging each data point of one recording and then averaging the already averaged results of all the recordings in a bin). The STD of residuals in bins having more than 100 recordings was approximately 0.043, except for the bins in the range of 190–250 cm/s2, where the STD is quite high at 0.071. The STDs of the last four bins are comparably greater than those of the bins in the range of 50–190 cm/s2 and 250–560 cm/s2. The larger STD might be attributed to the insufficiency of the number of recordings in the last four bins. As for the range of bins, the ratio of the maximum PGA to minimum PGA in each bin is controlled between 1.2 to 1.3, except for the last bin. The last bin considered the PGA range of 1230 to 2500 cm/s2 because of the insufficiency of the number of events.

Table 2 Range of PGA, average residuals, and their STDs

3.2 Nonlinear correction function considering site classification

Subsequently, we separated all the selected 1065 stations into five site categories based on the National Earthquake Hazards Reduction Program (NEHRP; BSSC 2003), as shown in Table 3. The NCFs considering different site classes were acquired by using the same procedures as those followed without considering site classification. Figures 10 and 11 show the regression curves of αb and βb for site classes B, C, D, and E. The regression function slope value at PGA = 50 cm/s2 was also restricted to be zero. To correct the deviation from α (or β) = 0 at PGA = 50 cm/s2, the mean value of αb (or βb) averaged from αb (or βb) in the PGA range of 45–55 cm/s2 was used. In the case of α, a single parabolic function was utilized for site classes B, C, and E in the regression procedure; the piecewise functions consisting of two parabolic functions were adopted for site class D to well fit the recordings with PGA values greater than 500 cm/s2. In the case of β, initially, we decided to adopt the aforementioned piecewise functions for every site class. The boundary point for the two parts of the regression functions was set at 1000 cm/s2 for site class B, 500 cm/s2 for site classes C and D, and 200 cm/s2 for site class E. However, given that there is no recording with a PGA greater than 1000 cm/s2 in site class B, we chose the single parabolic type for Category B.

Table 3 NEHRP site class
Fig. 10
figure 10

Values of αb, regression curves, and bin central values for records with PGA greater than 15 cm/s2 in a Site class B, b Site class C, c Site class D, and d Site class E

Fig. 11
figure 11

Values of βb, regression curves, and bin central values for records with PGA greater than 15 cm/s2 in a Site class B, b Site class C, c Site class D, and d Site class E

The R2 value of α is superior to that of β in every site class, indicating that the efficacy of our regression strategy is more favorable in the case of α, in comparison to β. The R2 value of α can be accepted in every site class, and the value of β should be acceptable in site classes C and D. On the other hand, one should be careful to use the NCFs for β in site classes B and E due to their low R2 values. In our opinion, compared with site classes C and D, the relatively low recording density (especially in the high PGA range) led to a lower resolution for βb and even αb in site classes B and E, negatively affecting our regression strategy and causing low R2 values. We anticipate that if more recordings are collected in the site classes B and E, the features αb and βb will be more evident and the R2 value on the basis of our strategy will be greater in these two categories. In addition, the STDs in some bins are quite large owing to the unclear characteristics of αb and βb in these bins, which might be attributable to recording insufficiency.

$$\begin{array}{*{20}c} {\alpha_{cb} = 0.1271 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0157,\quad 50 \le PGA \le 900} \\ \end{array}$$
(10)
$$\begin{array}{*{20}c} {\alpha_{cc} = 0.1029 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0112,\quad 50 \le PGA \le 2500} \\ \end{array}$$
(11)
$$\begin{array}{*{20}c} {\alpha_{cd} = \left\{ {\begin{array}{*{20}l} {0.0998 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0101,} \hfill & {50 \le PGA < 500} \hfill \\ {0.5509 \cdot \left( {\log_{10} PGA} \right)^{2} - 2.7744 \cdot \log_{10} PGA + 3.5845,} \hfill & {500 \le PGA \le 1600} \hfill \\ \end{array} } \right.} \\ \end{array}$$
(12)
$$\begin{array}{*{20}c} {\alpha_{ce} = 0.1237 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0128,\quad 50 \le PGA \le 1500} \\ \end{array}$$
(13)
$$\begin{array}{*{20}c} {\beta_{cb} = 0.0153 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0355, \quad 50 \le PGA \le 900} \\ \end{array}$$
(14)
$$\begin{array}{*{20}c} {\beta_{cc} = \left\{ {\begin{array}{*{20}l} {0.0242 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0310,} \hfill & {50 \le PGA < 500} \hfill \\ { - 0.1970 \cdot \left( {\log_{10} PGA} \right)^{2} + 1.1121 \cdot \log_{10} PGA - 1.5110, } \hfill & {500 \le PGA \le 2500} \hfill \\ \end{array} } \right.} \\ \end{array}$$
(15)
$$\begin{array}{*{20}c} {\beta_{cd} = \left\{ {\begin{array}{*{20}l} {0.0382 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0259,} \hfill & {50 \le PGA < 500} \hfill \\ { - 0.5513 \cdot \left( {\log_{10} PGA} \right)^{2} + 3.0523 \cdot \log_{10} PGA - 4.1580,} \hfill & {500 \le PGA \le 1600} \hfill \\ \end{array} } \right.} \\ \end{array}$$
(16)
$$\beta_{ce} = \left\{ {\begin{array}{*{20}l} {0.0251 \cdot \left( {\log_{10} PGA - \log_{10} 50} \right)^{2} + 0.0399,} \hfill & {50 \le PGA < 200} \hfill \\ { - 0.0339 \cdot \left( {\log_{10} PGA} \right)^{2} + 0.1864 \cdot \log_{10} PGA - 0.2003,} \hfill & {200 \le PGA \le 1500} \hfill \\ \end{array} } \right.$$
(17)

Equations (10)–(13) represent the regression functions for αb in site classes B, C, D, and E. Equations (14)–(17) are the regression functions for βb in site classes B, C, D, and E. Equation (9) are also utilized to decide the amplitude and frequency value for each data point after correction, with αc and βc replaced by the special symbol for each site class, such as αcb and βcb in the case of site class B. Table 4 shows the double-average residuals and STDs for each PGA bin between HVSR NE and HVSR CE calculated using the NCF considering site classes. The residual and related STD values did not vary significantly in comparison to Table 2, which is representative of the NCF without considering site categories. Therefore, we deem that either NCF based on VS30, irrespective of consideration of site classes, can be utilized to reproduce the HVSRE in nonlinear cases. In addition, some examples using NCF considering site classes are shown in Fig. 12. Given that the recordings for these examples have fairly large PGA values ranging from 500 to 1500 cm/s2, Fig. 12 could validate our NCFs considering the site classes in high PGA cases because our NCFs provide favorable reproductions.

Table 4 Range of PGA, Average residual, and their STD in the case considering site class
Fig. 12
figure 12

HVSR LE , HVSR NE , HVSR BE , and HVSR CE derived by using NCF considering site classes in the case of a Record “IWTH230807240026”, which is detected at 00:26, on July 24, 2008, at IWTH23 b Record “OITH111604160125”, which is detected at 01:25, on April 16, 2016, at OITH11 c Record “IWT0210407091954”, which is detected at 19:54, on July 9, 2004, at IWT021 and d Record “NIG0190410231756”, which is detected at 17:56, on October 23, 2004, at NIG019

4 Horizontal site amplification factor considering soil nonlinearity

As shown in Eq. 18, VACF, of which concept was first proposed by Kawase et al. (2018) and then systematically analyzed by Ito et al. (2021), can conveniently transform HVSRE into HSAF.

$$\begin{array}{*{20}c} {HSAF = \frac{{V_{s} }}{{H_{b} }} \times \frac{{H_{s} }}{{V_{s} }} = VACF \times HVSR} \\ \end{array}$$
(18)

The essence of VACF is the average ratio of vertical FAS on Earth’s surface with respect to horizontal FAS at the seismological bedrock (VSHBR). If the amplification factor on the vertical component does not vary significantly during the strong shakings, the VACF in linear cases can also be adopted in nonlinear cases. This study utilized an improved version of VACF, namely, VACFVS30. Ito’s VACF involved all the known VSHBR values, irrespective of the VS30 value associated with them (namely, the VS30 value at the site where this VSHBR was detected). However, VACFVS30 considers only a certain number of VSHBR that have VS30 values close to those of the target station.

Figure 13 shows six examples of deriving HSAFN from HVSR CE . Panels a and b are related to the two cases shown in Fig. 9. Panels c, d, e, and f correspond to the four cases described in Fig. 12. Thus, HVSR CE was calculated using the NCF without site categories in Panels a and b, whereas the NCFs considering site categories were utilized to calculate the HVSR CE shown in Panels c, d, e, and f. The difference between HVSR CE and HSAFN is significant and can be attributed to the amplification of the vertical component from seismological bedrock to Earth’s surface. This discrepancy is comparably greater at sites with lower VS30 values, such as NIG019, IWT021 and OITH011, than at sites with higher VS30 values, such as IWT008 and IWTH23. If HVSR CE is used directly as the horizontal amplification from seismological bedrock to the surface, there will be some frequency ranges for which amplitudes are less than one, such as 4–15 Hz at site NIG019 and 9–15 Hz at site OITH11. This means de-amplification of the horizontal component, which should not be acceptable in terms of the basic theory of earthquake engineering. After the correction by VACFVS30, namely, considering the amplification on the vertical component, this kind of de-amplification can be mitigated. We suggest that the HSAFN calculated using the hybrid approach of NCF and VACFVS30 is more appropriate than using a series of HVSR, especially HVSR CE and HVSR LE , to characterize the site effect in nonlinear cases. Please note that the standard deviation of VACFVS30 is increasing as the frequency gets higher. This tendency reflects the fact that peak resonant frequencies of the vertical component in the high-frequency range vary significantly due to the P-wave velocity variations in shallow layers at the sites with similar VS30 values.

Fig. 13
figure 13

HVSR CE , HSAF in nonlinear cases (HSAFN), VACFVS30, and every VSHBR (i.e., light golden line) involved in VACFVS30; for each panel, 100 VSHBRs were acquired from 100 different sites and geometrically averaged to obtain the VACFVS30 for the target site, namely, a IWT009, b IWT008, c IWTH23, d OITH11, e IWT021, and f NIG019. These 100 sites have VS30 values closest to that of the target site

As shown in Fig. 14, in the frequency range of 0.5–6 Hz, the VACFVS30 value with a higher VS30 value is less than with a lower VS30 value. This means the vertical component amplification occurring on the soil with a lower VS30 value, from the seismological bedrock to Earth’s surface, is more significant than that with a greater VS30 value. Such phenomena will be sometimes reversed outside the frequency of 0.5–6 Hz, such as the spectrum with VS30 = 333 m/s in the frequency range of 6–15 Hz and with VS30 = 400 m/s in the range of 9–15 Hz; namely, the VACFVS30 amplitudes with greater VS30 values are greater than those with less VS30 values. We provided all the spectra and their STDs as supplementary material for this article (see Data availability) in order that people can conveniently use Eq. (18) to transform HVSRE into HSAF under the different soil condition with different VS30 values.

Fig. 14
figure 14

VACFVS30 in the cases of different VS30 values; each VACFVS30 spectrum was calculated by averaging 100 site-specific VSHBRs, whose relevant stations have VS30 values closest to the values shown in the right legend

5 Discussion

5.1 Relationship between PGV or PGV/VS30 versus α b or β b

$$\begin{array}{*{20}c} {\alpha_{c} = 0.0513 \cdot \left( {\log_{10} PGV - \log_{10} 1} \right)^{2} + 0.0093,\quad 1 \le PGV \le 200} \\ \end{array}$$
(19)
$$\beta_{c} = \left\{ {\begin{array}{*{20}l} {0.0120 \cdot \left( {\log_{10} PGV - \log_{10} 1} \right)^{2} + 0.0199,} \hfill & {1 \le PGV < 30} \hfill \\ { - 0.2835 \cdot \left( {\log_{10} PGV} \right)^{2} + 0.8729 \cdot \log_{10} PGV - 0.6247,} \hfill & {30 \le PGV \le 200} \hfill \\ \end{array} } \right.$$
(20)
$$\alpha_{c} = \left\{ {\begin{array}{*{20}l} {0.0197 \cdot \left( {\log_{10} \left( {\frac{PGV}{{V_{S30} }} \cdot 10^{5} } \right) - \log_{10} 1} \right)^{2} + 0.0029, } \hfill & {1 \le \frac{PGV}{{V_{S30} }} \cdot 10^{5} < 70} \hfill \\ {0.1304 \cdot \left( {\log_{10} \left( {\frac{PGV}{{V_{S30} }} \cdot 10^{5} } \right)} \right)^{2} - 0.4084 \cdot \log_{10} \left( {\frac{PGV}{{V_{S30} }} \cdot 10^{5} } \right) + 0.3797,} \hfill & {70 \le \frac{PGV}{{V_{S30} }} \cdot 10^{5} \le 1100} \hfill \\ \end{array} } \right.$$
(21)
$$\beta_{c} = \left\{ {\begin{array}{*{20}l} {8.9363 \cdot 10^{ - 4} \cdot \left( {\log_{10} \left( {\frac{PGV}{{V_{S30} }} \cdot 10^{5} } \right) - \log_{10} 1} \right)^{2} + 0.0242,} \hfill & {1 \le \frac{PGV}{{V_{S30} }} \cdot 10^{5} < 70} \hfill \\ { - 0.0361 \cdot \left( {\log_{10} \left( {\frac{PGV}{{V_{S30} }} \cdot 10^{5} } \right)} \right)^{2} + 0.1365 \cdot \log_{10} \left( {\frac{PGV}{{V_{S30} }} \cdot 10^{5} } \right) - 0.1017,} \hfill & {70 \le \frac{PGV}{{V_{S30} }} \cdot 10^{5} \le 1100} \hfill \\ \end{array} } \right.$$
(22)

Using Eqs. (19)–(22), in conjunction with the integral equations by Saito (1978), we converted the PGA value into PGV or PGV/VS30 values to determine the relationship of αb and βb with PGV and PGV/VS30, as shown in Fig. 15. However, because the acceleration amplitude can be directly detected by seismographs, and because using PGA is convenient for the waveform prediction involving iteration procedures based on NCF, here we solely give the functions related to PGV and PGV/VS30, not providing any reproduction cases for PGV and PGV/VS30. Please note that the first function in Eq. (22) is very similar to a flat horizontal line; namely, its slope is very close to zero. This means that the relationship between PGV/VS30 and βb in potential nonlinear cases is not apparent. In Figs. 15b and 15d, we found an interesting phenomenon that the averaged βb for the first bin (namely, 0.08–0.15 in PGV and PGV/VS30 cases) noticeably deviated from zero. Herein, positive β implies a positive increment in the average amplitude value from HVSR LE to HVSR NE . Given that the PGA values of recordings in the first bin are less than 50 cm/s2, and the level of PGV and PGV/VS30 is also the smallest among all the bins, strong nonlinear soil behaviors should not occur in these recordings. This means the average βb calculated from this range of PGV or PGV/VS30 should not deviate so much from zero. As mentioned in Sect. 3.1, a possible reason for this could be the noise on the seismological recordings. However, our understanding of this phenomenon is not very clear. This could be a direction for future research.

Fig. 15
figure 15

Values of αb and βb for each record with PGA greater than 15 cm/s2, the bin central values for αb and βb based on PGV (a, b) and PGV/VS30 (c, d), and regression curves for αb and βb based on PGV (a, b) and PGV/VS30 (c, d); Herein, the recordings were sequenced based on their PGA values and were separated into three categories shown in different colors, namely, 15–50 cm/s2, 50–100 cm/s2, and > 100 cm/s.2

5.2 Theoretical interpretation for the shift on HVSRE

Based on the diffuse-field theory (DFT; Weaver 1982; Sanchez-Sesma et al. 2008), Kawase et al. (2011) derived an implicit corollary of Claerbout’s result for a 1D layered medium (Claerbout 1968). They pointed out that the imaginary part of the Green’s function at a free surface is proportional to the square of the modulus of the corresponding transfer function for a vertically incident plane wave. If the receiver and source are both located at the same point on the free surface of 1D half space, the HVSRE can be expressed as:

$$\begin{array}{*{20}c} {HVSR_{E} = \frac{{H\left( {0,\omega } \right)}}{{V\left( {0,\omega } \right)}} = \sqrt {\frac{{2\alpha_{H} }}{{\beta_{H} }}} \frac{{\left| {TF_{1} \left( {0,\omega } \right)} \right|}}{{\left| {TF_{3} \left( {0,\omega } \right)} \right|}},} \\ \end{array}$$
(23)

where ω is angular frequency; H (0, ω) (V (0, ω)) is the FAS on the horizontal (vertical) component at ω on the surface (i.e., the receiver depth is zero), respectively; αH (βH) is the P-wave (S-wave) velocity value of the seismological bedrock; TF1 (TF3) is the transform function for the horizontal (vertical) component, respectively. Following Aki and Richards (1980), TF and its modulus can be derived as follow:

$$\begin{array}{*{20}c} {TF\left( \omega \right) = \frac{2}{{P_{11} - \left( {i\upsilon } \right)^{ - 1} P_{21} }},} \\ \end{array}$$
(24)
$$\begin{array}{*{20}c} {\left| {TF\left( \omega \right)} \right| = \frac{2}{{\sqrt {P_{11}^{2} + \left( {\frac{{P_{21} }}{\upsilon }} \right)^{2} } }},} \\ \end{array}$$
(25)

where \(\upsilon = \omega \rho_{H} c_{H}\); \(\rho_{H} \left( {c_{H} } \right)\) is the density (velocity) value of the seismological bedrock; P is a second-order matrix, defined as:

$$\begin{array}{*{20}c} {P = \left[ {\begin{array}{*{20}c} {P_{11} } & {P_{12} } \\ {P_{21} } & {P_{22} } \\ \end{array} } \right].} \\ \end{array}$$
(26)

Each term of P is related to the ω and physical properties of every layer in soil models. We recommend referring to the article by Kawase et al. (2011) for details. A simple soil model having two layers was constructed in this study. We tried to mimic the variety occurring during soil nonlinearity on soil mechanical properties by changing the VS values of the first layer of models.

As shown in Table 5, the thickness of the first layer, which played the leading role in this model in terms of the amplification value, was set to a fixed value of 30 m. The VS of the first layer was set to be 500, 400, or 300 m/s2; using these values, we calculated three pieces of theoretical HVSRE with different VS30 values. The Q value for the S-wave was set to be 45, 30, or 20. Moreover, for the first layer, the P-wave velocity (VP), density, and Q value for P-wave were set to be 1,500 m/s2, 2 g/cm3, and 45, respectively. A VS value of 3,500 m/s, the P-wave velocity (VP) of 5,500 m/s, and Q value of 90 were given to the second layer to represent the seismological bedrock well.

Table 5 Soil layer properties for the model used for calculating the theoretical HVSRE

The three calculated HVSRE are plotted in Fig. 16. The spectral peak or even the entire spectrum is shifted to a lower frequency range as the VS value decreases. On the other hand, the amplification at the frequencies of the fundamental and second peaks is enhanced as the VS value decreases. Hence, based on the theory derived by Kawase et al. (2011), the reduction in VS values, which is attributed to nonlinear soil behaviors, can be used to preliminarily interpret the shift occurring in the spectra observed during strong shakings. Even if the damping is increased due to soil nonlinearity, the peak amplitude can still be increased because of the larger impedance contrast resulting from the soil nonlinearity. Although simply using α and β cannot make the theoretical spectra with Vs values of 300 m/s and 500 m/s coincide perfectly, the theoretical spectra with different Vs and Q values show a reduction in peak frequencies and a rise in peak amplitudes, which are the same as the characteristics when using a pair of positive α and β. Herein, we solely aim to utilize DFT to explain the possibility of the increase in peak amplitudes in the case of soil nonlinearity, and such a simplified two-layer model is not sufficient to simulate the practical soil deposit situation. Using more complicated models that closely mimic the α and β effects on spectra is already beyond the scope of this article.

Fig. 16
figure 16

Theoretical HVSRE with different VS30 values for the surficial layer based on Kawase et al. (2011)

6 Conclusion

The main conclusions of this study are as follows:

First, considering not only the average distance but also the PPMCC between two spectra, we redefined the GOF, which is a factor used to quantify the coincidence degree of two spectra in the logarithmic coordinate system. Based on grid search methodology, the GOF defined here effectively captures the best pair of the frequency and amplitude modification factors for nonlinearity, namely, α and β. Second, based on seismic recordings gathered from across Japan, NCFs without and with site classification consideration were proposed. Either function was validated as feasible, and effective correction functions to reproduce the HVSRE in nonlinear cases from the HVSRE in linear cases were developed. Third, based on the concept of Kawase et al. (2018), we calculated the VACFVS30, which is an improved version of VACF (Ito et al. 2020), considering the VS30 value of the target station. Using VACFVS30, HVSR CE can be converted into HSAFN. Compared with the presently used HVSR, the derived HSAFN is more favorable for representing site characteristics in nonlinear cases. In addition, the NCF involving PGV or PGV/VS30 was presented and preliminarily analyzed. The shift of HVSRE at the frequency and amplitude axes can be explained by following the DFT of Kawase et al. (2011) for HVSRE.

By including a greater number of and more varied recordings than Wang et al. (2021), the reliability of NCFs was significantly improved. The scope of application for each NCF was specified in the associated equations. In the future, we plan to combine the site amplification factors considering nonlinear soil behaviors acquired in this study with other factors related to source and path effects to simulate the strong motion waveform involving soil nonlinearity. We plan to develop an iterative procedure, which is expected to be convergent, for predicting the seismic waveforms in nonlinear cases by using the discrepancy between the input initial PGA value and the simulated output PGA value.