1 Introduction

In the assessment of regenerated noise and vibration, great emphasis is often placed on accuracy because retrofitting mitigation measures is prohibitively costly and practically difficult. However, railway vibro-acoustic problems are complex, and the use of safety factors or contingencies is therefore widespread. Large safety factors can themselves be costly: safety factors as great as 10 decibels have been used in some designs to account for uncertainties in the modelling [1, 2].

Many assessment frameworks for rail vibration and regenerated noise (e.g. Refs. [3,4,5]) can be loosely split into three steps: (1) characterising source vibration levels, (2) estimating track-to-receiver transfer functions and (3) estimating the receivers’ dynamic responses. Each step carries its own uncertainty, which makes it difficult to accurately predict train vibration and regenerated noise. In practice, rail source vibration levels are often based on vibration measurements at similar sites (‘reference spectrum’), with adjustments applied to describe the relative difference between the two sites. Source vibration levels may have to be adjusted for differences in the stiffness of rail fasteners and their spacings, the trackform, the dynamic properties of the subsoil, type of train, train speed and the curvature of the track [1, 2, 5, 6]. Changes in the rail and wheel maintenance regimes may also influence source level estimates. For example, Lawrence [7] reports 10–15 dB reductions across the 50 and 100 Hz range depending on grinding.

Not only are the aforementioned adjustments associated with some uncertainty, but so is the reference spectrum itself (or the force density levels in the case of an FTA-style assessment approach [3]). Train vibration has random characteristics, with a fleet at any given site exhibiting a variation in pass-by vibration levels (i.e. inter-train variability). The inter-train vibration variability at a given site depends on the fleet’s composition. In Sydney, for example, vibration levels from Tangara train sets are reported to be on average 5 decibels higher than those from Waratah train sets [8, 9]. Further, the inter-train variability also depends on the fleet’s wheel condition (out-of-roundness and roughness), the variability in driving conditions (train speeds, acceleration and deceleration profiles) and the axle loads (number of passengers on board or freight carried).

The variability of train vibration is reflected in the use of statistical descriptors for describing source vibration levels and assessment criteria. The statistical metric used to formulate the ground-borne noise (or regenerated noise or structure-borne noise) criterion should be consistent with the metric used to calculate the underlying source vibration levels [5]. Uncertainty and variability in the source vibration levels carry through to uncertainty and variability in the estimated vibration and ground-borne noise levels at receivers. This paper focuses on a frequency bandwidth commonly used for the assessment of ground-borne noise and examines inter-train source vibration variability and how different methods of statistical evaluation and sample size selection influence the source vibration levels adopted for an assessment.

1.1 Assessment Metric

The choice of representative train pass-by spectra should generally be guided by the assessment metric. Approval conditions in Australia typically require the 95th percentile of trains to comply with the project criteria for regenerated noise, which is typically expressed as the A-weighted, maximum slow response overall sound pressure level (LAmax,slow). Current projects include, for example, Sydney Metro (NSW), Melbourne Metro (VIC), Perth City Link, Forrestfield Airport Link, Thornlie-Cockburn Link and Midland to Bellevue Extension (all WA). The same metric has been adopted for the Auckland City Rail Link project [10]. 95th percentiles are often used for the assessment of tactile vibration as well [10, 11]. Accordingly, the metric of this study is the slow response time-weighted one-third octave vibration velocity spectrum, Lmax,slow [12].

The presented vibration spectra are unweighted. In the event the spectra are used for the calculation of A-weighted ground-borne noise levels at receivers remote from the tracks, the spectra would change due to the effects of coupling losses, floor amplification, distance attenuation as well as A-weighting (e.g. Refs. [3, 4, 13]). The aforementioned effects will change the spectral characteristics and are not further considered in this study.

2 Datasets

In this study four large datasets on different networks in Australasia are analysed. Provided below in Table 1 is a summary of each of the vibration measurement datasets.

Table 1 Overview of datasets

At Site 1, unweighted raw acceleration in the vertical direction was recorded continuously over a period of 5 days. The measurement location was approximately 15 m from the track, and an accelerometer was attached to a peg driven into the soil. At Sites 2, 3 and 4, unweighted raw acceleration of the tunnel invert in vertical direction was recorded continuously for a 24-h period. The train speeds at these sites were estimated from the pass-by durations and known train lengths.

The focus of this study is to understand typical variability in train pass-by vibration. Accordingly, the data were not normalised with respect to speed or offset from the tracks, nor were the data split by train type. At all four sites only trains on the track closest to the sensor were included in the dataset. Further, the data were collected over a period of days, so long-term variability due to grinding cycles [7, 14] or changes to the composition of the fleet [8, 9] would not be detectable in the datasets.

The influence of different tracking positions and associated differences in the roughness of running bands and rail roughness in absolute terms are not explored in this study as the necessary supporting data have not been available to the authors.

2.1 Data Processing

For all sites, individual train pass-bys were identified and saved in separate files for the subsequent analysis steps. The pass-by data of each train were high-pass filtered (5 Hz) and integrated to vibration velocities in the time domain. For the calculation of one-third octaves, a frequency bandwidth of 10–315 Hz was considered, which adequately covers the bandwidth of interest for the assessment of ground-borne noise. For each pass-by the peakhold spectrum was calculated, with the term peakhold signifying that the highest level in each one-third octave band during a train pass-by has been used. The use of peakhold spectra is conservative as the highest one-third octave level in each band may occur at different times. All vibration velocities and one-third octave vibration velocity spectra are presented in units of decibels relative to a reference vibration level of 1 nm/s (i.e. 10⁻⁹ m/s) and referred to as dBV or decibel.
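The decibel convention and the peakhold operation described above can be sketched as follows. This is a minimal illustration only: the function names and the assumed input layout (rows are time steps, columns are one-third octave bands) are hypothetical and not taken from the study, and the filtering, integration and slow time weighting steps are not shown.

```python
import numpy as np

V_REF = 1e-9  # reference velocity of 1 nm/s, as stated in the text


def velocity_to_dbv(velocity_m_per_s):
    """Convert a vibration velocity in m/s to decibels re 1 nm/s."""
    v = np.asarray(velocity_m_per_s, dtype=float)
    return 20.0 * np.log10(v / V_REF)


def peakhold_spectrum(band_levels_over_time):
    """Return the highest level reached in each band during a pass-by.

    Assumed input: array of shape (time_steps, bands) holding band
    levels in dB. The maxima may occur at different times in each band,
    which is why the peakhold spectrum is conservative.
    """
    levels = np.asarray(band_levels_over_time, dtype=float)
    return levels.max(axis=0)
```

For example, a velocity of 1 µm/s corresponds to 60 dB re 1 nm/s under this convention.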

2.2 One-Third Octave Spectra

Presented in Fig. 1 are the Lmax,slow spectra of the individual trains measured at all four sites as grey lines. In this paper, all results are organised as 2 × 2 subplots whereby Site 1 is in the top left subplot, Site 2 in the top right subplot and Site 3 and Site 4 are in the bottom left and right subplots, respectively. The resulting ranges in overall Lmax,slow vibration levels are shown as bars on the right hand side, labelled “OLs”. Figure 1 also illustrates:

  • the spectra of the two trains with the highest overall levels (solid lines, circle and square)

  • the spectra of the two trains with the lowest overall levels (dotted lines, triangles)

  • as well as the train spectrum with the median overall value (dashed line, diamond).

Fig. 1 Individual Lmax,slow spectra for the four measurement sites

The spectra at each site are consistent with the trackforms. At all four sites, the two events with the highest vibration Lmax,slow levels had overall levels within 0.5 dBV. While the two highest events at all sites had similar overall levels, the underlying one-third octave bands which contribute significantly to the overall levels differ by up to 5–10 dBV.

3 Results

3.1 Overall Vibration Velocities

Illustrated in Fig. 2 is the distribution of the overall levels, separated into bin counts using widths of 0.5 and 2 dBV. Fig. 2 also presents the normal probability density functions (PDFs) that result if the datasets are assumed to be normally distributed, using the mean and standard deviation of each dataset. The term mean, in this case, is the arithmetic mean of decibels and the standard deviation is computed as the variation of decibel values about the arithmetic mean of decibels in the dataset.

Fig. 2 Bin counts (light grey is 2 dBV, and dark grey is 0.5 dBV bin width)

The shapes of the histograms for Sites 2, 3 and 4 clearly show that values near the midpoints are more likely than values near the upper and lower bounds. The shape of the Site 1 histograms suggests the superposition of two distinct distributions, likely arising from the different rolling stock (XPTs and freighters) measured at this site. This dataset was intentionally not split into XPTs and freighters in order to simulate the potential assessment outcomes of a mixed fleet.

The grey lines in Fig. 3 show the cumulative distribution functions (CDFs) that result if the data are assumed to be normally distributed, using the arithmetic mean of decibels and the standard deviation. The black circles are the empirical CDFs (ECDFs).
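A comparison of this kind can be sketched with the standard library and NumPy. The helper names below are hypothetical, and the ECDF step convention (the i-th sorted value is assigned the probability i/N) is an assumption, since the text does not state which plotting position was used.

```python
import math

import numpy as np


def normal_cdf(x, mean, std):
    """CDF of a normal distribution evaluated at a scalar x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ecdf(levels_db):
    """Empirical CDF: sorted levels and their cumulative probabilities.

    Assumption: the i-th sorted value (1-based) gets probability i / N.
    """
    x = np.sort(np.asarray(levels_db, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p
```

Plotting `p` against `x` alongside `normal_cdf` evaluated with the arithmetic mean and standard deviation of the decibel values gives a visual check of the kind described for Fig. 3.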

Fig. 3 Empirical CDF (circles) and normally distributed CDF

Visual inspection of both data representations suggests that the overall Lmax,slow vibration velocity decibels may be normally distributed. In all subsequent analyses, a normal distribution of decibels has been adopted for the overall decibel levels as well as the distribution of decibels within one-third octave frequency bands (refer to Sect. 3.2). Normality is discussed in Appendix A.

Table 2 lists some statistical descriptors based on the overall decibel levels. The standard deviations based on Lmax,slow range from 1.4 dBV at Site 3 to 3.9 dBV at Site 1. The considered datasets are based on unweighted velocities and they show good agreement with the values presented in Weber and Karantonis [2] who cite a combined standard uncertainty for source parameters of 2.2 dB(A) where the bracketed A indicates A-weighting. For airborne noise, Weber and Zoontjens [16] report higher standard deviations of 4.5 dB for passenger trains (ranging from 3 to 6.3 dB) and 5.1 dB for freighters (ranging from 2.9 to 8.7 dB). Table 2 also presents the 95th percentiles calculated using two methods:

  • Normal distribution: The arithmetic mean of decibels plus 1.65 times the standard deviation. According to standard normal probability tables (e.g. Wirsching et al. [17]), the mean plus 1.65 times the standard deviation corresponds to a cumulative probability of 95.05%.

  • Rank: The nth train of the sorted dataset where n is the number of trains multiplied by 0.95 (in case of a non-integer, rounded up to the next integer).
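The two percentile estimates listed above can be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, as the text does not state which estimator was used. The rank is computed in integer arithmetic to avoid floating-point surprises when rounding up.

```python
import numpy as np


def nearest_rank(n_trains, percent=95):
    """1-based rank: n_trains * percent / 100, rounded up to an integer."""
    return (percent * n_trains + 99) // 100


def p95_normal(levels_db, k=1.65):
    """Arithmetic mean of decibels plus k times the standard deviation.

    Assumption: sample standard deviation (ddof=1).
    """
    levels = np.asarray(levels_db, dtype=float)
    return levels.mean() + k * levels.std(ddof=1)


def p95_rank(levels_db, percent=95):
    """Level of the nth train in the sorted dataset (nearest rank)."""
    levels = np.sort(np.asarray(levels_db, dtype=float))
    return levels[nearest_rank(levels.size, percent) - 1]
```

With 20 pass-bys the rank method returns the 19th highest-sorted value (the second highest measurement); with fewer than 20 pass-bys it returns the maximum.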

Table 2 Overall decibel levels in terms of 95th percentiles

The overall levels estimated with the rank method and with a fitted normal distribution generally agree within 0.4 dBV. Whether the fitted normal distribution over- or under-predicts relative to the rank method can be inferred visually from Fig. 3. There is no consistent trend as to which method yields the higher value.

3.2 One-Third Octave-Based Results

3.2.1 Implementation of Percentiles

When calculating the 95th percentile ground-borne noise levels from direct measurements of ground-borne noise, the statistical analyses can be based on the overall level of each pass-by and a detailed knowledge of the spectral content is not required. However, in cases where ground-borne noise levels cannot be determined via direct measurement, ground-borne noise levels will need to be estimated. Detailed prediction models are usually implemented in terms of one-third octaves and the likely overall ground-borne noise levels at the receivers are calculated after applying appropriate, receiver specific gain- and loss-functions to the representative one-third octave train vibration spectrum [3,4,5, 13]. Working with one-third octave spectra adds a layer of complexity compared to working with measured overall levels.

In this study, three different methods of estimating percentile spectra are compared. They are referred to as P1, P2 and P3:

  • “P1”: All trains in the dataset are sorted by their overall Lmax,slow values. The 95th percentile train (in terms of overall value) is selected and its corresponding one-third octave spectrum is chosen as the representative 95th percentile spectrum. If the number of trains multiplied by the percentile is a non-integer, then this number is rounded up.

  • “P2”: The decibel levels in each one-third octave frequency band are sorted and the 95th percentile one-third octave band level is selected. If the number of trains multiplied by the percentile is a non-integer, then this number is rounded up. Subsequently, the corresponding overall value is calculated by logarithmic decibel summation (i.e. the addition on a linear energy basis represented on a logarithmic basis).

  • “P3”: A normal distribution of the decibel values (excluding the overall value) is determined and the statistical one-third octave band level is calculated by adding 1.65 times the standard deviation to the arithmetic mean of decibels. The spectrum’s overall value is calculated by logarithmic decibel summation.

Methods P1 and P2 utilise the nearest rank method for estimating the 95th percentile. For sample sizes smaller than 20 trains, the 95th percentile in the P1 and P2 methods defaults to the spectrum of the train with the highest overall level and to the envelope of all one-third octave spectra, respectively.

The P2 and P3 methods are carried out in the one-third octave bands and a corresponding overall level is subsequently calculated. Accordingly, no single train actually matches the derived spectrum and the P2 and P3 spectra may be thought of as ‘synthetic spectra’.

Fig. 4 compares the 95th percentile Lmax,slow spectra as calculated with the P1 (squares), P2 (up triangles) and P3 (circles) methods.
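The three percentile methods can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the input is assumed to be an array with one row per train and one column per one-third octave band (in dB), the sample standard deviation (`ddof=1`) is assumed for P3, and for P1 the overall level of each train is approximated by the energy sum of its band levels, whereas the study may have used separately measured overall levels.

```python
import numpy as np


def db_sum(band_levels_db):
    """Logarithmic (energy-based) summation of band levels in dB."""
    levels = np.asarray(band_levels_db, dtype=float)
    return 10.0 * np.log10(np.sum(10.0 ** (levels / 10.0), axis=-1))


def nearest_rank(n_trains, percent=95):
    """1-based rank: n_trains * percent / 100, rounded up to an integer."""
    return (percent * n_trains + 99) // 100


def p1_spectrum(spectra_db, percent=95):
    """P1: spectrum of the train ranked at the percentile by overall level.

    Assumption: the overall level is the energy sum of the band levels.
    """
    spectra = np.asarray(spectra_db, dtype=float)
    order = np.argsort(db_sum(spectra), kind="stable")
    return spectra[order[nearest_rank(len(spectra), percent) - 1]]


def p2_spectrum(spectra_db, percent=95):
    """P2: nearest-rank percentile taken independently in each band."""
    spectra = np.asarray(spectra_db, dtype=float)
    return np.sort(spectra, axis=0)[nearest_rank(len(spectra), percent) - 1]


def p3_spectrum(spectra_db, k=1.65):
    """P3: per-band arithmetic mean of decibels plus k standard deviations."""
    spectra = np.asarray(spectra_db, dtype=float)
    return spectra.mean(axis=0) + k * spectra.std(axis=0, ddof=1)
```

The overall level of a P2 or P3 spectrum is then obtained with `db_sum`. Because P2 picks the percentile band by band, its overall level cannot be lower than that of the P1 spectrum under the stated assumption for the overall level.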

Fig. 4 95th Lmax,slow percentile spectra using different calculation methods (P1 squares, P2 up triangles and P3 circles)

The overall Lmax,slow levels are presented in Table 3. The P1 method selects a spectrum that was actually measured, and as expected for all four sites, this method gives the lowest levels for the 95th percentile. The P2 and P3 methods result in similar overall levels, typically within 0.5 dBV.

Table 3 Overall Lmax,slow levels in terms of 95th percentiles in decibels

The 95th percentile overall levels presented in Table 3, calculated with the P2 method and P3 method, are greater than the overall levels presented in Table 2. For Sites 1, 2 and 4, the difference is approximately 1 dBV, while at Site 3 the difference is approximately 2 dBV.

At lower frequencies, the results calculated with the P1 method are found to be well below the results calculated with the other two methods. Using the P1 method to calculate representative spectra could be an issue for the assessment of the effects of tactile vibration (1–80 Hz) and has the potential to lead to under-predicting the impacts of tactile vibration. These effects are not further considered in this study.

3.2.2 Effect of Sample Size

In terms of required sample size, the second highest measurement in a set of 20 events is often used for compliance measurements. According to Norwegian Standard 8176 [11], a minimum sample size of 15 events is required to achieve a statistically representative dataset for the assessment of tactile vibration (1–80 Hz) and while not strictly applicable to ground-borne noise the stipulation of a minimum sample size is of interest for the content presented in this study. ISO 14837.1:2005 [5] identifies that if the results of a sample size of five trains of a generic category fall within ± 2 dB, the dataset is robust enough to form a suitable model basis. If the results fall outside this range then a larger measurement set is required. No further guidance on the size of measurement sets is provided in this standard.

The analysis methodology chosen in this study aims to capture the range of different outcomes that may be obtained if different engineers measured different datasets at the same location but during different time periods, containing different pass-bys and different numbers of pass-bys. For a given sample size ‘n’ (i.e. the number of consecutive train pass-bys in a subset) the 95th percentile Lmax,slow spectra were calculated for sets of consecutive trains using train 1 to n, 2 to n + 1, 3 to n + 2, etc. This approach simulates the analysis of different datasets (in this case subsets of size ‘n’ of the total dataset which consists of ‘N’ pass-bys) collected at different times. For a given subset size of n consecutive trains, the number of different analysis outcomes is N − n + 1.
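The consecutive-subset scheme can be sketched as follows. For brevity the sketch operates on overall levels and uses the normal-distribution estimate (mean plus 1.65 standard deviations, with the sample standard deviation assumed); in the study the same windowing is applied to one-third octave spectra. The function name is hypothetical.

```python
import numpy as np


def sliding_p95(overall_db, n, k=1.65):
    """95th percentile estimate for every window of n consecutive trains.

    Windows are trains 1..n, 2..n+1, and so on, giving N - n + 1 results
    for a dataset of N pass-bys kept in measurement order.
    """
    levels = np.asarray(overall_db, dtype=float)
    total = levels.size
    if not 2 <= n <= total:
        raise ValueError("n must satisfy 2 <= n <= N")
    windows = [levels[i:i + n] for i in range(total - n + 1)]
    return np.array([w.mean() + k * w.std(ddof=1) for w in windows])
```

The spread of the returned values (maximum minus minimum) for a given `n` indicates how much the outcome could differ between measurement campaigns of that size.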

Illustrated in Fig. 5 is the resulting range of Lmax,slow one-third octave spectra calculated with the P3 method and corresponding overall levels for sample sizes of n = 10, 20, 50, 100 and N. The ranges in all bands and overall levels reduce with increasing sample size. For smaller sample sizes, there remains uncertainty whether the representative vibration levels are over-predicted or under-predicted relative to the spectrum based on all train pass-bys, shown by circles (which are identical to the P3 method spectra shown in Fig. 4).

Fig. 5 95th percentile Lmax,slow spectra for n = 10, 20, 50, 100 and all trains (n = N) utilising the P3 calculation method

For the smallest plotted sample size of n = 10, the range of calculated overall levels is less than ± 3, ± 4, ± 2 and ± 3 dBV at Sites 1, 2, 3 and 4, respectively. However, reviewing individual frequencies, ranges of up to ± 5 dBV are observed in some frequency bands at all sites.

The change in overall values versus number of trains, and the convergence in the calculated 95th percentile for a typical measurement set have been studied in more detail. Figures 6, 7 and 8 show the range of overall values depending on the sample size used for the P1, P2 and P3 methods. The sample sizes considered are n = 5, 15 and multiples of 10 (i.e. n = 10, 20, 30, …).

Fig. 6 Possible range of 95th percentile Lmax,slow levels as a function of the number of trains in the considered sample size n using the P1 calculation method

Fig. 7 Possible range of 95th percentile Lmax,slow levels as a function of the number of trains in the considered sample size n using the P2 calculation method

Fig. 8 Possible range of 95th percentile Lmax,slow levels as a function of the number of trains in the considered sample size n using the P3 calculation method

As expected, with increasing sample size the range of results for the 95th percentiles reduces and for n = N the overall 95th percentile levels equal those presented in Table 3. The different methods exhibit different dependencies on the consecutive trains in the sample n. The spread of results of the rank-based methods (P1 and P2 in Figs. 6 and 7, respectively) exhibits noticeable step changes. This is a direct consequence of the rank method, where a particular train (method P1) or dominant one-third octave band (method P2) can determine the 95th percentile spectrum.

In contrast, in the P3 method the range of predicted 95th percentile values reduces more smoothly. For sample sizes of 20 trains, the maximum ranges of the 95th percentiles typically reduce to less than 5 dBV. For the maximum range to be less than 2 dBV, the sample size needs to be increased substantially. Site 2 would require the largest number of samples, with 130 to 160 samples being required. The maximum ranges are lower than those presented in Weber and Zoontjens [16], who investigated airborne noise from passenger and freight trains.

3.2.3 Speed of Convergence

The data representation chosen in Figs. 6, 7 and 8 illustrates how an increase in the number of samples ‘n’ reduces the spread of results using the P1, P2 and P3 methods. The percentage of calculated Lmax,slow values which fall within a ± 0.5 and ± 1.0 dBV band of the value obtained if all available train pass-bys are used (i.e. n = N) has been calculated, and the results for all three considered methods are presented in Fig. 9 for ± 0.5 dBV and in Fig. 10 for ± 1.0 dBV. The percentages do not increase steadily with increasing sample size.
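The percentage measure described above amounts to counting how many subset estimates fall within a tolerance band around the full-dataset value. A minimal sketch, with a hypothetical function name and an inclusive band edge assumed:

```python
import numpy as np


def percent_within(estimates_db, reference_db, half_width_db):
    """Percentage of estimates within +/- half_width_db of the reference.

    estimates_db: 95th percentile levels from subsets of size n.
    reference_db: 95th percentile level from the whole dataset (n = N).
    Assumption: values exactly on the band edge count as inside.
    """
    estimates = np.asarray(estimates_db, dtype=float)
    inside = np.abs(estimates - reference_db) <= half_width_db
    return 100.0 * inside.mean()
```

Evaluating this for each sample size n with half-widths of 0.5 and 1.0 gives curves of the kind shown in Figs. 9 and 10.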

Fig. 9 Percentage of results falling within a ± 0.5 dBV band

Fig. 10 Percentage of results falling within a ± 1.0 dBV band

The curves presented in Figs. 9 and 10 can be used to estimate the minimum sample size n required to fall within ± 0.5 and ± 1.0 dBV of the 95th percentile value as calculated with the whole dataset N (refer to Table 4).

Table 4 Required number of train pass-bys

4 Conclusions

In this paper, the variability of inter-train source vibration has been studied using four large datasets comprising different trackforms. The data were not normalised with respect to train speeds or offset from the tracks and the pass-bys were analysed in terms of peak-hold, slow response vibration spectra, Lmax,slow. Three different methods of calculating representative 95th percentile Lmax,slow spectra have been considered, and the effect of sample size on the 95th percentile levels was studied.

The differences in calculated 95th percentile levels for the P2 method (rank based implemented in one-third octaves) and P3 (normal distribution based implemented in one-third octaves) method when using the whole datasets were found to range from 0.1 to 0.6 dBV. For the four considered sites, there was no trend as to the P2 method or P3 method consistently returning higher or lower levels. The observed differences between the P2 and P3 methods are considered to be comparatively small and below variabilities typically observed between sites or due to gradual changes depending on grinding cycles or composition of a fleet. The predictions based on the P1 method (rank based implemented in overall values) were typically 1 to 2.5 dBV lower than predictions obtained with the P2 and P3 methods.

The use of a smaller number of trains increases the spread of results relative to the result obtained if all trains in the dataset are used. If only 5 samples are used, the potential inter-train variability was found to range from 5 to 8 dBV at the four investigated sites. For the calculated 95th percentile to be within ± 1.0 dBV of the value associated with the whole dataset, the minimum required sample sizes were found to range from 40 (Site 1, method P2) to 170 (Site 2, method P3).