1 Introduction

Tooth root bending fatigue is considered one of the most dangerous gear failure modes because such a failure typically interrupts the power flow within the gearbox. Standards such as ISO 6336‑3 [1] and ANSI/AGMA 2001 [2] provide designers with an analytical framework to assess a gear train with respect to this failure mode by comparing the maximum tooth root stress with the limit value of the gear itself. Data for several typical materials are given in the standards (e.g. [2, 3]). Both ISO 6336‑3 [1] and ANSI/AGMA 2001 [2] give indications about the complete S‑N curve, including the slopes of both the limited life region and the long life region. Those curves are defined at a 1% gear failure probability (i.e. 99% reliability) for tooth root bending fatigue. ANSI/AGMA 2001 [2] provides corrective coefficients to “shift” the S‑N curve so that it ensures a reliability level different from the typical one. Designers working with the ISO 6336 series can rely on the work of Hein et al. [4], who proposed a corrective coefficient YZ to assess tooth root bending fatigue at different reliability levels and a coefficient ZZ for pitting.

However, ensuring a given reliability requires two things. On the one hand, it requires a proper estimation of the load, a topic covered by standards such as ISO 6336‑6 [5]. On the other hand, it requires an accurate definition of the complete gear S‑N curve, in terms of the slopes of both the limited life region and the long life region, as well as the associated dispersions [6]. Here, it is preferable to rely on data specific to the industrial application at hand rather than on standardized data, which are not necessarily representative of the gears investigated. Therefore, extensive gear testing is needed, all the more so when a certain reliability level must be ensured after a given service life.

The available testing methodologies for tooth root bending fatigue strength are running gear tests (e.g. [7, 8]), Single Tooth Bending Fatigue (STBF) tests (e.g. [9,10,11,12,13,14,15]), and tests on notched specimens (e.g. ISO 6336‑3 annex A [1, 16,17,18]). Among them, STBF tests are the most widely adopted because of their higher efficiency and their ability to work directly with gears.

However, STBF test results cannot be directly applied to assess a gear pair. Since there is no rolling/sliding contact between gears, the loading condition of an STBF test rig is not representative of the real case.

Two different approaches exist for treating STBF test data so that they can be used within the existing calculation methods. They are proposed in FVA report no. 304 [9] and in the works of Rao and Mc Pherson [8, 11]. Both highlight two main issues that must be considered when treating STBF test data:

  1. The testing and service loading conditions are different. In the real case, the force varies in both its application point and its magnitude, while in STBF tests the force is a sinusoidal load with a fixed amplitude and a fixed application point.

  2. The testing and service failures are different. In an STBF test, a failed test corresponds to the breakage of some of the tested teeth, while in the meshing case, the failure of the specimen (i.e. the gear itself) corresponds to the failure of the weakest tooth.

While another paper by the authors investigates the effect of the different loading conditions by means of a high-cycle multiaxial fatigue criterion [19], this paper aims to shed more light on the latter aspect.

Stahl and Rao addressed the problem with two different approaches. On the one hand, Stahl [9] proposed several coefficients based on experimental evidence: two coefficients (i.e. \(f_{p\rightarrow Z}\) and \(f_{Z\rightarrow ZR}\)) to translate STBF results to the case of a meshing gear, and a further one (i.e. \(f_{1\mathrm{\% }ZR}\)) to reach the desired reliability level, which in this case is R = 99% (i.e. a probability of failure of 1%); an additional coefficient \(f_{\text{korr}}\), based on Rettig's experimental evidence [20], is used to deal with the different loading conditions. On the other hand, Rao and Mc Pherson [8, 11] define a simple statistical relation between the teeth and the gear and, by means of a ‘fit by eye’, translate STBF test data, corrected to the real case load, to the desired reliability level. Allowable Stress Range (ASR) diagrams are used to deal with the different loading conditions.

In order to reduce the need for predetermined experimental factors, this paper proposes an approach based on Maximum Likelihood Estimation (MLE) and the statistic of extremes. MLE is an estimation technique typically applied to determine the S‑N curve, as it has the great advantage of handling different types of data at the same time: exact data (e.g. failures) and censored points are considered together within the same calculation procedure. Censored points are those whose exact value is unknown but is known to be greater than a certain value (right censored, e.g. runouts), smaller than a certain value (left censored, e.g. a failure that occurred before the inspection), or between two values (interval censored, e.g. a failure that occurred between two inspections) [6, 21, 22]. MLE is also a flexible tool: since its calculation procedure is independent of the test sequence, it makes it possible to find endurance limits even where the staircase method cannot be used optimally. MLE has been applied to estimate the S‑N curve of specimens (e.g. [23,24,25,26]) as well as of other components, such as welded structures (e.g. [27, 28]). Regarding gears, Krantz [29] applied MLE to estimate the pitting S‑N curve.

Within this work, MLE is applied to estimate the experimental S‑N curve, that is, the S‑N curve of the teeth, and its associated dispersion. By means of the statistic of extremes, this first curve is then transformed to estimate the gear S‑N curve. In other words, MLE is used to estimate the teeth distribution (i.e. the parent distribution), which is then used to estimate the gear distribution by defining the distribution of the weakest tooth among the z gear teeth. Curves at different reliability levels can be estimated by simply calculating the corresponding percentile.

2 Experimental procedure

In recent years, the authors have investigated the tooth root load carrying capacity of several types of gears via the STBF approach. Examples include gears for aeronautical applications [30, 31], small planetary gearboxes [32], wind turbine gearboxes [33, 34], generic applications [35], and additively manufactured gears [36, 37]. All these tests were performed on a mechanical resonance pulsator, equipped for each experimental campaign with a specific fixture for the gear specimen. Fig. 1 shows the equipment scheme adopted for the experimental campaign taken as an example in this article. The equipment consists of two anvils, a fork and a centering pin. The anvils are the components that transmit the load to the gear teeth during testing and, following the same principle as the Wildhaber measurement, they load the teeth symmetrically; that is, the contact between the anvils and the teeth occurs at the same diameter for both the spanned teeth. The fork and the centering pin ensure the correct positioning of the gear between the anvils during the preliminary mounting phase. During the test, the pin is removed and a minimum compressive load is kept on the teeth to avoid undesired displacements; all the tests are performed at a load ratio R = 0.1. The experimental procedure is described in detail in the aforementioned papers (i.e. [30,31,32,33,34,35,36,37]).

Fig. 1 STBF apparatus

The experimental campaign taken as an example for the statistical elaboration of STBF test results was performed on case-hardened gears made of 20MnCr5 (additional information can be found in [38]); no mechanical surface treatment was applied. A total of 34 experimental points were collected, of which 9 were runouts at 5 million cycles. Table 1 summarizes the main data of the spur gear. If the results of this experimental campaign are to be used to design or assess a gear pair with a different module, the size factor Yx must be taken into account. Typical values of Yx can be found in [1, 39, 40].

Table 1 Main gear geometry data

Method B of ISO 6336‑3 [1] has been used to calculate the relationship between the load applied by the pulsator, \(F_{P}\), and the tooth root bending stress. As the standards always consider two meshing gears, the force/stress relationship has been calculated starting from a gear pair defined such that, for the gear specimen, the outer point of single contact is the pulsator point of load application. In other words, the outer single contact diameter \(d_{en}\) corresponds to the diameter at which the designed anvil/tooth contact occurs (as shown in Fig. 1, \(d_{en}=70.150\,\mathrm{mm}\)). For the sake of brevity, the details of the whole procedure are not repeated here; they can be found in the papers cited at the beginning of this section (i.e. [30,31,32,33,34,35,36,37]). A correction factor \(f_{\text{korr}}=0.9\) has been used to translate stress data from the STBF test loading condition to that of the meshing gear case [9, 20].

3 Maximum Likelihood: estimation of the tooth S-N curve

Several models have been developed to describe the S‑N curve of a specimen (or of a component). They have different levels of complexity, from the more common ones, such as the ones described in [23, 24], to the more complex models, such as the decoupled model [25] and the random fatigue limit model [26]. Due to its simplicity, the model proposed by Spindel et al. [23] has been selected. As per this model, the S‑N curve can be described as:

$$\log \left(\frac{N}{N_{e}}\right)=\frac{1}{2}\left(k+k_{1}\right)\log \left(\frac{S}{S_{e}}\right)+\frac{1}{2}\left(k-k_{1}\right)\left| \log \left(\frac{S}{S_{e}}\right)\right|$$
(1)

where N is the number of cycles, S is the applied load, \(N_{e}\) and \(S_{e}\) represent the knee position, k is the slope of the limited life region and \(k_{1}\) is the slope of the long-life region. An easy way to fit Eq. 1 to the experimental data is to consider S, or directly \(\log \left(S\right)\), as the independent variable and N, or \(\log \left(N\right)\), as the dependent variable, and then fit the data by rearranging Eq. 1 as:

$$\log \left(N\right)=\log \left(N_{e}\right)+\frac{1}{2}\left(k+k_{1}\right)\left(\log \left(S\right)-\log \left(S_{e}\right)\right)+\frac{1}{2}\left(k-k_{1}\right)\left| \log \left(S\right)-\log \left(S_{e}\right)\right|$$
(2)
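As an illustration, Eq. 2 can be evaluated with a few lines of Python. The sketch below is only a minimal example: the function name `mean_log_life` and the numerical parameters are hypothetical (they are not the fitted values of this paper), and the slopes are assumed to be negative so that life decreases as the load increases.

```python
import math

def mean_log_life(S, Se, Ne, k, k1):
    """Eq. 2: log10(N) on the bilinear S-N curve for an applied load S."""
    x = math.log10(S) - math.log10(Se)
    return math.log10(Ne) + 0.5 * (k + k1) * x + 0.5 * (k - k1) * abs(x)

# Illustrative (not fitted) parameters; negative slopes are an assumption.
params = dict(Se=500.0, Ne=1.0e5, k=-8.0, k1=-40.0)

at_knee = mean_log_life(500.0, **params)     # equals log10(Ne)
above_knee = mean_log_life(600.0, **params)  # S > Se: the line with slope k
below_knee = mean_log_life(450.0, **params)  # S < Se: the line with slope k1
```

For S > Se the absolute value leaves the sign unchanged and the two terms add up to a line of slope k; for S < Se they add up to a line of slope k1, so a single expression covers both regions.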

However, fitting such a curve to experimental data requires runouts to be treated as if they were failures. That is, the information is lost that such experimental points do not represent a failure but a specimen that has survived \(N_{i}\) cycles under a certain load \(S_{i}\). To avoid losing this information, the parameters of Eq. 1 can be estimated via MLE. For this purpose, it is assumed that the experimental points are log-normally distributed, that is, the base-10 logarithm of the points follows a normal distribution, whose PDF f(x) and CDF F(x) are:

$$\begin{cases} f_{\left(x\right)}=\frac{1}{\sigma \sqrt{2\pi }}\cdot e^{-\frac{1}{2}{\left(\frac{x-\mu }{\sigma }\right)^{2}}}\\ \begin{array}{ll} F_{\left(x\right)} &= \int _{-\mathrm{\infty }}^{\mathrm{x}}f_{\left(x\right)}dx=\int _{-\mathrm{\infty }}^{\mathrm{x}}\frac{1}{\sigma \sqrt{2\pi }}\cdot e^{-\frac{1}{2}{\left(\frac{x-\mu }{\sigma }\right)^{2}}}dx\\ &=\frac{1}{2}\left(1+erf\left(\frac{x-\mu }{\sigma \sqrt{2}}\right)\right)\end{array} \end{cases}$$
(3)

where erf is the error function.
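Eq. 3 maps directly onto the Python standard library, which provides `math.erf`. The following sketch uses hypothetical function names; x stands for the base-10 logarithm of the number of cycles, as assumed above.

```python
import math

def normal_pdf(x, mu, sigma):
    """PDF of Eq. 3."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """CDF of Eq. 3, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Probability that log10(N) does not exceed the mean: one half by symmetry.
p_half = normal_cdf(5.0, mu=5.0, sigma=0.3)
```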

MLE is based on finding the distribution parameters, in this case μ and σ, that are most likely to have generated the experimental points. In other words, MLE estimates the distribution parameters that maximize the likelihood \(\mathcal{L}\) [6].

The case examined in this paper presents two kinds of data. The first are observed values, i.e. specimens whose lifetime at a certain load level is known (a failure). The second are right-censored data, i.e. points whose lifetime at a certain load level is greater than a certain value because the specimen did not break (e.g. a runout). Therefore, \(\mathcal{L}\) can be defined as:

$$\mathcal{L}=\prod _{i=1}^{n}\left(f_{{x_{i}};\mu ,\sigma }dx\right)^{{\delta _{i}}}\cdot \prod _{i=1}^{n}\left(1-F_{{x_{i}};\mu ,\sigma }\right)^{1-{\delta _{i}}}$$
(4)

where \(\delta _{i}\) is an indicator equal to 1 when the ith observation is an observed value and 0 when it is right-censored. To simplify the estimation procedure, the natural logarithm of \(\mathcal{L}\) is considered:

$$\ell=\sum _{i=1}^{n}\delta _{i}\left(\ln f_{{x_{i}}}\right)+\sum _{i=1}^{n}\delta _{i}\ln \left(dx\right)+\sum _{i=1}^{n}\left(1-\delta _{i}\right)\left(\ln \left(1-F_{{x_{i}}}\right)\right)$$
(5)

where ℓ is the so-called log-likelihood, \(\mathit{\ln }f_{{x_{i}}}\) represents exact data (failures), and \(\mathit{\ln }\left(1-F_{{x_{i}}}\right)\) right censored data (runouts). The term \(\sum _{i=1}^{n}\delta _{i}\mathit{\ln }\left(dx\right)\) is dropped from here on because it is constant and therefore does not affect the maximization. Since common calculation software usually provides minimization procedures, the distribution parameters are estimated by minimizing \(-\ell\).
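A minimal sketch of the quantity to be minimized, \(-\ell\) without the constant term, is given below. The function names and the data are hypothetical; `math.erfc` is used for \(1-F\) only to limit round-off when F is close to 1, which is an implementation choice and not part of the method.

```python
import math

def log_pdf(x, mu, sigma):
    """ln f(x) for the normal PDF of Eq. 3."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_survival(x, mu, sigma):
    """ln(1 - F(x)); 1 - F is written as 0.5 * erfc(...)."""
    return math.log(0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(mu, sigma, data):
    """-l of Eq. 5 without the constant term.

    data: iterable of (x, delta), with x = log10(N) and
    delta = 1 for a failure, 0 for a runout.
    """
    total = 0.0
    for x, delta in data:
        if delta == 1:
            total += log_pdf(x, mu, sigma)
        else:
            total += log_survival(x, mu, sigma)
    return -total

# Hypothetical log10(N) values at one load level: three failures, one runout.
data = [(5.1, 1), (5.3, 1), (5.6, 1), (6.0, 0)]
nll = neg_log_likelihood(5.4, 0.3, data)
```

Any general-purpose minimizer can then be applied to `neg_log_likelihood` with respect to the parameters.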

As proposed by Nelson [22], MLE can be used for curve fitting, independently of the chosen PDF, by treating the location parameter as a function of the independent variable; the scale parameter can also be made dependent on the independent variable. In our case, the mean value μ is set equal to the right-hand side of Eq. 2. In [23], Spindel et al. propose a constant standard deviation in the \(\log \left(\sigma _{F0}\right)\) direction. Since the developed model uses N as the dependent variable and has two different slopes, the standard deviation in the \(\log \left(\sigma _{F0}\right)\) direction is “rotated” into the \(\log \left(N\right)\) direction. Two different standard deviations in the \(\log \left(N\right)\) direction (i.e., \(\sigma _{1,\log \left(N\right)}\) and \(\sigma _{2,\log \left(N\right)}\)) are thus obtained by multiplying \(\sigma _{\log \left(\sigma _{F0}\right)}\) by the absolute value of k (for the limited life region) or \(k_{1}\) (for the long-life region).
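The “rotation” can be written explicitly. Along a straight branch of Eq. 2, a deviation \(\Delta\) in \(\log \left(\sigma _{F0}\right)\) corresponds to a deviation \(k\,\Delta\) (or \(k_{1}\,\Delta\)) in \(\log \left(N\right)\), so the standard deviation scales with the absolute value of the slope. Assuming that index 1 refers to the limited life region and index 2 to the long-life region (the pairing of the indices is an assumption made here for illustration):

$$\sigma _{1,\log \left(N\right)}=\left| k\right| \cdot \sigma _{\log \left(\sigma _{F0}\right)},\qquad \sigma _{2,\log \left(N\right)}=\left| k_{1}\right| \cdot \sigma _{\log \left(\sigma _{F0}\right)}$$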

However, one piece of information obtained from the test has not yet been considered. As described in Sect. 2, the adopted STBF test configuration is symmetric: the two tested teeth are loaded at the same diameter and share the same applied load, which results in the same nominal tooth root bending stress. Each experimental point (i.e., failure or runout) therefore ‘hides’ information about the survival of the other loaded tooth. In other words, due to the test configuration, each test interrupted by the failure of one of the two teeth represents both a failure and a survival at the number of cycles at which the weakest tooth of the tested pair failed, while a runout represents two runouts. Hence, a failure yields one observed value and one right-censored datum (instead of a single observed value), and a runout yields two right-censored data. A test stopped due to the failure of both loaded teeth would simply be considered as the failure of two teeth; however, as it implies that both teeth have the same lifetime, it is a very rare case and is not discussed in the mathematical development below. Other authors, such as Wallin [27] and Marquis and Mikkola [28], dealt with the same problem in the field of welded joints, where, as in the adopted STBF test configuration, the tested structure contains more than one possible failure point. Those authors handled several specimens being tested at the same time and under the same load by adopting similar considerations. To account for this information, a further term, representing right censored data, is added to Eq. 5:

$$\ell=\sum _{i=1}^{n}\delta _{i}\left(\ln f_{{x_{i}}}\right)+\sum _{i=1}^{n}\left(1-\delta _{i}\right)\left(\ln \left(1-F_{{x_{i}}}\right)\right)+\sum _{i=1}^{n}\left(\ln \left(1-F_{{x_{i}}}\right)\right)$$
(6)

In this way, a right-censored term (the “second” tooth) is always present for each test.
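The effect of the additional term of Eq. 6 can be sketched numerically. The example below is a simplified illustration and not the procedure used in this paper: the data are hypothetical, a single load level is considered, σ is held fixed, and μ is found by a coarse grid search instead of a proper minimizer.

```python
import math

def log_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_survival(x, mu, sigma):
    return math.log(0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(mu, sigma, tests, two_teeth):
    """-l of Eq. 5 (two_teeth=False) or Eq. 6 (two_teeth=True)."""
    total = 0.0
    for x, delta in tests:
        # First two sums: the tooth that defines the test outcome.
        total += log_pdf(x, mu, sigma) if delta == 1 else log_survival(x, mu, sigma)
        if two_teeth:
            # Third sum of Eq. 6: the companion tooth survived x.
            total += log_survival(x, mu, sigma)
    return -total

# Hypothetical log10(N) values: three failures and one runout.
tests = [(5.0, 1), (5.2, 1), (5.4, 1), (5.7, 0)]
sigma = 0.2  # held fixed only to keep the sketch short
grid = [4.5 + 0.001 * i for i in range(2001)]  # candidate values of mu
mu_eq5 = min(grid, key=lambda m: neg_log_likelihood(m, sigma, tests, False))
mu_eq6 = min(grid, key=lambda m: neg_log_likelihood(m, sigma, tests, True))
```

Since the added survival terms increase with μ, the estimate obtained with Eq. 6 is larger than the one obtained with Eq. 5. For the campaign of Sect. 2 (34 points, 9 runouts), and assuming that none of the 25 failures involved both teeth, Eq. 6 would count 25 exact and 25 + 2·9 = 43 right-censored observations.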

Fig. 2 compares the three procedures for estimating the S‑N curve. In the region before the knee, the curve obtained by curve fitting (i.e., fitting) and those estimated by means of MLE (i.e., MLE, as in Eq. 5, and MLE with 2 teeth, as in Eq. 6) have a similar slope, since all of them fit a line through a comparable set of points. However, as the fitting does not consider the physical meaning of runouts, the slope it yields in the region beyond the knee is not comparable with the MLE ones. This results in a different estimate of the knee. Furthermore, the curve obtained with the likelihood formulation of Eq. 6 is shifted upwards because the surviving teeth are taken into account.

Fig. 2 Teeth S‑N curve estimation comparison

4 Statistic of extremes: from tooth to gear

Once the S‑N curve of the teeth (and therefore the CDF) has been estimated, the STBF results must be further elaborated to make them representative of the gear. To do this, it must be considered that, in the real case, where gears are meshing and rotating, gear failure due to tooth root bending fatigue corresponds to the failure of the weakest gear tooth [8, 11, 20]. Therefore, starting from the estimated teeth S‑N curve, the gear S‑N curve is defined as the one describing the weakest tooth among the z gear teeth. In other words, the CDF of the teeth has to be elaborated to obtain the gear CDF, that is, the CDF of the weakest tooth among the z gear teeth. It is worth underlining that in 1987, when Rettig studied the effect of the different loading conditions between STBF tests and running gears, he adopted the statistic of extremes to translate STBF/teeth results to the gear [20].

By means of a mathematical passage [6], the statistic of extremes makes it possible to define the CDF of the minimum value \(X_{\left(1\right)}\) over n independent extractions of X (the distribution of which is known) as:

$$F_{X_{\left(1\right)}}\left(x\right)=1-\left(1-F_{X}\left(x\right)\right)^{n}$$
(7)

where \(F_{X}\left(x\right)\) is the parent CDF and \(F_{X_{\left(1\right)}}\left(x\right)\) is the extreme CDF. In other words, Eq. 7 expresses the distribution of the smallest value obtained from n extractions from a population described by the parent distribution. In the case of gear tooth root bending fatigue, the aim is to estimate the distribution of the tooth with the smallest resistance (i.e., the gear CDF \(F_{\mathrm{gear}}\)) over z gear teeth. For tooth root bending fatigue, Eq. 7 can therefore be written as:

$$F_{\mathrm{gear}}=1-\left(1-F_{\text{tooth}}\right)^{z}$$
(8)

The above equation implies that the gear load carrying capacity depends on z. Thus, considering the properties and typical shapes of CDFs and PDFs, and under the assumption that the statistical parameters of the teeth do not depend on z, Eq. 8 implies that the difference between the S‑N curves of two gears with different numbers of teeth is almost proportional to the logarithm of z.
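A short numerical example may help to read Eq. 8. The function name `gear_cdf` and the tooth number used below are hypothetical and serve only as an illustration.

```python
def gear_cdf(p_tooth, z):
    """Eq. 8: failure probability of the weakest of z teeth."""
    return 1.0 - (1.0 - p_tooth) ** z

# A 1% failure probability of the single tooth on a gear with 20 teeth.
p_gear = gear_cdf(0.01, 20)
```

With these values the gear failure probability is about 18%: a curve that is safe at the 1% level for a single tooth is far from the 1% level for the whole gear.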

According to ISO 6336‑5 [3], the typical failure probability for gear failure due to tooth root bending fatigue is 1% (i.e. a reliability of 99%); hence, the curve must be defined at this reliability level. In other words, it is required to estimate the load carrying capacity of the weakest tooth among z teeth that has a 99% probability of being exceeded. To do so, the corresponding percentile is calculated for different stress levels, i.e., for a given stress level, the value of N is found for which \(F_{\mathrm{gear}}\) equals the desired failure probability.
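Under the normal model assumed for \(\log \left(N\right)\), this percentile can be written in closed form by inverting Eq. 8. The symbols \(P_{f}\) (desired gear failure probability) and \(\Phi ^{-1}\) (inverse of the standard normal CDF) are introduced here only for illustration; \(\mu \left(S\right)\) is the right-hand side of Eq. 2 and \(\sigma\) is the standard deviation in the \(\log \left(N\right)\) direction that applies at the stress level considered:

$$F_{\text{tooth}}=1-\left(1-P_{f}\right)^{1/z},\qquad \log \left(N_{P_{f}}\right)=\mu \left(S\right)+\sigma \cdot \Phi ^{-1}\left(1-\left(1-P_{f}\right)^{1/z}\right)$$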

Fig. 3 reports the estimated S‑N curves at different reliability levels. The final curve (i.e., the curve at 1% failure probability for the gear) is calculated as the 1st percentile of the gear distribution. For comparison, the 50th percentile gear curve and the initial 50th percentile teeth curve are included. Two-sided 95% confidence intervals have been calculated using the quadratic approximation [6, 22].

Fig. 3 Estimated gear S‑N curve

It is worth underlining that the estimated fatigue knee is at around 120,000 cycles, which differs from the value proposed by ISO 6336‑3 (i.e., 3 million cycles). There are two reasons for this difference. Firstly, the knee estimated here is simply the intersection between the two lines defined in Eq. 1; in other words, it is a curve parameter. It does not have the same meaning as a fatigue knee obtained by also considering the fatigue limit, because the fatigue limit has a statistical meaning that is not included in the Spindel model. Secondly, ISO 6336‑3 describes general gear material properties, and it is reasonable to assume that a specific gear will show a different position of the fatigue knee; indeed, [8, 9, 11, 20] report fatigue knee positions different from the one proposed by ISO.

5 Results and conclusion

Even though ISO 6336‑3 [1] and ISO 6336‑5 [3] allow the gear tooth root bending fatigue strength to be estimated by means of STBF tests, they do not give all the details needed to elaborate STBF test results. Here, a statistical framework has been proposed to analyse STBF test data and translate them to the gear specimen in meshing conditions. With this methodology, the data elaboration is based only on the experimental data and does not require predefined coefficients.

Fig. 4 shows the estimated percentiles for the teeth as well as for the gear. As can be seen, neglecting the effect of the different failure conditions (i.e. a predetermined tooth for STBF and the weakest tooth for meshing gears) leads to an overestimation of the tooth root bending fatigue curve: the curve referred to the gear shows a lower load carrying capacity than the one referred to the teeth. Furthermore, the meshing gear percentiles shown in Fig. 4 are closer to each other than those of the STBF tests. Indeed, extreme value PDFs are narrower than the parent one [6].
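Both observations can be checked numerically on a standardized normal parent distribution. The sketch below is only an illustration: the tooth number is hypothetical, and the percentiles are expressed in units of the standard deviation of \(\log \left(N\right)\) rather than with the fitted parameters.

```python
from statistics import NormalDist

def standardized_percentile(p_fail, z=1):
    """Standard-normal quantile of log10(N) at failure probability p_fail
    for the weakest of z teeth (z = 1 is the single tooth)."""
    # Invert F_gear = 1 - (1 - F_tooth)**z for the tooth probability.
    p_tooth = 1.0 - (1.0 - p_fail) ** (1.0 / z)
    return NormalDist().inv_cdf(p_tooth)

z = 20  # hypothetical number of teeth
tooth_spread = standardized_percentile(0.50) - standardized_percentile(0.01)
gear_spread = standardized_percentile(0.50, z) - standardized_percentile(0.01, z)
gear_median = standardized_percentile(0.50, z)
```

In this example the gear median lies below the tooth median (`gear_median` is negative), and the distance between the 50% and 1% percentiles is smaller for the gear than for the tooth, in line with the narrower extreme value distribution.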

Fig. 4 Estimated S‑N curves

6 Nomenclature

The nomenclature is shown in Table 2.

Table 2 Nomenclature