1 Introduction

The durability and life of gears are critical to the safe operation of the power transmission systems in which they are used. Long component life reduces costs associated with maintenance, repair and warranty. One well-known failure mode of gears is tooth root bending fatigue. This failure mode results from two main operating principles of gears. The first is that gear teeth act as cantilever beams when supporting load. This creates a stressed volume at the root, which is a function of the applied load and root geometry. Second, a gear tooth is loaded only while in mesh and is in a relaxed state outside the mesh, producing cyclic stress states in the gear tooth roots, where the loading frequency depends on the gear rotational speed and the kinematic configuration of the gear train.

Gear tooth bending fatigue is a catastrophic failure in which the load carrying capacity of the gear is lost almost instantaneously: one tooth separates, and the resulting poor meshing conditions cascade the failure to the remaining gear teeth. To prevent this failure, gear designers must have either an accurate set of predictive failure models or extensive experimental data, or else resort to overly large factors of safety, producing heavy and costly designs that may underperform. The currently available predictive models for high cycle or very high cycle gear tooth bending fatigue are limited. The most sophisticated models employ one of the many multiaxial fatigue criteria, with no consensus on their accuracy for gear tooth bending fatigue. These models still rely on empirical interpretations of experimental material fatigue data. Because of the uncertainty of these models, gear designers still often rely on experimental gear tooth bending fatigue data.

The experimental data preferred by gear designers come from direct testing of gear specimens, using either single tooth bending fatigue (STBF) or rotating gear (RG) methodologies. STBF accounts for the large majority, by volume, of the gear tooth bending fatigue testing performed. Many adaptations can be found in the literature, using either specialty anvils to contact a fixed gear or a mating gear to apply load to a fixed gear in a specific position [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. By comparison, very few examples of RG testing exist [2, 6, 16, 23, 27,28,29,30,31,32]. Many early studies had mixed results, with multiple failure modes and extensive damage to the gearbox upon tooth failure. However, recent studies by Winkler et al. [2], Hasl et al. [32] and Hong et al. [29, 33] highlight the feasibility of RG testing as a reliable methodology for gear tooth bending fatigue testing.

There is a need to better understand how STBF test results compare with the expected life of a production gear designed from available STBF data, and with the results of RG testing, in order to unify the data sets. Four main differences between rotating gear and single tooth bending testing must be reconciled to understand the comparison in fatigue life between RG and STB gear tooth bending fatigue testing.

  1. The first difference is in fatigue life due to harmonic loading versus gear mesh loading, together with the different test frequencies. This was discussed in [33], but no empirical or physical relationship was offered. In STBF testing, a sinusoidal cyclic load is applied to a single gear tooth, producing a proportional sinusoidal stress in the loaded gear tooth root. In a loaded gear mesh, any given tooth is loaded only during the short period it meshes with a tooth of the mating gear and is unloaded during the rest of the rotation. The load distribution that the tooth experiences is then a function of the tooth geometries, the hub, web and rim compliances, and the operating conditions. For a single parallel-axis spur gear pair with a relatively stiff hub, web and rim, this means that the gear tooth root stress is zero for \(100(z-\varepsilon )/z\) percent of the loading cycle, where the loading cycle length is equivalent to one rotation of the gear, z is the number of teeth on the gear and ε is the operating profile contact ratio. The effect of this zero-stress relaxation period, as well as of the difference in loading form, on fatigue life is unknown. Limited studies [34, 35] on the effect of frequency and load shape have been performed on notched specimens, but within the context of corrosion-prone environments, making their applicability to gear fatigue unknown.

  2. The second difference is the statistical difference in fatigue life due to the definition of the specimen. This difference was first mentioned by Seabrook and Dudley [23] in 1964. In RG testing, the specimen is the entire gear. Failure of the specimen constitutes one tooth failing and the suspension of the remaining \(z-1\) teeth. In STBF testing, the specimen is a single tooth on a gear. Multiple teeth can be tested on a single gear, and each tooth tested either fails or is suspended. A mathematical relationship describing this statistical difference was suggested by McPherson and Rao [6], but no conclusive evidence of its validity was found. This also raises the question of how the effect of manufacturing variation differs between STBF and RG testing. It is plausible that, for any single gear, machining and processing variation magnitudes are smaller than those between different gears. It is also possible to perform an entire matrix of STBF tests on a single gear, while RG testing must always involve multiple gear specimens to support statistical conclusions. Larger standard deviations may therefore be expected from RG testing. The statistical difference between the median fatigue life computed from individual tooth failures and suspensions and the life computed from entire gear failures must be accounted for.

  3. The third difference is the achievable stress ratio, i.e. the ratio of the minimum to the maximum load. This difference was also discussed by McPherson and Rao [6]. Fully reversed stresses have a stress ratio \(R=-1\), as the compressive stress reached in each cycle is equal and opposite to the tensile stress. A fully released stress (\(R=0\)) reaches a minimum of exactly zero in each stress cycle. STBF testing is generally limited to \(R> 0\); \(R=0\) is not used in STBF testing to avoid impact at the tooth contact interface. In addition, creating \(R< 0\) in an STBF test requires a non-parallel loading axis and additional supports, adding considerable complexity to a space-constrained test setup. RG testing, on the other hand, is generally limited to \(R\leq 0\), as all gear teeth experience an unloaded state out of mesh, constraining the minimum root stress to zero, or to less than zero in idler and planet configurations. This means that there is no single stress ratio that both STBF and RG testing can achieve. However, the effect of stress ratio on fatigue life is very well documented, dating back to the 1873 work of Müller [36] and, arguably better known, the contributions of Goodman [37]. It is worth noting that in some high-speed applications, the minimum gear tooth root stress ratio is greater than zero, representing a baseline tensile root stress when the gear teeth are not in mesh. This is generated by centrifugal forces expanding the gear rim, creating tensile hoop stresses, and by the gear teeth pulling outward from the rim. However, no RG test setup currently exists to perform controlled \(R> 0\) bending fatigue tests.

  4. The final major difference is the potential for different crack initiation locations. Crack initiation in gear tooth bending fatigue can be categorized as either surface or subsurface. Surface cracks are thought to initiate from local stress concentrations created by small radii left by machine tooling such as cutters and grinders [38]. Subsurface initiations originate from non-metallic inclusions such as sulfides, oxides and silicates, or from voids in the material microstructure [39]. These subsurface locations are still very close to the surface (< 0.1 mm deep) and have a very distinct optical appearance, usually described as a “fisheye” [40]. It has been found empirically that the fatigue lives associated with these two initiation locations differ, with subsurface initiations having longer fatigue lives at any given stress level producing finite life. This empirical finding has given rise to the step-wise SN curve [40]. Therefore, fractographic images should be taken of failure surfaces to ensure correct categorization and separation of data sets. Furthermore, initial evidence from RG and STBF testing by Winkler et al. [2] suggests that STBF and RG testing tend to produce failures of opposite initiation type, with STBF testing producing more surface-initiated cracks and RG testing more subsurface-initiated cracks, though this behavior may also be partly material or processing related. Both initiation types have been observed in both STBF and RG testing.

Developing robust empirical relationships that reconcile all of these differences is difficult. This is further complicated by the uncertainty of whether such relationships hold equally for different materials and processing methods, such as surface hardening, peening and surface finishing. The statistical difference has been partially explored by Seabrook and Dudley [23] and, more recently, by McPherson and Rao [6]. Seabrook and Dudley found that the fatigue strength estimated from STBF tests was approximately 30% higher than that from a limited number of RG tests. They attributed the difference to dynamic loading in the RG test but noted that failure in the RG tests is governed by the weakest tooth on the gear. McPherson and Rao [6] calculated an adjusted normal probability variate to account for the probability of one tooth failing on a gear of z teeth. This adjusted variate was then used to convert the fatigue strength calculated from binary pass/fail STBF data at various loads into an equivalent RG fatigue strength. The calculated fatigue strengths were compared to a limited set of RG data and found to be significantly lower than the experimental data suggested, although select data points showed closer agreement. This methodology also included a formulation accounting for stress ratio differences between the STB and RG data, using known empirical constant-life relationships similar to Goodman/Haigh diagrams to compute equivalent stresses at a different stress ratio, either for equal fatigue life or at the fatigue strength. McPherson and Rao [6] also noted that the experimental STBF and RG data generated in their study showed an approximately 30% increase in estimated fatigue strength, consistent with the findings of Seabrook and Dudley [23].

A single empirical factor, \(\sigma _{\mathrm{fat}-RG}=0.9\sigma _{\mathrm{fat}-\mathrm{STB}}\), which translates the fatigue strength estimated from STBF testing to RG testing and is assumed to account for all four of the above test differences, is also given in several manuscripts [2, 14, 16, 19]. This transformation constant appears to originate from a 1999 statistical report of Forschungsvereinigung Antriebstechnik (FVA) Project 304 [16, 41], from an earlier 1987 German manuscript by Rettig [42], or from both. Both manuscripts are available only in German, which makes their review in an English-language context difficult.

As described, existing work on translating STBF to RG bending fatigue results is limited. Even with the previously mentioned German works available, the comparison is limited to the assumed fatigue strength. Contemporary work suggests that the very high cycle fatigue (VHCF, \(N_{f}> 10^{7}\)) lives of subsurface-initiated cracks in certain materials, or in materials with certain microstructures, may not exhibit conventional endurance limit/fatigue strength behavior [43]. As longer machine service lives are desirable, the expected service life of a gear will need to keep pace with that of other components. It therefore becomes increasingly necessary to develop robust empirical relations between the vast existing body of STBF data and the limited high cycle fatigue (HCF, \(10^{3}< N_{f}< 10^{7}\)) and VHCF RG data. Accordingly, the goals of this research are to (i) demonstrate a unified methodology for STBF and RG testing on a single test specimen, (ii) investigate statistical techniques that can be used consistently for both STBF and RG testing to regress stress-life (SN) relationships, (iii) define empirical relationships using single-set data (STBF or RG only) that allow the total STBF to RG difference to be estimated from the four individual differences described above, and (iv) add empirical data to the existing limited STBF vs RG test datasets.

2 Experimental setup

2.1 Rotating gear test

An existing RG test methodology by Hong et al. [29, 30] was adopted to perform HCF and VHCF gear tooth bending fatigue testing at a stress ratio of \(R=0\). It consisted of a 91.5 mm center distance power circulating (back-to-back) test machine featuring three axes of rotation. Modular connections among the three axes and between the two gearboxes allowed for the application of various gear tooth loading types, including idler (fully reversed stress) loading and torque split (sun gear type) loading. In its simplest form, the machine was reduced to a two-axis arrangement with a single gear pair in each gearbox, producing fully released (\(R=0\)) gear tooth root stresses. A split coupling was used to apply a constant torque to the gears, and an external AC motor drove the power circulating loop. Fig. 1 shows one of several identical RG fatigue test machines used in this investigation.

Fig. 1 A three-axis rotating gear fatigue test machine

Lubrication and heat removal were accomplished via a liquid-cooled oil bath. Vibration-based algorithms monitored the test and shut it down at the onset of tooth fracture, capturing the failure as a macroscopic surface crack, sustained by plastic strains at the crack tip, before complete separation of the gear tooth. The rotational speed of the test pinion was held at about 4400 rpm for all load levels used in testing. Dynamic strain measurements performed on the gearbox at test load and speed conditions yielded dynamic root stress factors \(K_{d}\) of about 0.94–0.99, depending on load. These dynamic factors were accounted for in the reported RG stresses.

The gear pair used in the RG test setup consisted of a 17T pinion and a 25T gear, where the 17T pinion was designed to fail exclusively by tooth root breakage. Contact stresses were kept modest and an ISO 150 lubricant was used to minimize the effects of wear. Temperatures were held at 90 °C throughout each test. When the machine was used in its three-axis configuration, a second 25T gear was added, with the 17T pinion acting as the central idler member.

2.2 Single tooth bending fatigue test

An STBF fixture was designed and fabricated for testing the RG test specimen. The setup was made compatible with a series of universal hydraulic oscillators located at The Ohio State University, shown in Fig. 2a. These hydraulic oscillators were capable of applying a maximum force of 130 kN at oscillation frequencies of up to about 100 Hz.

Fig. 2 (a) STB fatigue test load frames and (b) STB test fixture

In an STB test, the desired load path is for force to travel from the upper anvil through the test tooth and be reacted only by the reaction tooth supported by the lower anvil. To achieve this, the gear must be in static equilibrium under the test tooth and reaction tooth forces alone. As these forces result in line contacts on a spur gear, the contact lines must be parallel, and the test and reaction contact surfaces must also be parallel. Contact locations satisfying these criteria were determined. While multiple sets of contact points satisfy these conditions, the contact point was chosen to be approximately at the highest point of single tooth contact (HPSTC) in the RG test setup. These STBF fixtures use a fixed lower anvil that supports the reaction tooth. The test gear is located via a centering shaft, with soft oil-impregnated bronze acting as a bushing between the test gear inner diameter and the locating shaft. An upper anvil is aligned via the same centering shaft such that it contacts another gear tooth (the test tooth) at a point in the addendum. A ball bearing between the ram and the fixture negates the effects of small misalignments between the axis of the ram and the desired axis of the load path. An assembled test fixture is shown in Fig. 2b.

Tooth root strain measurements were taken to validate the fixture design and the stress prediction models. A finite element based tooth root stress solver, part of a gear tooth load distribution model [44], was used to predict the state of stress in the test tooth root for the loading position defined above. Strain gauges were mounted in the root of a 17T gear specimen at the predicted location of maximum tooth root stress. Two sets of loaded strain measurements were recorded. The first was a dynamic cycling of the test tooth to measure strains under conditions similar to the fatigue test. Lower loads were used to avoid exceeding the fatigue strength of the strain gauge, but the fatigue test operating frequency of 40 Hz and stress ratio of \(R=0.05\) were retained. The second was a quasi-static measurement in which the load applied to the test tooth was increased under steady-state conditions. First, the stress maxima reached in each cycle under dynamic conditions were compared with the equivalent bending stress produced under quasi-static conditions at equivalent load set-points F, to evaluate whether any dynamic effects were present. A dynamic factor was calculated as

$$K_{d}=\frac{\frac{1}{N}\sum _{k=1}^{N}\max \left(\overline{\sigma }_{g}^{f=40}\right)_{k}}{\overline{\sigma }_{g}^{f=0}\left(F\right)}.$$
(1)

where \(\overline{\sigma }_{g}^{f=0}\left(F\right)\) is the normalized measured gage stress under quasi-static conditions (loading frequency \(f=0\)) at load F, \(\max \left(\overline{\sigma }_{g}^{f=40}\right)_{k}\) is the maximum normalized measured gage stress in the kth cycle at a loading frequency of 40 Hz, and N is the number of cycles averaged. At all loading conditions tested, \(0.99\leq K_{d}\leq 1.01\), indicating minimal dynamic effects.
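A minimal sketch of Eq. (1), assuming the dynamic measurement is available as a list of sampled normalized gage stresses per load cycle and the quasi-static stress at the same load set-point F is a single number. The function name and the sample values are hypothetical and only illustrate the averaging of per-cycle maxima.

```python
from statistics import fmean
from typing import Sequence


def dynamic_factor(cycles: Sequence[Sequence[float]], static_stress: float) -> float:
    """K_d of Eq. (1): mean of the per-cycle maxima of the normalized gage
    stress at 40 Hz, divided by the quasi-static gage stress at the same F."""
    if not cycles:
        raise ValueError("at least one load cycle is required")
    if static_stress <= 0.0:
        raise ValueError("quasi-static reference stress must be positive")
    peaks = [max(cycle) for cycle in cycles]  # max(sigma_g)_k for each cycle k
    return fmean(peaks) / static_stress


# Hypothetical normalized samples for three 40 Hz cycles at one load set-point.
cycles_40hz = [
    [0.05, 0.52, 1.00, 0.49],
    [0.05, 0.51, 0.99, 0.50],
    [0.06, 0.50, 1.01, 0.48],
]
k_d = dynamic_factor(cycles_40hz, static_stress=1.0)
print(f"K_d = {k_d:.3f}")
```

A value close to one, as reported above, means the cyclic peaks match the quasi-static stress at the same load, so no dynamic correction is needed for the STB stresses.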

The measured gage bending stresses were also compared with the predicted tooth root stresses. As the strain gage has a finite area, it cannot measure at a single point as an FEA can. Therefore, the minimum and maximum FEA stresses within the strain gage footprint were compared with the measurement, as shown in Fig. 3. The measured gage stresses are slightly lower than the FEA predictions but still in reasonable agreement. A possible source of the discrepancy is gage placement, as very small placement errors may cause large changes in measured stress. Overall, the measurements show that the STB fixture performs as designed.

Fig. 3 Comparison of measured and FEA-predicted STBF tooth root stress

3 Fatigue tests and results

A series of tests was performed using the RG test setups in the two-axis (\(R=0\)) arrangement as well as the STB test setups. A batch of equivalent case-carburized test gears made of a high-end alloy gear steel was procured for this testing. Tables 1 and 2 show the results of the RG and STB tests, respectively. An SN plot combining both data sets is shown in Fig. 4. The maximum bending stress values for both test types have been normalized for confidentiality reasons and are denoted as \(\overline{\sigma }_{\max }\).

Table 1 RG Test Results
Table 2 STB Test Results
Fig. 4 RG and STB fatigue test results along with the regressed median fatigue lives for each population

It is important to note what was and was not included in the SN curves of Fig. 4 with respect to the four testing differences described in the introduction. Specifically, no corrections were made (i) for differences in the loading form, (ii) for the difference in test specimen definition, (iii) for the difference in stress ratio, where the RG testing is performed at \(R=0\) and the STB testing at \(R=0.05\), or (iv) for potential differences in crack initiation location, i.e. surface versus subsurface failure modes.

The total difference between the STB and RG testing will first be quantified without any corrections. Following that quantification, methods to derive individual differences from a single set of data will be demonstrated, and the aggregate of those individual differences will be shown to closely estimate the total difference.

3.1 Data analysis

The SN plot of Fig. 4 visualizes the total difference between RG and STB testing, encompassing all four of the differences described. This total difference \(K_{RS}\) can be described as

$$K_{RS}\left(N_{f}\right)=\frac{\overline{\sigma }_{RG}^{\left(p\right)}\left(N_{f}\right)}{\overline{\sigma }_{\mathrm{STB}}^{\left(p\right)}\left(N_{f}\right)}$$
(2)

where \(\overline{\sigma }_{RG}^{\left(p\right)}\left(N_{f}\right)\) is the SN relationship between the stress parameter \(\overline{\sigma }\) and the failure life \(N_{f}\) regressed from the rotating gear testing for a failure percentile of p. Correspondingly, \(\overline{\sigma }_{\mathrm{STB}}^{\left(p\right)}\left(N_{f}\right)\) is the equivalent for STB testing. As the slopes of \(\overline{\sigma }_{RG}^{\left(p\right)}\left(N_{f}\right)\) and \(\overline{\sigma }_{\mathrm{STB}}^{\left(p\right)}\left(N_{f}\right)\) may not be equal, \(K_{RS}\) is also a function of \(N_{f}\). \(K_{RS}\) comprises the four differences between RG and STB testing. It can therefore be described as the product of the ratios of allowable stress at equal fatigue life arising from each individual source:

$$K_{RS}=K_{F}K_{S}K_{R}K_{I}$$
(3)

where \(K_{F}\) is the difference in allowable stress at equal fatigue life between RG and STB testing due to differences in the applied force time history alone, \(K_{S}\) accounts for the shift in SN relationships due to statistical differences arising from the specimen definition, \(K_{R}\) is the difference in SN relationships due to differences in test stress ratio R, and \(K_{I}\) accounts for the SN differences associated with different crack initiation locations. Each of these factors relates to a physical difference between the fatigue tests, but validated mathematical or empirical relationships for them do not exist.

Knowledge of the total scaling factor \(K_{RS}\) for one set of tests does not guarantee congruence with tests of other gear designs, materials and test methodologies. The statistical difference parameter \(K_{S}\) will change with the tooth count of the RG specimen, as more teeth are cyclically loaded simultaneously at higher tooth counts. Different test methodologies may employ different stress ratios in STB testing, requiring a change to the stress ratio parameter \(K_{R}\) only. In addition, different materials may have different constant life (Goodman) relationships, which vary \(K_{R}\) even for the same stress ratios. \(K_{I}\) may likewise be influenced by the material, since the proportion of subsurface and surface initiated failures in RG and STB testing may depend on it. Failure to account for any of these parameters may lead to poor correlation between RG and STB data sets.

It would be highly beneficial to determine empirical or physical relationships for each of the four parameters \(K_{F}\), \(K_{S}\), \(K_{R}\) and \(K_{I}\). However, large databases of RG and STB data would be needed for validation. The current study uses the limited data shown in Tables 1 and 2 and Fig. 4 to demonstrate techniques for deriving the total difference and to suggest methods of determining the statistical parameter \(K_{S}\) and the stress ratio parameter \(K_{R}\).

3.2 Stress-life regression

The stress-life relationships \(\overline{\sigma }_{RG}\left(N_{f}\right)\) and \(\overline{\sigma }_{\mathrm{STB}}\left(N_{f}\right)\) are determined using maximum likelihood estimation (MLE). The likelihood of observing the experimental results shown in Fig. 4 is computed by assuming a general form of \(\overline{\sigma }_{\max }\left(N_{f}\right)\) and a distribution describing how \(N_{f}\) varies at any given stress level \(\overline{\sigma }_{\max }\). The parameters of the distribution (e.g. mean and standard deviation for a Gaussian distribution) and of the assumed \(\overline{\sigma }_{\max }\left(N_{f}\right)\) relationship (e.g. m and b for a linear relationship of the form \(\sigma =mN_{f}+b\)) are then solved for such that the likelihood is maximized. The parameter set that maximizes the likelihood is the optimal solution for the assumed SN relationship and distribution. In practice, the median or B50 survival value \(B_{50}=\overline{\sigma }_{\max }^{50}\left(N_{f}\right)\) is set equal to the functional SN relationship. The standard deviation and other parameters describing the distribution may be constants or functions of stress or life. In addition, the regression treats life as the random variable and must therefore be performed as \(N_{f}\left(\overline{\sigma }_{\max }\right)\).
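To make the likelihood described above concrete, the sketch below evaluates the log-likelihood of a stress-life data set when a life variable y (for example, the logarithm of cycles) is normally distributed about an assumed median curve, with stress treated as fixed. Runouts are treated as right-censored suspensions. The function names, the linear median curve and the sample data are hypothetical illustrations, not the study's data; in practice the negative of this function would be minimized over the model parameters with a numerical optimizer such as `scipy.optimize.minimize`.

```python
import math
from statistics import NormalDist
from typing import Callable, Sequence

_STD = NormalDist()  # standard normal distribution
_LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_likelihood(
    stresses: Sequence[float],
    lives: Sequence[float],
    failed: Sequence[bool],
    median: Callable[[float], float],
    sd: float,
) -> float:
    """Log-likelihood of SN data with a normally distributed life variable.

    Failures contribute the log of the normal density of their life y;
    runouts (suspensions) contribute log P(Y > y), i.e. right censoring.
    """
    if sd <= 0.0:
        return -math.inf
    total = 0.0
    for s, y, is_failure in zip(stresses, lives, failed):
        z = (y - median(s)) / sd
        if is_failure:
            total += -0.5 * z * z - math.log(sd) - _LOG_SQRT_2PI
        else:
            survival = _STD.cdf(-z)  # P(Y > y) for a normal life variable
            if survival <= 0.0:
                return -math.inf
            total += math.log(survival)
    return total


# Hypothetical data: normalized stress, log10 cycles, failure flag (False = runout).
stress = [1.2, 1.0, 0.8, 0.7]
log_life = [5.2, 6.0, 6.8, 7.0]
failure = [True, True, True, False]


def median_curve(intercept: float, slope: float) -> Callable[[float], float]:
    """Hypothetical linear median relation y = intercept + slope * stress."""
    return lambda s: intercept + slope * s


ll_a = log_likelihood(stress, log_life, failure, median_curve(10.0, -4.0), 0.2)
ll_b = log_likelihood(stress, log_life, failure, median_curve(10.3, -4.0), 0.2)
print(ll_a, ll_b)  # the first parameter set fits the data better
```

The median curve is passed in as a callable so that any assumed SN form, such as the asymptotic relation introduced next, can be plugged into the same likelihood.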

The assumed forms of the SN relationship and statistical distribution are at the discretion of the analyst. As almost any form can be used, it is important to remember that correlation does not imply causation: the seemingly best-fitting relationship may not represent the physics of the fatigue process. A regression suggested by Pascual and Meeker [45], which uses an assumed function with a regressed horizontal asymptote representing a fatigue strength, is used here to compute SN relationships for both the STB and RG data. The median life-stress relationship is defined as

$$N_{f}^{50}\left(\overline{\sigma }_{\max }\right)=\alpha _{1}+\alpha _{2}\ln \left(\overline{\sigma }_{\max }-\gamma \right),\quad \min \left[\left(\overline{\sigma }_{\max }\right)_{if}\right]> \gamma$$
(4)

where \(\alpha _{1}\) and \(\alpha _{2}\) are regressed constants related to the y-intercept and slope, and γ is a regressed fatigue strength, which must be less than the lowest cyclic stress level producing failure, \(\min \left[\left(\overline{\sigma }_{\max }\right)_{if}\right]\), where the subscript if indexes the fatigue test stress levels producing failure. A log-normal distribution is assumed to describe the fatigue life at any given stress level, yielding the general form of the regression for a failure rate of q percent to be

$$N_{f}^{q}\left(\overline{\sigma }_{\max }\right)=N_{f}^{50}\left(\overline{\sigma }_{\max }\right)+ze^{\left[\beta _{1}\right]},\quad \min \left[\left(\overline{\sigma }_{\max }\right)_{if}\right]> \gamma$$
(5)

where \(\beta _{1}\) is a regressed constant related to the variance and z is the standard normal variate corresponding to the qth percentile. The assumption of a fatigue limit in STB and RG gear tooth bending fatigue data and the assumption of a log-normal life distribution are both common in the literature [46]. Table 3 provides the regressed parameters \(\alpha _{1}\), \(\alpha _{2}\), \(\beta _{1}\) and γ for both the RG and STB testing. Fig. 4 also shows both SN regressions, plotted over a range from 90% of the minimum stress tested in the corresponding test type up to 110% of the maximum stress tested. The two regressions agree closely, indicating that the total difference \(K_{RS}\) is close to unity within the plotted range. \(K_{RS}\) is computed according to Eq. (2) using the B50 life such that

$$K_{RS}\left(N_{f}\right)=\frac{\overline{\sigma }_{RG}^{50}\left(N_{f}\right)}{\overline{\sigma }_{\mathrm{STB}}^{50}\left(N_{f}\right)}$$
(6)

and is plotted in Fig. 5. Two points in Fig. 5 are of particular interest. First, the ratio of the regressed fatigue strengths appears, since the asymptotic behavior of the regression produces a constant as \(N_{f}\rightarrow \infty\), and, as expected, the RG fatigue strength is lower than the STB fatigue strength. This behavior will be investigated further in the context of removing the effects of the four physical differences individually. Second, the ratio increases as \(N_{f}\) approaches \(10^{6}\) cycles and even indicates that RG lives are longer than the corresponding STB lives. Below about \(10^{6}\) cycles the ratio decreases again, indicating that cyclic stresses will be lower for RG testing when producing fatigue lives closer to low cycle fatigue behavior. However, this ratio is not expected to be the same for all RG to STB test comparisons. Rather, mathematical or empirical relations must be developed to broaden the utility of this RG to STB comparison formulation.
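The regression equations above can be exercised with a short sketch that evaluates Eqs. (4) and (5), inverts them for stress at a given life, and forms the ratio of Eq. (6). The class and function names are hypothetical, and because Table 3 is not reproduced here, the parameter values are placeholders rather than the regressed values of this study. Eq. (4) is evaluated exactly as written, with the life variable y passed in directly; if, as the log-normal assumption suggests, it is applied to the logarithm of life, y should be \(\ln N_{f}\) (this reading is an assumption).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PascualMeekerCurve:
    """Eqs. (4)/(5): y_q = a1 + a2 * ln(sigma - gamma) + z * exp(beta1)."""

    a1: float     # intercept-like constant (alpha_1)
    a2: float     # slope-like constant (alpha_2), negative for SN data
    gamma: float  # regressed fatigue strength (horizontal asymptote)
    beta1: float  # log of the constant scatter term in Eq. (5)

    def life(self, sigma: float, z: float = 0.0) -> float:
        """Life variable at stress sigma; z = 0 gives the median (B50)."""
        if sigma <= self.gamma:
            raise ValueError("Eq. (4) requires sigma > gamma")
        return self.a1 + self.a2 * math.log(sigma - self.gamma) + z * math.exp(self.beta1)

    def stress(self, y: float, z: float = 0.0) -> float:
        """Invert Eq. (5): stress that gives life y at standard normal variate z."""
        y50 = y - z * math.exp(self.beta1)
        return self.gamma + math.exp((y50 - self.a1) / self.a2)


def k_rs(rg: PascualMeekerCurve, stb: PascualMeekerCurve, y: float) -> float:
    """Eq. (6): ratio of RG to STB median stresses at the same life y."""
    return rg.stress(y) / stb.stress(y)


# Placeholder parameters (NOT the values of Table 3); y = ln(N_f) assumed.
stb = PascualMeekerCurve(a1=36.0, a2=-4.0, gamma=900.0, beta1=-0.5)
rg = PascualMeekerCurve(a1=35.0, a2=-3.8, gamma=820.0, beta1=-0.3)

for n_cycles in (1e5, 1e6, 1e7, 1e8):
    print(f"N_f = {n_cycles:.0e}: K_RS = {k_rs(rg, stb, math.log(n_cycles)):.3f}")
```

Because \(\alpha _{2}< 0\), the exponential term vanishes as y grows and the ratio tends to \(\gamma _{RG}/\gamma _{\mathrm{STB}}\), mirroring the constant fatigue strength ratio at long lives described for Fig. 5.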

Table 3 SN Regression Parameters
Fig. 5 Total ratio of stresses at equivalent fatigue lives between RG and STB testing

3.3 Statistical factor \(K_{S}\)

As it is more useful to convert STB data to an RG equivalent, a possible formulation for doing so is provided here. It is based on the log-normal variance regressed as part of the full regression of Sect. 3.2. As the test gear specimen has 17 teeth, the STB data need to be converted to an equivalent 1:17 failure rate. This corresponds to a standard normal variate of \(z=-1.5647\). Eq. (5) is then used to compute \(N_{f}^{1/17}\left(\overline{\sigma }_{\max }\right)\). A comparison of \(N_{f}^{50}\left(\overline{\sigma }_{\max }\right)\) and \(N_{f}^{1/17}\left(\overline{\sigma }_{\max }\right)\) for the STB data is shown in Fig. 6. It shows that the \(q=1/17\) percentile curve is shifted to the left and, as indicated by Eq. (5), has a constant variance. It also shows no significant difference in the regressed fatigue limit. Let \(q_{RG}\) be the failure percentile corresponding to the number of teeth failed relative to the number of teeth \(z_{p}\) on an RG test specimen. Assuming one tooth fails in an RG test,

$$q_{RG}=\frac{1}{z_{p}}$$
(7)

and the statistical coefficient KS can be defined as

$$K_{S}=\frac{\overline{\sigma }_{\mathrm{STB}}^{\left(q_{RG}\right)}\left(N_{f}\right)}{\overline{\sigma }_{\mathrm{STB}}^{\left(50\right)}\left(N_{f}\right)}.$$
(8)
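The conversion to an equivalent 1:17 failure rate and the ratio of Eq. (8) can be sketched with the Python standard library: `statistics.NormalDist().inv_cdf` gives the standard normal variate for \(q_{RG}=1/z_{p}\), and inverting Eq. (5) gives the STB stress at that percentile. The helper names and the STB parameters below are hypothetical placeholders rather than the regressed values of Table 3, and the life variable y is assumed to be \(\ln N_{f}\).

```python
import math
from statistics import NormalDist


def rg_variate(teeth: int) -> float:
    """Standard normal variate for q_RG = 1 / z_p (Eq. (7)): one failed tooth per RG gear."""
    if teeth < 2:
        raise ValueError("an RG specimen needs at least two teeth")
    return NormalDist().inv_cdf(1.0 / teeth)


def stress_at(y: float, a1: float, a2: float, gamma: float, beta1: float, z: float = 0.0) -> float:
    """Eq. (5) inverted for stress at life y and standard normal variate z."""
    return gamma + math.exp((y - z * math.exp(beta1) - a1) / a2)


def k_s(y: float, a1: float, a2: float, gamma: float, beta1: float, teeth: int) -> float:
    """Eq. (8): STB stress at the q_RG percentile over the STB median stress, same life."""
    z_rg = rg_variate(teeth)
    return stress_at(y, a1, a2, gamma, beta1, z_rg) / stress_at(y, a1, a2, gamma, beta1)


z17 = rg_variate(17)  # the text quotes z = -1.5647 for a 17-tooth pinion
params = dict(a1=36.0, a2=-4.0, gamma=900.0, beta1=-0.5)  # placeholder STB fit
for n_cycles in (1e5, 1e6, 1e7):
    print(f"N_f = {n_cycles:.0e}: K_S = {k_s(math.log(n_cycles), teeth=17, **params):.3f}")
```

With these placeholder parameters the ratio is below one at finite lives and approaches one as the life grows, consistent with the observation that the 1/17 curve shows no significant change in the regressed fatigue limit.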
Fig. 6 STB B50 fatigue lives compared to STB B(1/17) fatigue lives

A plot of \(K_{S}\) is shown in Fig. 7. \(K_{S}\) lies between 0.53 and 0.99 over reasonable life ranges corresponding to the tested lives. The RG \(N_{f}^{50}\left(\overline{\sigma }_{\max }\right)\) is then compared to the STB \(N_{f}^{q_{RG}}\left(\overline{\sigma }_{\max }\right)\) in Fig. 8. This plot is the equivalent of

$$K_{S}\overline{\sigma }_{\mathrm{STB}}\left(N_{f}\right)=\overline{\sigma }_{RG}\left(N_{f}\right)$$
(9)

where \(K_{F}\), \(K_{R}\) and \(K_{I}\) are omitted. Although Fig. 8 might indicate further separation of the RG and STB regressions, this phenomenon is commonly observed in very high cycle fatigue and corresponds to the separation of the SN curves for surface and subsurface failures. These separate curves are also referred to as step-wise SN curves [40]. It is therefore reasonable to suggest that including a factor accounting for the difference between the surface and subsurface SN curves would bring the ratio between the final observed test data back to nearly one. This further suggests that the effects of \(K_{F}\) and \(K_{R}\) may be minimal.

Fig. 7 Ratio of bending stresses between RG and STBF testing due to statistical differences

Fig. 8: RG \(B_{50}\) fatigue lives compared to STB \(B_{1/17}\) fatigue lives

3.4 Stress ratio factor \(K_{R}\)

The equivalence of stresses at different stress ratios for constant life (CL) is a long-studied phenomenon, investigated by researchers such as Müller, Goodman and many others, as discussed in the introduction. These constant life relationships are material dependent and should therefore be evaluated for each material individually.

An ultimate tensile strength test was performed on the gear test specimen using the STB test machine. Load was applied slowly to an unused tooth until failure. The peak load sustained was recorded and the associated tooth root bending stress was computed. The ultimate tensile strength of the gear tooth in bending was found to be \(\overline{\sigma }_{\mathrm{ult}}=1967\) MPa. A repeat test differed by less than 0.7% in \(\overline{\sigma }_{\mathrm{ult}}\).

The modified Goodman approach was then used to determine the mean stress effect, where for tensile mean stresses

$$\sigma _{a}=-\frac{\sigma _{R=-1}}{\sigma _{\mathrm{ult}}}\sigma _{m}+\sigma _{R=-1}$$
(10)

where \(\sigma _{a}\) and \(\sigma _{m}\) are the alternating and mean stresses, respectively, and \(\sigma _{R=-1}\) is the fully reversed cyclic stress amplitude giving the same fatigue life. Eq. (10) defines a line of constant life for all stress ratios \(-1\leq R< 1\), which produce \(\sigma _{m}\geq 0\). In this work, the mean and alternating stresses producing any given fatigue life are known from the STB data at a stress ratio of \(R=0.05\), along with \(\overline{\sigma }_{\mathrm{ult}}\), whereas the RG testing was performed at a stress ratio of \(R=0\). Therefore, \(\sigma _{R=-1}\) can be solved for any \(N_{f}\) from the STB data and \(\overline{\sigma }_{\mathrm{ult}}\). The STB data can then be converted to the stress ratio of the RG data by computing the equivalent \(\sigma _{a}\), \(\sigma _{m}\) and \(\sigma _{\max }\) at \(R=0\). The constant life map showing the solution of Eq. (10), together with the converted STB data for lives of \(N_{f}\in \{10^{5},10^{6},10^{7},10^{8}\}\), is shown in Fig. 9. \(K_{R}\) can then be defined as

$$K_{R}=\frac{\overline{\sigma }_{\mathrm{STB}}^{\left(R_{RG}\right)}\left(N_{f}\right)}{\overline{\sigma }_{\mathrm{STB}}^{\left(R_{\mathrm{STB}}\right)}\left(N_{f}\right)}$$
(11)

where \(\overline{\sigma }_{\mathrm{STB}}^{\left(R_{RG}\right)}\left(N_{f}\right)\) is the STB SN relationship converted to the stress ratio of the RG data, \(R_{RG}\). This technique can also be used for stress ratio conversions other than the \(R=0.05\) to \(R=0\) conversion demonstrated here. \(K_{R}\) is plotted in Fig. 10. The maximum stresses at \(R=0\) are about 96% to 99% of the equivalent maximum stresses at the STB test stress ratio of \(R=0.05\). This indicates that the difference between STB and RG results caused solely by their different test stress ratios is much smaller than the statistical difference \(K_{S}\).
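
The conversion described above can be sketched as follows, assuming the modified Goodman line of Eq. (10) and the usual definitions \(\sigma_{a}=\sigma_{\max}(1-R)/2\) and \(\sigma_{m}=\sigma_{\max}(1+R)/2\). Only \(\overline{\sigma}_{\mathrm{ult}}=1967\) (taken here to be in MPa) comes from the text; the input stress levels are arbitrary and the function names are hypothetical. Solving Eq. (10) for \(\sigma_{R=-1}\) at the source ratio and then for \(\sigma_{\max}\) at the target ratio gives a closed form, so no iteration is required.

```python
SIGMA_ULT = 1967.0  # ultimate bending strength of the gear tooth (units assumed MPa)


def fully_reversed_amplitude(sigma_max: float, r: float,
                             sigma_ult: float = SIGMA_ULT) -> float:
    """Solve the modified Goodman line (Eq. 10) for sigma_{R=-1}."""
    sigma_a = sigma_max * (1.0 - r) / 2.0  # alternating stress
    sigma_m = sigma_max * (1.0 + r) / 2.0  # mean stress (tensile, so Eq. 10 applies)
    return sigma_a / (1.0 - sigma_m / sigma_ult)


def convert_max_stress(sigma_max: float, r_from: float, r_to: float,
                       sigma_ult: float = SIGMA_ULT) -> float:
    """Maximum stress at r_to giving the same life as sigma_max at r_from."""
    s_rev = fully_reversed_amplitude(sigma_max, r_from, sigma_ult)
    # Substitute sigma_a = s(1-R)/2 and sigma_m = s(1+R)/2 into Eq. (10), solve for s.
    return 2.0 * s_rev / ((1.0 - r_to) + s_rev * (1.0 + r_to) / sigma_ult)


def k_r(sigma_max_stb: float, r_stb: float = 0.05, r_rg: float = 0.0) -> float:
    """Stress ratio factor K_R (Eq. 11) at one point of the STB SN curve."""
    return convert_max_stress(sigma_max_stb, r_stb, r_rg) / sigma_max_stb


for s in (800.0, 1000.0, 1200.0):
    print(f"sigma_max(R=0.05) = {s:.0f} -> K_R = {k_r(s):.4f}")
```

Under these assumptions \(K_{R}\) rises from 0.95 (the amplitude ratio \((1-0.05)/(1-0)\) in the low-stress limit) to exactly 1 at \(\overline{\sigma}_{\mathrm{ult}}\), where all Goodman lines meet; this range brackets the 96% to 99% values reported above.
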

Fig. 9: Constant life diagram showing STB \(R = 0.05\) stresses and comparable STB \(R = 0\) stresses

Fig. 10: Stress ratio factor \(K_{R}\)

3.5 Initiation location factor \(K_{I}\)

The difficulty in determining the difference in SN relationships due to surface and subsurface initiation is the sheer quantity of data required. To produce a statistically conclusive difference, a single testing program (RG or STB) would need to generate sufficient data containing both surface and subsurface failures to regress each population independently. A full statistical analysis of the two sets is beyond the capabilities of the current data set. However, simplified measures to estimate the difference may still be employed.

Fractographic inspections were performed on the RG test failures. Of the 14 failures, eight were found to have initiated from subsurface voids or inclusions. The other six had crack initiation locations at the material surface. Fig. 11 shows these failure points plotted alongside the \(K_{S}K_{R}B_{50}^{\mathrm{STB}}\) and \(B_{50}^{RG}\) curves. Two distinct populations are observed. Four of the six surface failures are more than a decade in life removed from the subsurface failures, although two of the surface failures occur very near the subsurface failures. All of the subsurface failures are closely grouped. Furthermore, the difference in fatigue life between the surface and subsurface groups is visually observed to be approximately equal to the life difference between the \(K_{S}K_{R}B_{50}^{\mathrm{STB}}\) and \(B_{50}^{RG}\) curves.

Fig. 11: Comparison of RG surface initiated failures and RG subsurface initiated failures

The initiation location factor \(K_{I}\) between the surface and subsurface populations can be defined as the ratio of the stresses at the geometric mean lives of the surface and subsurface populations, both taken from the SN relationship from which they originated:

$$K_{I}=\frac{\sigma _{\max }^{RG}\left[\left(\prod _{i=1}^{f_{s}}N_{if}^{s}\right)^{1/{f_{s}}}\right]}{\sigma _{\max }^{RG}\left[\left(\prod _{i=1}^{f_{ss}}N_{if}^{ss}\right)^{1/{f_{ss}}}\right]}$$
(12)

where \(f_{s}\) and \(f_{ss}\) are the numbers of surface and subsurface failures, and \(N_{if}^{s}\) and \(N_{if}^{ss}\) are the individual fatigue lives of the surface and subsurface specimens, respectively. These geometric mean lives are plotted in Fig. 11 at the evaluated stress levels forming the ratio in Eq. (12). Eq. (12) results in \(K_{I}=1.165\), indicating that the STB lives must be increased to more closely approximate the RG results. The use of Eq. (12) also assumes that the STB testing resulted in purely surface initiated failures. A preliminary fractographic analysis of the STB tests indicates that 14 of 17 failures initiated from surface locations, suggesting that the use of Eq. (12) is appropriate.
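
The evaluation of Eq. (12) can be sketched as below. The RG SN relationship is represented by a hypothetical curve \(\log_{10} N = A - B\log_{10}(\sigma - \gamma)\), and the surface and subsurface lives are invented placeholders (only the counts of six and eight match the text), not the test data of Fig. 11. Only the procedure follows the text: take the geometric mean life of each population, read the stress off the same SN curve at each mean life, and form the ratio.

```python
import math
from statistics import geometric_mean

# Hypothetical RG median SN curve (illustrative parameters, not regressed values):
# log10(N) = A - B*log10(sigma - GAMMA), inverted below to give stress at a life.
A, B, GAMMA = 16.0, 4.0, 800.0


def rg_stress_at_life(n_f: float) -> float:
    """Maximum stress on the assumed RG SN curve at fatigue life n_f."""
    return GAMMA + 10.0 ** ((A - math.log10(n_f)) / B)


# Placeholder lives in cycles; surface failures assumed to occur at shorter lives.
surface_lives = [2.0e5, 4.5e5, 6.0e5, 9.0e5, 3.0e6, 4.0e6]
subsurface_lives = [5.0e6, 6.5e6, 8.0e6, 1.1e7, 1.3e7, 1.6e7, 2.0e7, 2.4e7]

# Geometric mean lives, (prod N_i)^(1/f), for each population.
n_s = geometric_mean(surface_lives)
n_ss = geometric_mean(subsurface_lives)

# Eq. (12): ratio of stresses on the same SN curve at the two mean lives.
k_i = rg_stress_at_life(n_s) / rg_stress_at_life(n_ss)
print(f"N_s = {n_s:.3g}, N_ss = {n_ss:.3g}, K_I = {k_i:.3f}")
```

The geometric mean is used because fatigue lives are approximately log-normally distributed, so averaging in log space weights each failure equally regardless of its order of magnitude.
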

It may also be useful to define the ratio of surface initiated failures to all failures observed in a single type of testing as

$$r_{f}=\frac{f_{s}}{f_{s}+f_{ss}}$$
(13)

Here, a value of \(r_{f}\) approaching 1 indicates that most or all of the failures were surface initiated, and a value approaching 0 indicates that most or all of the failures were subsurface initiated. In the tests considered here, \(r_{f}=0.429\) for the RG testing (6 of 14 failures) and \(r_{f}=0.8235\) for the STB testing (14 of 17 failures). This represents one data set for one material, whereas a larger set is highly desirable, as this ratio may be material dependent. The work by Winkler et al. [2], which presented sets of RG and STB testing for eight material variants, can be used for comparison. The reported surface and subsurface failures for each variant were tabulated and \(r_{f}\) was computed. Fig. 12 compares the ratios for each variant used in their testing (W-V1 to W-V8) with the ratios from the testing presented in this research (H-CCS). Six of the nine variants tested with STB methods resulted mostly or entirely in surface initiated failures. Seven of the eight variants tested with RG methods resulted mostly or entirely in subsurface initiated failures. It is also noteworthy, though, that two variants tested with STB methods resulted entirely in subsurface initiations. This data review reveals two insights: (i) overall, STB testing produces more surface initiated failures and RG testing produces more subsurface initiated failures, and (ii) this trend can be highly dependent on the material and processing methods. The initiation location factor \(K_{I}\) should therefore be formulated for each material type individually.
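
As a check of Eq. (13), the ratios quoted above follow directly from the failure counts reported in this section: 6 surface and 8 subsurface RG failures, and 14 surface failures out of 17 STB failures. The helper name below is illustrative only.

```python
def surface_fraction(f_s: int, f_ss: int) -> float:
    """Eq. (13): fraction of failures that initiated at the surface."""
    if f_s < 0 or f_ss < 0 or f_s + f_ss == 0:
        raise ValueError("failure counts must be non-negative and not both zero")
    return f_s / (f_s + f_ss)


r_f_rg = surface_fraction(6, 8)    # RG: 6 surface, 8 subsurface failures
r_f_stb = surface_fraction(14, 3)  # STB: 14 of 17 failures surface initiated
print(f"r_f(RG) = {r_f_rg:.4f}, r_f(STB) = {r_f_stb:.4f}")
# prints: r_f(RG) = 0.4286, r_f(STB) = 0.8235
```
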

Fig. 12: Comparison of the number of surface and subsurface failures for nine different test programs containing STBF and RG data. Data sets W-V1 to W-V8 are from Winkler et al. [2]

3.6 Loading factor \(K_{F}\) and remaining differences

The last stress factor discussed is the loading factor \(K_{F}\), which captures the difference between RG and STB testing due to load waveform variations and test frequencies. Very little is known on this subject, and even less with respect to gear loading. Experimental studies could be performed using STB methodologies, but a test machine capable of at least outputting sawtooth or square load waveforms would be needed. Alternatively, crack initiation models using simulated root stress time histories from sinusoidal STB testing and from RG testing might provide insight into this difference. The methodology described here may also allow this difference to be extracted if sufficient confidence is established in the other three factors and \(K_{F}\) is the only remaining major factor, such that

$$K_{F}=\frac{K_{RS}}{K_{S}K_{R}K_{I}}$$
(14)

In this study, \(K_{F}=1\) is assumed, as no other data exist to suggest otherwise at this point. The four empirical factors translating STB data to RG data, or vice versa, are thus defined using at most one set of gear tooth bending fatigue data per factor. The empirical relationship \(K_{RS}\) defining the total STB to RG relationship has also been defined. Fig. 13 compares \(K_{RS}\), defined from both sets of data, with \(K_{F}K_{S}K_{R}K_{I}\), derived from single-source data sets. Reasonably close agreement is seen, especially in the finite life range below \(10^{7}\) cycles. Beyond that, the regressed fatigue strengths differ, as \(K_{F}K_{S}K_{R}K_{I}\) fails to predict the lower endurance limit observed in RG testing. This discrepancy might have several causes, including: (i) the regression used does not consider a distribution of fatigue strength between specimens and assumes that a single endurance strength exists for all specimens, so that the treatment for \(K_{S}\) is nullified at the fatigue limit, and (ii) the treatment for \(K_{I}\) does not consider any difference in fatigue strength between surface and subsurface failures.
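
The bookkeeping implied by Eq. (14) can be sketched as follows. With \(K_{F}=1\), the product \(K_{F}K_{S}K_{R}K_{I}\) is the single-source estimate of \(K_{RS}\) that scales an STB stress to its RG equivalent at the same life; if \(K_{RS}\) is known from paired STB and RG data, Eq. (14) returns the residual \(K_{F}\). Only \(K_{I}=1.165\) is taken from the text; the values of \(K_{S}\), \(K_{R}\), \(K_{RS}\) and the STB stress are hypothetical placeholders at a single fatigue life, and the function names are illustrative.

```python
K_I = 1.165  # initiation location factor obtained from Eq. (12)


def predicted_k_rs(k_s: float, k_r: float, k_i: float, k_f: float = 1.0) -> float:
    """Single-source estimate of the total factor: K_F * K_S * K_R * K_I."""
    return k_f * k_s * k_r * k_i


def residual_k_f(k_rs: float, k_s: float, k_r: float, k_i: float) -> float:
    """Eq. (14): loading factor left over once K_S, K_R and K_I are known."""
    return k_rs / (k_s * k_r * k_i)


# Hypothetical factor values at one fatigue life (placeholders, not Fig. 13 data).
k_s, k_r, k_rs = 0.80, 0.975, 0.92
sigma_stb = 1100.0  # hypothetical STB median stress at that life

estimate = predicted_k_rs(k_s, k_r, K_I)
print(f"K_F K_S K_R K_I = {estimate:.4f}")
print(f"RG-equivalent stress = {estimate * sigma_stb:.1f}")
print(f"residual K_F = {residual_k_f(k_rs, k_s, k_r, K_I):.4f}")
```

In practice each factor varies with \(N_{f}\), so this calculation would be repeated along the life axis; a residual \(K_{F}\) that drifts away from 1 only near the fatigue limit would mirror the divergence discussed above.
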

Fig. 13: Comparison of the total stress ratio difference between STB and RG testing, \(K_{RS}\), derived from knowledge of both curves, with the individually derived ratios from single-set data

The difference between the regressed fatigue strengths remains unknown and cannot be resolved with the current data set. VHCF data on hourglass specimens suggest that such a fatigue strength occurs at fatigue lives of \(10^{9}\) or more, which is beyond the suspension criteria currently used in any typical gear tooth bending fatigue experiment. The regression algorithms used here depend on suspension data at these levels to determine accurate fatigue strengths. Therefore, the current difference between \(K_{RS}\) and \(K_{F}K_{S}K_{R}K_{I}\) at the fatigue strength in this treatment should be scrutinized and investigated further. Fig. 14 shows the difference between the \(K_{S}K_{R}K_{F}K_{I}B_{50}^{\mathrm{STB}}\) curve and the equivalent \(K_{RS}B_{50}^{\mathrm{STB}}=B_{50}^{RG}\) curve. The finite life region shows close agreement, as in Fig. 13, yet the fatigue strengths diverge as \(N_{f}\rightarrow \infty\).

Fig. 14: Comparison of STBF data transformed using the total ratio \(K_{RS}\) and using the individually derived ratios

The error in the estimates used to correlate the regressed fatigue strengths stems from the inability of the regression used to develop a variance of the regressed fatigue limit parameter \(\gamma\) with respect to stress level, as well as from a possible physical difference in fatigue strengths between surface and subsurface initiation. This could be handled by including an additional empirical parameter relating the two fatigue strengths, but in the authors' opinion it would be more prudent to focus first on statistical techniques to determine what percentage of the difference can be explained statistically. Two techniques exist to regress the fatigue strength as a random variable. The first uses sets of up-down data, as discussed by Little [47], and the second treats the fatigue strength as a random variable within the MLE process, as formulated by Pascual and Meeker [48]. Both techniques require data beyond what the test set disclosed in this research contains.

4 Summary and conclusions

Sets of STB and RG tests were presented and analyzed in order to investigate differences in the resulting stress-life relationships. The total difference was assessed by comparing the regressed median lives. It was proposed that the total difference is the product of four individual differences, namely the loading factor, stress ratio factor, statistical factor and initiation location factor, and that three of these differences can be assessed directly from a single set of data. Methodologies were suggested to extract, for the stress ratio factor, statistical factor and initiation location factor, the ratio of maximum bending stress between STB and RG testing at equivalent test fatigue lives. The difference due to loading waveform differences was assumed to be unity, as very little information exists on this topic, and the three individual factors derived from single-set data were compared to the total difference. Moderate agreement was found, though the regressed fatigue strengths still differ considerably. This suggests that the proposed treatment either lacks a description of the fatigue strength as another random variable or requires another empirical conversion factor. As RG testing becomes more accessible, more directly comparable STB and RG test data are needed on additional gear materials to advance current knowledge on this issue and to make STB tests more reliable as a design tool.