Introduction

The power–duration relationship is a well-established framework for modelling human performance and, in its hyperbolic form, is also known as the critical power model (Morton and Hodgson 1996). Although different parameterizations and rearrangements of the equation are possible, duration, i.e. the time to exhaustion (Tlim), must be acknowledged as the dependent variable (Morton and Hodgson 1996; Jones et al. 2010). This has sound (bio)logical foundations: the external power output (P) is the major determinant of the time course of exhaustion-related physiological variables (Poole et al. 1988; Black et al. 2017; Vinetti et al. 2017), and not vice versa. This choice is also justified by statistical theory: owing to the intra-individual biological variability of Tlim, the absolute random error in P (εP) can be considered negligible compared with the absolute error in Tlim (εTlim) (Morton and Hodgson 1996). Thus, the regression procedure can focus on minimizing the distances along the Tlim-axis only. Moreover, the relative error of Tlim (i.e., the ratio εTlim/Tlim) is known to increase with Tlim itself (Poole et al. 1988; Hinckson and Hopkins 2005; Faude et al. 2017), thus representing a source of heteroscedasticity. Therefore, weighted least squares (WLS) regression has been proposed as the most appropriate method to fit the critical power model to experimental data (Morton and Hodgson 1996; Morton 1996).

However, there are situations in which sources of random error must be acknowledged in both Tlim and P. In the exemplary case of cycling, power meter technology is characterised by a fairly low relative random error (i.e., a low εP/P, good precision) up to 360 W (Maier et al. 2017), but this implies that εP increases with P. Moreover, it cannot be excluded that the relative error also increases at higher P or because of environmental factors such as vibrations, external shocks and changing ambient temperature (Maier et al. 2017). For these reasons, the more intense the cycling burst, the higher the uncertainty about P. However, if such an extreme P is sustained until exhaustion, the error in Tlim (εTlim) is very low (since it increases with Tlim itself, which is very short here). The assumption that εTlim is disproportionately greater than εP is therefore no longer valid, and alternative statistical approaches must be adopted.

Ordinary and weighted least squares methods (named Model I regressions) minimize the distance in the dependent variable only, with the implicit assumption that the independent variable is error-free. Model II regression analysis, in contrast, assumes that the independent variable is also subject to a certain amount of experimental error, and thus minimizes the distance along both axes, using one of several approaches (Ludbrook 2012). Among them, the geometric mean (GM, also known as reduced major axis or ordinary least products) regression method minimizes the sum of the areas delimited by the curve and the horizontal and vertical lines connecting each experimental point to the curve (Brace 1977; Ludbrook 2012), and it can be generalized to nonlinear functions (Ebert and Russell 1994). Unlike other Model II regression methods, GM does not need an arbitrary a priori estimate of the measurement errors. In fact, GM assumes that the ratio between the magnitudes of the absolute errors in the dependent and the independent variable is approximately equal to the absolute local slope of the function (Brace 1977). Luckily, the hyperbolic nature of the critical power model is in line with this assumption: when the absolute slope |dTlim/dP| is high (Fig. 1, point A), the ratio εTlim/εP is high, and vice versa (Fig. 1, point B). In other words, in the GM method, as Tlim increases and P decreases, progressively less weight is given to the minimization of the Tlim-axis distance (similarly to the WLS proposed by Morton 1996), with the addition that, in the opposite direction, progressively more weight is given to the minimization of the P-axis distance.
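To make the slope argument concrete, the following minimal sketch evaluates the absolute local slope of the hyperbola, |dTlim/dP| = W′/(P − CP)², at a low and at a high power output. The parameter values (CP = 250 W, W′ = 20 kJ) and the two power outputs are hypothetical, chosen only for illustration, and are not taken from the data set; the function names are likewise assumptions.

```python
def tlim(p, w_prime, cp, k=0.0):
    """Time to exhaustion (s) predicted by the hyperbolic model."""
    return w_prime / (p - cp) + k


def abs_local_slope(p, w_prime, cp):
    """|dTlim/dP| in s/W; the time asymptote k does not affect the slope."""
    return w_prime / (p - cp) ** 2


# Hypothetical parameters, assumed for illustration only.
W_PRIME = 20_000.0  # curvature constant, J
CP = 250.0          # critical power, W

for label, power in (("A (low P)", 270.0), ("B (high P)", 650.0)):
    print(label, tlim(power, W_PRIME, CP), abs_local_slope(power, W_PRIME, CP))
```

With these assumed values the slope is 50 s/W at 270 W (Tlim = 1000 s) and 0.125 s/W at 650 W (Tlim = 50 s). Under the GM assumption, a 1 W uncertainty in P is therefore treated as equivalent to about 50 s of uncertainty in Tlim near point A, but to only about 0.1 s near point B.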

Fig. 1

Geometric mean regression minimizes the sum of the highlighted grey areas. The method's assumption is that the ratio of the errors in Tlim and P (εTlim/εP) is approximately equal to the absolute local slope (|dTlim/dP|) of the hyperbola. This assumption is met. Point A: low P (hence low εP), high Tlim (hence high εTlim) and high |dTlim/dP|, but also high εTlim/εP; point B: high P (hence high εP), low Tlim (hence low εTlim) and low |dTlim/dP|, but also low εTlim/εP

With the present report, we sought to illustrate the GM regression method for the two- and three-parameter critical power models and to test its agreement with the WLS method on our previously published P–Tlim data (Vinetti et al. 2019).

Methods

Data from Vinetti et al. (2019) were retrospectively fitted with the hyperbolic critical power model by means of nonlinear regression analysis, using both the WLS and the GM method. The general form of the model is:

$$T_{\mathrm{lim}}=\frac{W^{\prime}}{P-\mathrm{CP}}+k,$$
(1)

where W′ is the curvature constant, CP (critical power) is the power asymptote, and k is the time asymptote (3-parameter model, 3-p). The theoretical maximal instantaneous power (P0) was calculated as the P-axis intercept of Eq. (1), i.e. the value of P at Tlim = 0. The two-parameter model (2-p) was obtained by removing the parameter k. The 2-p model was applied to six data points within the severe exercise intensity domain (85–120% of the maximal aerobic power), while the 3-p model also included three points in the extreme domain (150–250% of the maximal aerobic power). The GM method was developed with the same approach as Ebert and Russell (1994) (see Appendix for further details). Briefly, the sum of the highlighted areas in Fig. 1 was set as the loss function to be minimized. For the WLS method, the loss function was set as the sum of the squared residuals of each ith data point multiplied by the weighting factor 1/Tlim(i)² (Morton 1996). The standard error of the parameter estimates (SEE) was calculated by bootstrapping. The SEE of P0 was calculated from the alternative parameterization of Eq. (1) (see Appendix). Paired-sample t test, linear regression and Bland–Altman analysis were used to compare parameter estimates from GM and WLS. The slope and intercept of the linear regressions between parameter estimates were also calculated with the GM method, as recommended when comparing methods of measurement (Ludbrook 2012). The level of significance was set at p < 0.05. The statistical package SPSS (Version 23.00, IBM Corp., Armonk, NY) was used.
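The two loss functions described above can be sketched in Python as follows. This is an illustrative reading of the text, not the SPSS implementation used in the study: all function names are hypothetical, and the data points are invented. The GM area is computed here in closed form by integrating Eq. (1) between each data point and its horizontal projection onto the curve, which gives W′(u − 1 − ln u) with u = (Tlim − k)(P − CP)/W′; the formulation in the Appendix may differ. P0 is obtained by setting Tlim = 0 in Eq. (1), which gives P0 = CP − W′/k.

```python
import math


def predict_tlim(p, w_prime, cp, k=0.0):
    """Eq. (1): Tlim = W'/(P - CP) + k (k = 0 gives the 2-p model)."""
    return w_prime / (p - cp) + k


def wls_loss(data, w_prime, cp, k=0.0):
    """Sum of squared Tlim residuals, each weighted by 1/Tlim(i)^2."""
    return sum(((t - predict_tlim(p, w_prime, cp, k)) / t) ** 2 for p, t in data)


def gm_loss(data, w_prime, cp, k=0.0):
    """Sum of the areas enclosed by the curve and the vertical and horizontal
    lines through each point. Requires P > CP and Tlim > k for every point."""
    total = 0.0
    for p, t in data:
        u = (t - k) * (p - cp) / w_prime  # u = 1 when the point lies on the curve
        total += w_prime * (u - 1.0 - math.log(u))
    return total


def p0(w_prime, cp, k):
    """Power at Tlim = 0 in the 3-p model (k must be non-zero)."""
    return cp - w_prime / k


# Invented (P in W, Tlim in s) points; the second lies off the assumed curve.
points = [(270.0, 1000.0), (300.0, 440.0), (350.0, 200.0)]
print(wls_loss(points, 20_000.0, 250.0), gm_loss(points, 20_000.0, 250.0))
```

Either function can be passed to a general-purpose numerical minimizer to obtain the parameter estimates. Because the GM area is zero only when u = 1 and is positive on both sides of the curve, no absolute value or squaring is needed.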

Results

None of the parameter estimates differed significantly between GM and WLS (Table 1). SEEs also did not differ significantly, except for that of CP in the 2-p model, which was lower with GM. In the 2-p model, GM yielded CP and W′ nearly identical to those of WLS, with bias −0.7 ± 1.6 W and −0.1 ± 0.5 kJ, respectively, and 95% limits of agreement of −3.8 and 2.4 W, and −1.1 and 0.9 kJ, respectively (Fig. 2). In the 3-p model, CP was nearly identical between GM and WLS, with bias −0.2 ± 2.0 W and 95% limits of agreement of −4.0 and 3.7 W, while W′, k and P0 showed some marginal, nonsignificant differences, with bias −0.6 ± 0.9 kJ, 1.7 ± 3.0 s and 56 ± 217 W, respectively, and 95% limits of agreement of −2.4 and 1.3 kJ, −4.2 and 7.7 s, and −370 and 483 W, respectively (Fig. 3).
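The limits of agreement quoted above are consistent with the usual Bland–Altman arithmetic, bias ± 1.96 SD, taking the value after "±" to be the standard deviation of the paired differences (an assumption, since the text does not state it explicitly). The sketch below reproduces that arithmetic for the 2-p CP comparison and adds a helper for paired estimates; the function names are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(bias, sd, z=1.96):
    """95% limits of agreement from the mean difference and its SD."""
    return bias - z * sd, bias + z * sd


def bland_altman(method_a, method_b):
    """Bias, SD of the differences and limits of agreement for paired estimates."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, sd, limits_of_agreement(bias, sd)


# Reported 2-p CP comparison: bias -0.7 W, SD 1.6 W.
low, high = limits_of_agreement(-0.7, 1.6)
print(round(low, 1), round(high, 1))  # -3.8 2.4
```

Applying the same calculation to the other parameters can give limits that differ from the reported ones in the last digit, which is expected because the published bias and SD are themselves rounded.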

Table 1 Average parameter estimates and standard errors obtained with the geometric mean (GM) and the weighted least squares (WLS) regression methods
Fig. 2

Comparison of parameter estimates of the 2-p model obtained with GM and WLS regression methods. Left column: regression (continuous) lines with identity (dashed) lines; Right column: Bland–Altman plots including bias (dashed lines) and 95% limits of agreement (dotted lines). CP critical power, W' curvature constant

Fig. 3

Comparison of parameter estimates of the 3-p model obtained with GM and WLS regression methods. Left column: regression (continuous) lines with identity (dashed) lines; Right column: Bland–Altman plots including bias (dashed lines) and 95% limits of agreement (dotted lines). CP critical power, W' curvature constant, k time asymptote constant, P0 maximal instantaneous power

Discussion

From a statistical viewpoint, the implemented hyperbolic GM regression method has several advantages over WLS: (1) it progressively accounts for an error in the P variable as higher P values are investigated, (2) it does not require further weighting procedures, since it is intrinsically weighted for both P and Tlim, and (3) it is independent of whether the model's equation is expressed in terms of Tlim or P. GM belongs to the broader context of errors-in-variables models, which are mostly confined to econometrics (Schennach 2016), where large amounts of data can be collected and more complex assumptions and analyses are required. Within this context, GM represents a concise method that is also well suited to the exercise science field.

From an experimental viewpoint, GM was successful in fitting the two- and three-parameter models, leading to results similar to those of the traditional WLS method. This is particularly evident in the 2-p model, where the parameter estimates were nearly identical (Fig. 2), confirming the theoretical prediction that, when all points lie in the steep part of the curve and εTlim/εP is high (point A of Fig. 1), GM tends to mimic WLS. A slightly lower agreement between the two methods is present in the 3-p model for those parameters influencing the extreme exercise intensity domain (point B of Fig. 1), namely k and P0 (Fig. 3). Still, there are no statistically significant differences, probably because of the relatively high reliability of the stationary cycle ergometer used in the study, even for extreme values of P (Vinetti et al. 2019). We expect further divergence between the two methods with less precise ergometers: in this case, parameter estimation with the GM method should be preferred.

In this context, it is noteworthy that the choice of the regression model and method should be an a priori decision based on the identification of the sources of experimental error. Not surprisingly, studies using a systematic a posteriori selection based on the lowest SEE for CP and W′ are necessarily biased towards models that erroneously assume Tlim as the independent, error-free variable (see for example Black et al. 2017). In fact, since data points mostly lie in the time window where both |dTlim/dP| and εTlim/εP are high (2–15 min, point A of Fig. 1), a large distance along the Tlim-axis corresponds to a small distance along the other axis (either P or the total work, W). Therefore, assuming P or W instead of Tlim as the dependent variable is likely to provide a lower residual sum of squares (and SEE) and thus the perception of a better statistical fit. However, this assumption is not consistent with the process that generated the data, but only with the data themselves.

In conclusion, when the critical power model is fitted to experimental data with a low error in P, the parameters provided by GM do not differ from those provided by WLS. However, owing to its intrinsic characteristics, GM is conceptually preferable wherever there are concerns about the precision of the P measurement. Therefore, it should become the method of choice for the statistical treatment of critical power data. Future testing of the GM method with more error-prone data sets, such as those from in-field power meters, is welcome.