Key Points

ACWR is a rescaling of the explanatory variable, in turn magnifying its effect estimates and decreasing its variance despite conferring no predictive advantage.

Other ratio-related transformations (e.g., reducing the variance of the explanatory variable and unjustified reclassifications) further inflate the effect estimates.

These results also disprove the etiological theory behind this ratio and its components.

The ACWR should be disregarded as a metric related to injuries, and international recommendations should be updated and corrected.

1 Introduction

The number of studies examining the relation between training load and injuries in athletic populations has grown in recent years, and at present, there are over 100 studies on the topic [1,2,3,4]. To find an association between training load and injuries, various measures of training exposure have been created. The most popular metric, commonly used as a gold standard reference “model” for several international guidelines, is the acute:chronic workload ratio (ACWR) [1, 5,6,7,8,9]. This ratio is obtained by dividing a ‘fatigue’ component by a ‘fitness’ component. The ‘fatigue’ component is represented by the acute workload (AL), commonly calculated using the workload of the week preceding the injury, while the ‘fitness’ component is represented by the chronic workload (CL), which is the average workload of the four weeks preceding the injury [6, 7]. The AL compared to the CL as measured using this ratio is widely considered to reflect the risk of injury in athletic populations.

ACWR has recently taken sports science and medicine by storm. It has consistently been claimed that the ACWR is associated with injury risk [1, 5, 7, 10,11,12,13], making it a useful metric to reduce injury risk or prevent injury [7]. This metric has been popularised by several editorials and consensus statements in high impact factor sport science and medicine journals [5, 7, 10,11,12,13]. Speaking to their influence, these papers are amongst the most highly cited in the field. The rise in the attention received by “load management” in professional practice has also been fuelled by these studies. The influence of ACWR has even extended to the international level; it is being used in the development of international guidelines and consensus statements by leading organisations such as the International Olympic Committee (IOC) [12]. ACWR is ubiquitous and is included in national athlete management systems and commercially available software under the assumption that it is related to injury risk and can help (in isolation or in combination with other metrics) to reduce injuries.

Adaptations of ACWR have been proposed using different ways to calculate the AL and CL, such as the exponentially weighted moving average (EWMA) [14, 15], coupled or uncoupled (AL included or not in the CL calculation) [16], and different time windows [17, 18]. Regardless of the method, all have been suggested to work (i.e., are associated with injury); yet, all have conserved a common characteristic: they are all ratios.

Researchers have warned about the use of the ACWR because of a ratio’s failure to normalise the numerator by the denominator and the risk of artefacts (i.e. it adds unnecessary noise) [19]. However, these warnings did not gain traction [2] and have been largely ignored, and with them issues that statisticians have highlighted for decades [19,20,21].

The aim of this study is to explicate the ratio effects of the ACWR. Using a previously published dataset from professional football players, where originally, a relation between ACWR and injury was reported, we demonstrate the artefacts introduced through the use of a ratio.

2 Methods

2.1 Dataset

Although this demonstration could also be achieved with simulated data, we used a previously published dataset to show the impact on real-world data and results. Data reuse and publication of the results were approved by the authors of the previous publication and the management of the respective team. The details of the data collection can be found in the original manuscript that has been made freely accessible online by the publisher. The manuscript does not comply with the Strengthening the Reporting of Observational Studies in Epidemiology or Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis recommendations for reporting since it is a methodological study [22, 23].

2.2 Participants

Briefly, the players’ individual training load was collected on a professional Italian Serie A team, on 34 players [mean (SD), age: 26 (5) years; height: 182 (5) cm; body mass: 78 (4) kg] over 3 competitive seasons (2013/2014, 2014/2015, and 2015/2016). The dataset was the same used by Fanchini et al. [24], but we deleted the individual player loads with missing data to allow better comparisons between analyses and to avoid any missing data imputation potential influence. 36 weekly loads were excluded out of 1955 (1.8%) and two injuries out of 72. The final dataset included 1919 individual weekly sessions and 70 injuries. Descriptive data are presented in Table 1.

Table 1 Descriptive data of the explanatory variables used in the analyses

2.3 Training load and injury

Internal training load was quantified using the session Rating of Perceived Exertion (RPE) method; that is, by multiplying the training session duration by the corresponding RPE value determined using the Borg’s CR10 scale [25]. Using these training loads, we calculated:

  1. AL, training load of 1 week (for injured the week preceding the injury);

  2. CL, rolling averages of 4, 3 and 2 weeks (for injured those preceding the injury) including the AL in the calculation of CL (i.e., coupled) as in the original study;

  3. CL, rolling averages as above without including the AL in the calculation of CL (i.e., uncoupled) for the calculation of #7;

  4. ACWR, ratio between AL and CL;

  5. Contrived ACWR, ratios between AL and fixed and randomly generated values of CL;

  6. Week-to-week change, difference between ALs the 2 weeks preceding the injury;

  7. AL-CL difference, absolute AL-CL difference (coupled and uncoupled).

Data were also categorised using quartiles, and two groups based on the median CL value were also determined. Injuries were classified according to international guidelines [26], and recorded by medical staff. Only non-contact, time loss injuries were used for the analysis.
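As an illustration of the metrics listed above, the following is a minimal sketch of how they could be computed for one player from a chronological series of weekly loads. The function name `workload_metrics`, the example loads, and two conventions are assumptions made for this sketch and are not taken from the original analysis: the uncoupled CL is taken as the mean of the `window` weeks immediately before the acute week, and the AL-CL difference is kept signed.

```python
from statistics import mean

def workload_metrics(weekly_loads, window=4):
    """Compute the workload metrics for the most recent week.

    weekly_loads: chronological list of weekly session-RPE loads (AU).
    At least window + 1 weeks are needed so that the uncoupled CL exists.
    """
    if len(weekly_loads) < window + 1:
        raise ValueError("need at least window + 1 weeks of data")
    al = weekly_loads[-1]                              # acute load: last week
    cl_coupled = mean(weekly_loads[-window:])          # AL included in the CL
    cl_uncoupled = mean(weekly_loads[-window - 1:-1])  # AL excluded (assumed convention)
    return {
        "AL": al,
        "CL_coupled": cl_coupled,
        "CL_uncoupled": cl_uncoupled,
        "ACWR_coupled": al / cl_coupled,
        "ACWR_uncoupled": al / cl_uncoupled,
        "week_to_week_change": al - weekly_loads[-2],
        "AL_minus_CL_coupled": al - cl_coupled,        # signed difference (assumption)
        "AL_minus_CL_uncoupled": al - cl_uncoupled,
    }

# Hypothetical loads (AU) for five consecutive weeks, for illustration only.
loads = [1400, 1600, 1500, 1300, 2000]
metrics = workload_metrics(loads)
```

With these made-up loads, the coupled CL is 1600 AU and the coupled ACWR is 1.25, whereas the uncoupled CL is 1450 AU, which already shows that the two conventions give different ratios for the same acute week.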

2.4 ACWR variations

To test the hypothesis that the ACWR is just a rescaling of the numerator and it is not related to a supposed physiological rationale attributed to the CL, we generated contrived ACWR values by using fixed and randomly generated CLs.

We, therefore, created contrived ACWRs using fixed CLs: 500, 1000, 1510 (corresponding to the average CL of the sample), 2000, and 2500. These represent the effect of a simple linear rescaling of the AL, as no variance is contributed by the CLs.

In addition to fixed CLs, we also calculated ACWR values using independently and identically distributed randomly generated data (from a normal distribution). First, we generated samples having (1) the same mean and standard deviation (SD) of the original sample, (2) a lower SD (SD original 282/2 = 141 AU), and (3) a higher SD (SD original 282 + 141 = 423 AU) than the original sample. We generated 25, 20, and 20 ACWR values for each of the 3 aforementioned conditions, respectively. Second, we performed these simulations (100/condition) for a range of mean chronic workloads (500 through 2500, step size = 100; original mean = 1510) and coefficients of variation (CV) (5% through 50%, step size = 5%; original CV ≈ 20%). In doing so, we covered a large sampling space and investigated the effects of different magnitudes and spreads of CLs, which were independent of time, individual, and thus true CL. Estimates from these random models were calculated using trimmed mean, excluding the top and bottom 10% of simulated estimates; otherwise, there were instances of massive outliers which shifted the mean by greater than one order of magnitude.
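The construction of the contrived ratios and the trimmed summary described above can be sketched as follows. This is an illustrative outline only, not the code used for the analyses: the function names, the seed, and the example values are hypothetical, and the model fitting step (GEE) is omitted. The defaults of 1510 AU and 282 AU are the sample mean and SD reported in the text.

```python
import random
from statistics import mean

def fixed_acwr(acute_loads, fixed_cl):
    """Contrived ACWR with a constant denominator (pure linear rescaling of AL)."""
    return [al / fixed_cl for al in acute_loads]

def random_acwr(acute_loads, cl_mean=1510.0, cl_sd=282.0, rng=None):
    """Contrived ACWR with i.i.d. normal 'chronic loads' unrelated to the player."""
    rng = rng or random.Random()
    return [al / rng.gauss(cl_mean, cl_sd) for al in acute_loads]

def trimmed_mean(values, trim=0.10):
    """Mean after discarding the top and bottom `trim` fraction of values."""
    k = int(len(values) * trim)
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

# Hypothetical acute loads (AU); a seeded generator keeps the sketch reproducible.
acute = [1200, 1500, 1800, 2100]
rng = random.Random(0)
ratios_fixed = fixed_acwr(acute, 1510)
ratios_random = random_acwr(acute, rng=rng)

# Trimming removes the influence of a single extreme estimate.
summary = trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])
```

One point the sketch makes visible is that, with a large coefficient of variation, a normal draw for the denominator can fall close to zero and produce an extreme ratio; this may be one reason why a trimmed summary is more stable than a plain mean in such simulations.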

2.5 Statistical analysis

Our primary analyses were performed in a way that was consistent with that of the previously published study: generalised estimating equations (GEE) with a logistic link function, robust variance estimation, and an exchangeable working correlation matrix. We note that GEEs were not chosen because we considered them the best way to analyse these kinds of observational studies, but rather, to illustrate the potential for artefacts in a way that is congruent with the analytical approaches present in the literature. Because variable temporal autoregression (AR) was not modelled in the original study, we also modelled these data using a GEE with an AR(1) structure. Although not used in the original study, the AR(1) model is preferable since it does not assume all points are equally correlated, but rather, that points are “locally” correlated in time.

We assessed the models and their parameters via the resulting odds ratios (OR), proper scoring rules (Brier), c-statistics (equivalent to the area under the receiver operating characteristic curve), and the estimated probabilities of injury. If the parameter estimate is statistically significant but the model itself does not fit the data well, the overall value of the parameter is unclear. We contend that, ultimately, we are interested in modelling injury risk, and as such, the model should fit the outcome well, statistically significant parameter or otherwise. Given the low sample size and injury prevalence [27], absolute Brier scores were not interpreted; instead, the Brier scores were calculated for comparison purposes (i.e., compared to the intercept-only model’s Brier score). Calibrations were visually inspected using LOESS curves. Finally, although not shown, similar results were obtained with other traditionally used analyses and variations (e.g. GEE using Poisson and changing working correlation matrix, or logistic regression without accounting for repeated measures, etc.).
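For readers less familiar with these performance measures, a minimal sketch of the Brier score and the c-statistic is given below. The function names and the toy data are illustrative assumptions; the c-statistic is computed by direct pairwise comparison, which is simple but slow for large samples.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def c_statistic(y_true, p_pred):
    """Proportion of (injured, uninjured) pairs in which the injured
    observation received the higher predicted probability; ties count 0.5."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    concordant = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

# Toy outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 1]
p_model = [0.1, 0.4, 0.35, 0.8]
p_constant = [0.5, 0.5, 0.5, 0.5]   # a model that cannot discriminate

brier_model = brier_score(y, p_model)
c_model = c_statistic(y, p_model)
c_constant = c_statistic(y, p_constant)
```

A model that assigns the same probability to every observation, as an intercept-only model does, necessarily has a c-statistic of 0.5, which is why 0.5 serves as the reference value.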

Mean difference and 95% confidence intervals (95% CI) were also calculated for comparing injured and non-injured players.

3 Results

Descriptive data of the explanatory variables used in this study are presented in Table 1, including the quartiles used for categorising. Depictions of injuries as a function of AL, CL, and ACWRs are presented in Figure S1.

The results of the GEE using the original data but without using the ACWR are presented in Table 2. Importantly, the results of the original model (ACWR, 4 weeks) indicate that ACWR as a predictor confers no predictive advantage over an intercept-only model, even within the training sample (Brier score = 0.035 vs. 0.035 (ratio = 0.998); c-statistic = 0.574 vs. 0.5 in ACWR and intercept-only models, respectively); AL alone was similar to ACWR. Despite this, we investigated and quantified the role of different workloads in other models.

Table 2 Parameters of various models estimated from original data

Some associations are “statistically significant” (p < 0.05) with confidence intervals ranging from 1.000 to 1.001. Their Brier scores and c-statistics were comparable to an intercept-only model. These results are similar to the ones of the original publication [24] and directly follow from the distributions of the raw data, which indicate that injuries are relatively evenly dispersed across AL, CL, and ACWR (Figure S1). The results using the original ACWR, the ACWR created using fixed values of CL and dichotomising the players’ data in high and low CL are presented in Table 3. All of the ORs from the association between injury and ACWR values were in the direction of increased injury risk, with the exclusion of the analysis of the high CL group. Table S1 presents the same results but modelled with an AR(1) error structure.

Table 3 Parameters of various models estimated using the original acute:chronic workload ratio (ACWR from 2 to 4 weeks) and ACWRs created using fixed values for the chronic workload, for whole sample and players’ data dichotomised in two groups based on the chronic load median value

Average point estimates (ORs) obtained by generating random CL for the calculation of the ACWRs are presented in Fig. 1b (simulated over a large parameter space) and Table S2 (simulated over a small parameter space, relative to original sample). The direction of association was generally consistent across random models, but the magnitude was a function of the mean CL and the coefficient of variation of the CL. In all cases, the models had poor predictive performance, much like the original model, despite statistically significant ORs arising from the information contained within the AL. The ORs obtained from GEE using ACWR calculated from random CL with the same SD (282 AU) ranged from 1.16 to 2.07. Using half of the original SD (141 AU), the ORs ranged from 1.41 to 2.70. Increasing the SD to 423 AU, the ORs ranged from 0.89 to 1.31. Details (p values and CIs) of this analysis are presented in Table S2.

Fig. 1 Results from models using true and random acute-to-chronic workload ratios. a Nonparametric (LOESS) calibration curve for the model. Even though the acute-to-chronic workload ratio has a statistically significant odds ratio that is greater than 1, the model displays poor calibration, indicating that the acute-to-chronic workload ratio, as it was modelled, does not contain predictive information, even when “tested” in the training dataset. b Results for models that use random chronic workloads. Top, as mean chronic workload increases and coefficient of variation decreases, odds ratios increase; this is a basic statistical property of ratios. Bottom, c-statistics or AUCs from random chronic workload models are slightly worse than that from the model that uses true chronic workload (cf. 0.574), but all are dismal and comparable to an intercept-only model (cf. 0.5)

In Table 4, we present the comparison between injured and uninjured players’ data for AL, AL divided by a fixed value corresponding to the average original CL (1510 AU), ACWR from 4 to 2 weeks, and AL or ACWR for the two groups classified according to the median CL values. Differences between groups are presented with the corresponding 95% CI.

Table 4 Differences between injured and uninjured players

Crosstabs showing the classification of the players’ data points according to four categories of AL and four categories of ACWR are presented in Table 5. Crosstabs are presented separately for the two groups based on median CL. The number of injuries for each ACWR category is also presented (for the low CL group, we also indicate the original AL category). Categories were created as quartiles (values presented in Table 1). The within-player relation between AL and CL is presented in Figure S2.

Table 5 Cross tabulation to show the reclassification of individual player data

4 Discussion

We systematically evaluated the ACWR concept by comparing it to an acute-to-random workload. When used in training load–injury models, the ACWR creates remarkable statistical artefacts in the effect estimates. Here, we focus on the outcomes generated by these artefacts and provide some preliminary explanations. These findings demonstrate that when ACWR is used as an explanatory variable, results are always influenced by artefacts and artificial alterations. We have also shown that, depending on the characteristics of the sample (injury and data distribution), these artefacts can result in associations that can be statistically significant or compatible with increased or decreased injury risk.

The theory behind the use of the ACWR states that, when the AL exceeds the CL, an athlete is underprepared and hence at higher injury risk. The ACWR would indicate “both the athlete’s risk of injury and preparedness to perform” [7]. This concept was linked to the Banister model, which used two components: fitness (represented by the chronic load) and fatigue (represented by acute workload). The ACWR has also been linked to another similar metric, Total Stress Balance, also calculated using the fitness and fatigue components of Banister [5,6,7]. However, while these two reference models were additive, for reasons unbeknownst to the authors, ACWR relies on a ratio. Moreover, it was suggested that the negative effect of increasing load is greater when the CL is lower [28]. These models are conceptually different insofar as the Banister model investigates the effect of fatigue while controlling for fitness, while ACWR implies the absolute effect of fatigue changes with fitness.

The ACWR approach can be reframed similarly to the Banister model by way of stratifying observations based on CL. We tested the utility of these stratification procedures by reanalysing a previously published dataset. In doing so, we did not find any meaningful associations (Table 2). We also examined the independent effects of AL, CL, and their interaction; again, we did not find any meaningful associations with injury risk, suggesting that controlling for CL does not confer a meaningful advantage. These results, in addition to those of the original study, apparently supported this association since ACWR was the only variable found to be statistically significantly related to injury risk (Table 3). Stratifying by or controlling for CL does not seem to be advantageous, which suggests one of two conclusions: (1) ACWR appropriately captures the construct we are attempting to model (injury risk), or (2) CL does not contain any useful information. We performed further analyses to test these competing explanations.

If the proposed etiological theory of ACWR was correct, then dividing the individual AL by a contrived CL (i.e., a value not corresponding to the real CL of each player) should produce disparate results from ACWR, since it violates the underlying etiological theory. Therefore, we started by simply dividing the AL of all the players by the same value (i.e., the average CL value, 1510 AU), and this ‘contrived’ CL replaced the players ‘real’ CL. Rather than an ACWR, this is an “acute to fixed workload ratio”. Surprisingly, the OR was 1.95 (1.08–3.52), which is just slightly lower than the OR from the ACWR model (2.45, 1.28–4.71). Importantly, our analysis still suggested that the acute:‘fixed’ workload ratio performed similarly and still yielded a “statistically significant” association with injury risk. We repeated the analysis with other ‘contrived’ fixed values, and intuitively, by increasing or decreasing the denominator, the p values remained the same, while the estimates increased or decreased (see Table 3).

Therefore, we generated random CL samples with a similar mean and SD of the original data, which is the equivalent of dividing the AL of a player by the CL of another hypothetical random teammate. Since this has no logical basis, it can be conceived as a null model to assess the value of CL. Once again, we found associations between these contrived ACWR values and injury. From these data, we could call findings based on ACWR into question: the ACWR appears to simply be a linear rescaling of AL alone and provides no additional information. What is more, this finding calls into question the theory behind the ACWR, which may have arisen as a post hoc theory from a statistically significant predictor rather than one born and hypothesised a priori from a deep theoretical framework (i.e. HARKing, Hypothesising After the Results are Known). Undeniably, the results strongly demonstrate that CL does not reflect “preparation” of the players and confers no added value, as even ACWRs with randomly generated, contrived CLs perform similarly to ACWRs with true CLs.

But why does this happen? Actually, the answer is quite simple. By dividing the numerator (AL) by a number, the researchers have just rescaled the numerator. The parameter estimates from the model correspond to a one unit increase in the explanatory variable. When rescaling, the unit is still one, but it now corresponds to a different quantity in the explanatory variable. The new unit of the ACWR indeed corresponds to the amount of the CL; i.e., 1 unit = 1 CL. If the CL is on average 2000 (AU, meters, etc.), the new estimate is now 2000 times the estimate corresponding to 1 in the original scale (e.g., 1 AU or 1 m). In other words, the scale of the parameter estimate must offset the rescaling of the numerator. Since measures like ORs (or relative risk, etc.) are multiplicative, the new effect is even greater. That is, the model estimates log(OR) as the parameter, which is exponentiated to obtain the OR. What is multiplicative on the log(OR) scale is exponential when brought back to the original OR scale, and thus, the OR is raised to the power of 2000. Whatever the value γ of the denominator, the “new” OR will be that of the numerator raised to the power of γ. To draw a concrete example, if we have OR 1.001 for 1 m, but we want to express the OR for 1 km, we can divide the original variable by 1000. The new OR will be 1.001^1000 (approximately 2.72). Simply, this transformation follows from the laws of logarithms and magnifies the magnitude of the OR estimated using AL alone; when predictor units change, parameter estimate units change accordingly.
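The rescaling argument can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: the data are simulated (not the study data), the "true" coefficients are arbitrary, and a plain logistic regression fitted by Newton-Raphson is used in place of the GEE. It fits the same outcome on AL in AU and on AL divided by a fixed γ = 1510, and compares the two ORs.

```python
import math
import random

def fit_logistic(x, y, iters=30):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated acute loads (AU) and injuries; all values are assumptions.
rng = random.Random(42)
al = [rng.gauss(1500, 500) for _ in range(2000)]
injured = [1 if rng.random() < 1 / (1 + math.exp(4.5 - 0.0008 * a)) else 0
           for a in al]

gamma = 1510.0                       # fixed "chronic load" used as denominator
_, b1_au = fit_logistic(al, injured)
_, b1_ratio = fit_logistic([a / gamma for a in al], injured)

or_au = math.exp(b1_au)              # OR per 1 AU of acute load
or_ratio = math.exp(b1_ratio)        # OR per 1 unit of AL / gamma

# Arithmetic from the text: OR 1.001 per metre expressed per kilometre.
or_per_km = 1.001 ** 1000
```

Under these assumptions, `or_ratio` equals `or_au ** gamma` up to numerical error: the fitted model and its predictions are unchanged, and only the unit in which the OR is expressed differs.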

While rescaling can be an appropriate procedure when motivated, the involuntary rescaling of AL has produced more impressive parameter estimates. Through simple transformations, a difference in AL will generate impressive effects when using the ratio. Indeed, in the sub-analysis performed to reflect previous studies (e.g., dichotomising player data based on a median split of CLs), we found appreciable differences in the AL between injured versus uninjured player data. As shown in Table 4, the injured players in the low CL group have greater ALs. As for the whole sample, there is a negligible effect of AL, even if statistically significant (ORs from 1.000 to 1.001). However, when the ACWR was used, the OR increases exponentially to 2.9 (1.6–5.1, Table 4). As further confirmation, an even greater OR was obtained by dividing the AL by 1510 AU compared to “real” CL (3.5, 1.7–7.3). Once again, the underlying etiological theory (chronic load “protective”) has nothing to do with the reasons for these results—rather, these results follow directly from the mathematics underlying the statistical model.

Dividing the AL by the CL not only changes the properties of the mean, but also the variance. Because CL is a temporally smoothed version of the AL, it has a lower variance, and thus, when using it as the divisor, it creates a variable with a lower coefficient of variation and smaller mean than AL alone. This results in a greater parameter estimate and also influences the p values and CIs. By generating random CLs with a mean similar to the original sample, but enlarging or restricting the SD, the point estimates, CIs and p values are changed compared to the AL alone (Table S1). Specifically, when the SD of the randomly generated CL data was lowered, the p values decreased and ORs increased. This can also be seen in Fig. 1b, which shows the ORs generated with different CL means and coefficients of variation (i.e. SD). While these results can be obtained using both coupled and uncoupled ACWRs, the coupled ACWR has additional issues. Since the numerator is included in the denominator, the variance of the ratio will inevitably be smaller. This additional artefact, caused by shrinking the SD, also explains why the use of the CL calculated using the average of more weeks (or days) exploits this artefact. Using a rolling average in the denominator creates a positive correlation between the numerator and denominator. The result of this is that large values of AL are attenuated by division by larger CLs, hence reducing the variability of the ratio [29].
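The effect of coupling on the spread of the ratio can be illustrated with a small simulation. The sketch below assumes independent, normally distributed weekly loads (mean 1500 AU, SD 300 AU), which is a deliberate simplification because real weekly loads are correlated over time; the seed, sample size, and variable names are likewise arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5

rng = random.Random(1)
n_weeks, window = 5000, 4
# Independent weekly loads (assumption); floored at 1 AU to keep ratios finite.
loads = [max(rng.gauss(1500, 300), 1.0) for _ in range(n_weeks)]

al, cl_coupled, cl_uncoupled = [], [], []
for t in range(window, n_weeks):
    al.append(loads[t])
    cl_coupled.append(mean(loads[t - window + 1:t + 1]))  # includes week t
    cl_uncoupled.append(mean(loads[t - window:t]))        # excludes week t

ratio_coupled = [a / c for a, c in zip(al, cl_coupled)]
ratio_uncoupled = [a / c for a, c in zip(al, cl_uncoupled)]

r_coupled = pearson(al, cl_coupled)        # induced by the shared week
r_uncoupled = pearson(al, cl_uncoupled)    # close to zero for independent weeks
sd_coupled = pstdev(ratio_coupled)
sd_uncoupled = pstdev(ratio_uncoupled)
cv_al = pstdev(al) / mean(al)
cv_ratio_coupled = sd_coupled / mean(ratio_coupled)
```

In this simplified setting, the coupled denominator is positively correlated with the numerator purely by construction, and the coupled ratio is less variable than both the uncoupled ratio and the AL itself.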

4.1 General Problems with Ratios in Predictive Models

While the aforementioned consequences of the ratio transformation are sufficient to invalidate the ACWR and the etiological theory behind it, we highlight a further problem generated by transforming data into a ratio as it results in a reclassification. First, we note the differences in properties between multiplicative (ratio) and linear scales. Indeed, Curran-Everett and others [20, 30] have warned against the use of ratios and percentages in such analyses, in part because the values depend on the direction of the comparison. For example, if training load is reduced from 1000 to 800 (meters, AU, etc.), the relative decrease will be 20%, while if you increase from 800 to 1000, the relative increase will be 25%. These multiplicative changes are in contrast to additive ones, which are linear.
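The direction dependence of relative changes in the example above can be written out as a short worked calculation; the function name is an illustrative assumption.

```python
def relative_change(old, new):
    """Change from `old` to `new`, expressed as a percentage of `old`."""
    return 100 * (new - old) / old

decrease = relative_change(1000, 800)    # -20.0 (%)
increase = relative_change(800, 1000)    # 25.0 (%)

# On the additive scale the same two changes are symmetric.
absolute_decrease = 800 - 1000           # -200
absolute_increase = 1000 - 800           # 200
```

The same absolute step of 200 units therefore appears as a larger relative change when the starting value is lower, which is the property that later gives more “weight” to changes at low workloads.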

Second, because ACWRs are a proportion and thus sensitive to the denominator, individual players with low absolute ALs tend to have greater ACWRs, resulting in model miscalibration. For example, injured players with the lowest AL values tend to move into the higher categories of the ACWR. This is evidenced by Table 5, where the data of high and low CL groups are presented separately to reproduce a typical dichotomisation of the data used in previous studies. Individual data with the lowest levels of AL belonging to the first quartile (< 1261 AU) moved into the higher ACWR categories (226 individual data, 57%); similar reclassification can be seen in the other categories. This shift was more prevalent in the low CL group since dichotomising by CL means also separating by AL. Lower AL values are more likely to produce greater ACWR values when AL increases since it represents a larger proportion of the denominator (CL). Indeed, there is an obvious relation between AL and CL (Figure S2). Similar subgroup analyses have been used to support the claim that high CL is protective while low CL predisposes athletes to injuries when “spikes” of workload occur: studies have reported a stronger association between ACWR and injury risk at low compared to high CL [28, 31]. Performing the same analysis in this sample (n.b. this was not done in the original publication), the ACWR was also found to be associated with greater injury risk for the low CL group only, thus seemingly supporting previous findings. While one may think that this reclassification is appropriate, since it appears to account for the increased “impact” of increasing load when the player is not “prepared” (i.e. low CL), we have already shown that this theory (protection or predisposition) does not stand since the CL itself magnifies and smooths the AL effect estimates (i.e. it is just a rescaling number).
If this theory held true (low CL predisposing), we would have found an association between AL or AL–CL change and the levels of CL (Table 2), and we would not have found similar results when using the contrived CL values. This did not occur. Rather, we also observed that each of the 12 injured players in the high ACWR group came from lower categories of the AL. Three were from the first quartile, one from the second and eight from the third. This example showed that the reclassification is artificial and the ratio gives more “weight” to absolute changes at low workloads and more “weight” when the workloads increase rather than when they decrease. This explains why this reclassification occurred more in the low than high CL group. Moreover, from a statistical theory standpoint, the “split”-based analysis implies an interaction between ACWR and CL; however, not only does this simplify to AL alone, but the OR 1.0 (1.0, 1.0) for their interaction suggests it is not informative. Although this adjustment does not have as large an impact as rescaling, it still biases the results and creates artificial differences between injured and uninjured. Reclassification of 12 injured players out of 36 in the high ACWR group has a clear effect on the results and, as in previous studies, also on the calculations of other figures such as injury rate. In the past, similar results have been used to support the predisposing effect of low CL. However, the evidence and logic we present suggest this is, instead, another result of the combination of statistical artefacts and noise added by the ratio, also causing re(mis)classification.

The ACWR creates artefacts generated by the combination of the aforementioned factors altering and magnifying the effects of the AL (numerator). Depending on the relation between AL and injuries, the effect estimates are increased and, depending on the distribution of the denominator, they are further inflated. Therefore, the ACWR values calculated from different smoothing averages (e.g., 2–4 weeks) with the highest value and lowest SD (Fig. 1b) will magnify the estimates and influence the p values and CIs. The use of a ratio and further reducing the variance of the explanatory variable using other smoothing strategies (such as the EWMA, as suggested and used in some studies) suffers from the same problems. In addition, they also are not conceptually superior since the starting idea of a CL–AL interaction is not supported, but rather is just an artefact (whatever the mathematical “strategy” to calculate the “fatigue” and “fitness” components). Studies showing the superiority of ACWR based on EWMA, or the “equivalence” between coupled and uncoupled, confirm that these methods produce the same artefacts [14, 16, 32]. Similarly, explorative studies trying to find the best combination of AL and CL time windows to “optimize” parameter estimates may just be optimizing these artefacts (involuntary p-harking) [17]. Hold-out samples should be used to evaluate the effects of optimization, and prediction/model fit should be assessed rather than parameter estimation alone. Similarly, most arbitrary pre-analytical data “treatment” also amplifies these artefacts by, for example, changing the variance of the AL, CL, and their ratio (e.g. deleting CL below 1 or 2 SD, single imputation, etc.) [8, 17, 28].

It may seem from the arguments we put forth that the “key” metric to focus on is the acute workload. Although we will not address this topic in detail here, it is not so straightforward. Simply comparing the AL (or any other potential factor) of injured versus non-injured is not sufficient as the studies from which these data come are prone to several potential biases well known in epidemiology [33, 34]. Therefore, it is not a question of “statistical analysis” or creating new metrics calculated from each other, but rather design and conceptually selecting explanatory variables based on a proper conceptual and theoretically sound framework, all while controlling for confounding factors. Moreover, it is essential that the predictive performance of these models be assessed out-of-sample. If these models are predictive of injury in hold-out samples, experimental approaches to manipulating the predictors (e.g., acute load) should be employed to assess the causal nature of the relationship. This approach is essential for causal inference, which is arguably the tacit aim of these studies. Indeed, as evidence of this causal interpretation, other than the overinterpretation of the studies themselves, we now have international guidelines and consensus suggesting how to manipulate these prognostic factors (training load metrics) to reduce the injury risk, which assumes a causal effect (i.e., a perturbation in x results in a change in y). Importantly, this assumption has been made in the complete absence of any attempts to estimate causal effects and based on results determined by artefacts due to data transformations. The interpretation should always be based on and commensurate with the real nature and goal of the study (descriptive, predictive, causal).

4.2 Predictive Value of ACWR and Acute Workload

Although not the primary purpose of this work, we briefly explored the in-sample predictive value of the ACWR. Despite having a statistically significant and large OR, ACWR confers no predictive advantage with respect to injury risk. Proper scoring rules are virtually identical between ACWR and an intercept-only model (both Brier scores = 0.0351), and the ACWR model has a slightly greater c-statistic than the intercept-only model (0.574 vs. 0.5). In the ACWR model, the average probability of injury of those who were injured was 0.039. We replicated the aforementioned analyses using AL-only, and the results were identical, with the exception of the c-statistic, which went from 0.574 to 0.544. From a predictive standpoint, when used in isolation, neither AL nor ACWR contain useful information, even when assessed in the training sample.
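The intercept-only reference values quoted above can be reproduced from the counts reported in the Methods (70 injuries among 1919 observations). The sketch assumes that the intercept-only model predicts the overall injury proportion for every observation; an intercept estimated with a GEE working correlation may differ slightly from this value.

```python
n_obs, n_injuries = 1919, 70

# Constant predicted probability: the overall proportion of injured observations.
p_hat = n_injuries / n_obs

# Brier score of a constant prediction: injured observations contribute
# (1 - p_hat)^2 each, uninjured observations contribute p_hat^2 each.
brier_intercept_only = (
    n_injuries * (1 - p_hat) ** 2 + (n_obs - n_injuries) * p_hat ** 2
) / n_obs

# Algebraically this reduces to p_hat * (1 - p_hat).
brier_closed_form = p_hat * (1 - p_hat)
```

Both expressions give approximately 0.0351, in line with the reported value. This also illustrates why, at such a low injury prevalence, an absolute Brier score close to zero says little on its own and is only informative relative to the intercept-only reference.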

4.3 Additional Considerations to the Study

For this study, we reproduced the analysis of the original investigation, which is similar to that of other publications using ACWR [35]. Similarly, we also classified and presented the data according to previous literature. However, this does not mean that the authors of the current study agree with or endorse these analyses and study designs. For example, even presenting descriptive data (and categorisations) using non-independent values is questionable. We also repeated the analysis using logistic regression without taking into account the repeated measures, simply to check whether the same results would be obtained; as expected, the results were similar (data not shown). Therefore, we have simply used previously published data and analyses to show the problems created by the ACWR and to show the lack of validity of the theory behind this metric. Finally, we used the AL and CL calculated using a measure of internal load (session RPE), but the results can be applied to any proxy measure of training load (internal or external).

While we explained that the ratio is just a rescaling technique, we highlight that rescaling the explanatory variable is not always a wrong practice. For example, rescaling grams to kilograms (dividing by 1000) to examine the effect of body mass changes on risk would make sense and is probably advisable. However, it is wrong in this context, firstly because the rescaling was unintentional: the ACWR was never proposed as a rescaling but rather as a normalization procedure. Secondly, there is no apparent clinical or physiological rationale for how to rescale this kind of explanatory variable (AL), even more so considering that this metric is applied to several measures of training load (e.g., how can the number of balls bowled in cricket be rescaled?).
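The grams-to-kilograms example can be illustrated with a minimal Python sketch. The data are simulated and the numbers (a body mass change with an SD of 1.5 kg, an intercept of -3.0 and a slope of 0.4 per kg) are assumptions made purely for illustration; `fit_logistic` is a hypothetical helper written for this example, not code from our analysis.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated body mass change (hypothetical values), in kg and in g.
rng = np.random.default_rng(1)
n = 4000
change_kg = rng.normal(0.0, 1.5, size=n)
change_g = change_kg * 1000.0
risk = 1.0 / (1.0 + np.exp(-(-3.0 + 0.4 * change_kg)))
y = rng.binomial(1, risk).astype(float)

X_kg = np.column_stack([np.ones(n), change_kg])
X_g = np.column_stack([np.ones(n), change_g])
beta_kg = fit_logistic(X_kg, y)
beta_g = fit_logistic(X_g, y)

# Odds ratio for a one-unit increase on each scale.
or_kg = float(np.exp(beta_kg[1]))
or_g = float(np.exp(beta_g[1]))

# Fitted probabilities for every observation under each scaling.
p_kg = 1.0 / (1.0 + np.exp(-(X_kg @ beta_kg)))
p_g = 1.0 / (1.0 + np.exp(-(X_g @ beta_g)))

print(f"OR per kg: {or_kg:.3f}   OR per g: {or_g:.6f}")
print(f"Largest difference in fitted risk: {np.max(np.abs(p_kg - p_g)):.2e}")
```

Dividing the explanatory variable by a constant multiplies its coefficient by that constant, so the OR per kilogram is the OR per gram raised to the power of 1000, while the fitted probabilities (and hence any measure of predictive performance) are unchanged. A larger OR obtained in this way reflects only the size of the unit.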

5 Conclusions

We are confident that most of the errors made in previous studies were unintentional. It is also reasonable to assume that the authors believed that the reported relation between training and injury was authentic, and that the etiological theory created to support the ACWR and its components was rational. However, as the ACWR model fitted popular beliefs so well, it became a self-fulfilling prophecy and lowered scientists' willingness to critically evaluate the construct. The selection of candidate prognostic factors may benefit from explorative studies, but we urge scientists to avoid procedures that may produce statistical artefacts and that focus on the dichotomization of effects (e.g., null hypothesis significance testing). In the current study, we have demonstrated using published data and simulations that:

  • the etiological theory developed to explain the relation found in some studies between ACWR and injury risk is not supported;

  • the ratio is a rescaling procedure, exponentially magnifying the effect of the AL;

  • a ratio using averages of the numerator as the denominator will have a lower SD, such that a one-unit increase in the new explanatory variable will correspond to a higher OR;

  • the ratio also causes artificial and non-physiologically justified reclassifications, further influencing the results;

  • neither ACWR nor AL contains useful information for predicting injury;

  • the findings based on ACWR reported in the literature are therefore all affected by artefacts that, depending on the data characteristics, resulted in negative, positive, or no associations (in this dataset positive associations).
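The point about the lower SD of the ratio can be illustrated with a short Python sketch on simulated loads. The gamma-distributed weekly loads (mean of about 2000 arbitrary units) and all variable names are assumptions made for this example only; the chronic load is taken here as the mean of four weeks including the acute week.

```python
import numpy as np

# Simulated weekly loads (hypothetical values) in arbitrary units (AU):
# each row holds four consecutive weeks for one observation.
rng = np.random.default_rng(2)
n = 10_000
weekly = rng.gamma(shape=16.0, scale=125.0, size=(n, 4))

al = weekly[:, -1]          # acute load: the most recent week
cl = weekly.mean(axis=1)    # chronic load: mean of the four weeks
acwr = al / cl              # acute:chronic workload ratio

sd_al = float(al.std(ddof=1))
sd_acwr = float(acwr.std(ddof=1))

# At the average chronic load, one unit of ACWR spans this many AU of AL.
au_per_acwr_unit = float(cl.mean())

print(f"SD of AL:   {sd_al:.1f} AU")
print(f"SD of ACWR: {sd_acwr:.3f}")
print(f"One ACWR unit is roughly {au_per_acwr_unit:.0f} AU of acute load")
```

In this sketch a one-unit increase in ACWR covers several SDs of the ratio and corresponds to a change of well over a thousand AU in acute load, whereas a one-unit increase in AL is a single AU. An OR expressed per unit of ACWR is therefore much larger than an OR expressed per unit of AL, even when both describe the same underlying association.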

5.1 Practical Applications

The ACWR and its components should be dismissed. Moving forward, effort should be focused on identifying and selecting appropriate proxy measures and on developing reasonable causal assumptions. Creating new metrics without conceptual reference models and relying on statistical significance, especially for prediction, should be avoided. The results of previous studies should be reconsidered, their associated theoretical frameworks should be revised, and authors and editors should make efforts to correct the erroneous messages that were disseminated. Finally, international and national organizations and athlete management systems that base their recommendations on the results of these studies should revise those recommendations, acknowledging these artefacts and the lack of predictive value.