Introduction

Italian physiologist Angelo Mosso is credited with suggesting that exercise fatigue has two components: “The first is the diminution of the muscular force [i.e., peripheral fatigue]. The second is fatigue as a sensation [i.e., central fatigue]. That is to say, we have a physical fact which can be measured and compared, and a psychic fact which eludes measurement” (Mosso 1904). Indeed, the elusiveness of measuring and defining central fatigue still confounds the field of neuromuscular physiology. Clearly, declining motivation, increasing pain or discomfort, and even boredom can curtail the volitional drive necessary to continue with the exercise task at hand (Marcora 2008; McMorris et al. 2018; Kayser 2003), while heightened motivation or a placebo effect can augment it (e.g., Benedetti et al. 2005). It is possible that conditions such as hypoxia (Amann et al. 2006) or neurotransmitters such as serotonin (Newsholme et al. 1987) could affect all facets of central fatigue. However, from a neuromuscular perspective, it is not clear whether or under what conditions there is an actual decline in the physical capacity of the motor-cortex and spinal cord to keep on producing and propagating the pre-fatigue levels of motor impulses. While those ‘psychic’ factors are bona fide aspects of centrally determined changes in functional capacity, they ought not be regarded as part of true, physiological neuromotor fatigue, i.e., decreased ability to generate or propagate motor impulses between the motor-cortex and motor endplate. The distinction between ‘psychic’ and central physiologic factors, however, is easier discussed than proven, but is exemplified by the following representative studies. In sustained 60-s 100%MVC efforts, Bigland-Ritchie et al. (1978) found that some participants could briefly produce force spikes over and above the deteriorating ‘maximal’ volitional force. 
Similarly, in 2 min of sustained MVC efforts, following ~ 50 min of 60%VO2max exercise to exhaustion at 40 °C, Nybo and Nielsen (2001) reported that MVC could not be maintained at the level attained by a non-hyperthermic control group that exercised for a similar duration without reaching exhaustion. However, in performing 40 intermittent MVCs (2 s, 3 s rest), immediately following the hyperthermic exercise session and before a recovery of core or muscle temperature could take place, there was no performance difference between the two groups. Both studies demonstrate or strongly suggest that brief concentration of cognitive effort (i.e., motivation) can momentarily overcome pain, discomfort, or other ‘psychic’ inhibitions to volitional effort. Another aspect of ‘psychic intervention’ in neuromotor activity is provided by a Benedetti et al. (2004) study which demonstrated how the introduction of a placebo can affect neuronal firing rate. These studies highlight not only the difficulty of defining what the central fatigue concept really entails, but also how psychologically ‘contaminated’ the assessment of both central and peripheral fatigue might be.

Central fatigue has been extensively researched, and by far the most common method for its assessment has been the calculation of volitional activation (VA) before, often during, and at the end of fatiguing muscular exertions, by means of the interpolated twitch technique (ITT). However, very few studies have even questioned whether and how central fatigue can be measured (e.g., Contessa et al. 2016; Neyroud et al. 2016). These questions have been indirectly addressed by pointing out major shortcomings of the ITT method upon which central-fatigue determination is based. These include, among others, low sensitivity at maximal contraction intensities, an erroneous presumption of response linearity, lack of generalization among contraction types, and high variability (de Haan et al. 2009; Folland & Williams 2007; Herzog 2009).

The central-fatigue determination method rests on two underlying premises: (1) the ITT-based determination technique (VAITT) is a valid and reliable means for assessing VA; and (2) the validity and reliability of VA assessment under fatigue conditions are unchanged relative to fresh, non-fatigued muscles. While the weaknesses of the first premise have been raised by others before, we aim to also question the second premise which, to our knowledge, has never been acknowledged and seems to be based on faith rather than facts.

Importantly, the outstanding issue is not whether central fatigue exists. Based on the ITT methodology, a number of studies have reported development of central fatigue during prolonged strenuous muscular activities such as long-distance running (Martin et al. 2010; Place et al. 2004a; Saldanha et al. 2008; Temesi et al. 2014), cycling (Thomas et al. 2015, 2016), or sustained isometric contractions (Löscher et al. 1996; Neyroud et al. 2012; Zghal et al. 2015). Using the conventional ITT method, Thomas et al. (2016), for example, found central fatigue to play a much smaller role in short, high-intensity exhaustive exertions (~ 4–6 min) than in longer exhaustive endurance efforts of ~ 11–66 min. Eichelberger and Bilodeau (2007), however, who reported marked central fatigue in prolonged, > 6-min sustained 30%MVC contractions, could not detect it in three shorter (~ 1–3 min) and more intense (45–75%MVC) contractions. Eichelberger & Bilodeau further noted that “central fatigue has been suggested to play little role in the development of fatigue during relatively brief (~ 60 s) continuous, high-intensity exercise, where peripheral factors contributed the most to the observed fatigue (Bigland-Ritchie et al. 1983, 1986; Bigland-Ritchie and Woods 1984; Bilodeau et al. 2001; Bilodeau 2006)”. Apparently, as will also be shown later, the influence of peripheral factors is difficult to disentangle from central fatigue as determined by the ITT-based method. It appears that we are currently ill-equipped to tease out the motor-cortical from the “psychic”, or peripheral from central fatigue factors, or even verify central fatigue’s very existence. It can thus justly be asked: “Does the notion of central fatigue fulfill a need?” (Contessa et al. 2016).

Numerous technical and methodological issues appear to have contributed to ambiguities and inconsistencies in findings and have been pointed out and addressed to various degrees, most recently by Place & Millet (2020) who, notably, raised the issue of the partial recovery taking place in the interval between exercise termination and the VAITT testing which is taken to reflect the degree of the exercise-induced fatigue. This review will focus on fundamental and conceptual issues associated with the conventional central-fatigue assessment method and the premises it is based on. Most glaringly paradoxical and counter-intuitive appear to be findings that claim greater central fatigue in children and endurance athletes. These will be scrutinized at the end of this review.

Validity and reliability of ITT-based assessment of volitional activation

Merton’s 1954 paper, “Voluntary Strength and Fatigue”, is widely credited with introducing the ITT. Merton employed > 4 min of adductor pollicis sustained MVC efforts that reached < 20% of initial MVC at termination. Despite the sharp decline in muscle force, EMG action potential amplitudes did not decrease, prompting Merton to note that “fatigue [as per his experiment] is not due to neuromuscular block [i.e., to central fatigue]” and that “neuromuscular block is not important in the fatigue of a volitional tetanus”. On the other hand, Schillings et al. (2003), testing the elbow flexors, reported that the change in “central activation failure” (i.e., central fatigue) between 15 s and the end of 2-min sustained MVC contractions was 21.7 or 12.1%, depending on calculation method. However, also testing an elbow flexor (biceps brachii) in a similar 2-min MVC contraction, Gandevia et al. (1996) found central fatigue of only ~ 9%, which hardly changed in an extended 3-min contraction. In stark contrast, after ~ 2 min of sustained 60%MVC to exhaustion, Eichelberger and Bilodeau (2007) could not detect any central fatigue.

These few examples illustrate the large variability of ITT-based determination of VA and central fatigue and raise questions regarding both its validity and reliability. The likely reasons for that are discussed below.

VA and its linearity premise

A major weakness of VA determination is the presumption of linearity of the superimposed twitch (SItw)–%MVC relationship, despite the fact that the curvilinear nature of the relationship has been repeatedly demonstrated (Belanger and McComas 1981; Bigland-Ritchie et al. 1986; Behm et al. 1996; Folland and Williams 2007; Herbert and Gandevia 1999; Kooistra et al. 2007). Nevertheless, both the central activation ratio [CAR = MVC/(MVC + SItw)] and the now-prevalent VAITT formula, which also includes the potentiated resting-twitch (Rtw) [VAITT (%) = (1 − SItw/Rtw) × 100], are founded on that linearity premise as depicted by Merton in 1954. While this does not ‘pull the rug’ from under the method’s core validity, it affects its accuracy (e.g., Folland & Williams 2007). Non-linearity could well mean different curves for different populations (e.g., male, female, children, trained, untrained), or conditions such as fatigued vs. non-fatigued. However, the linearity-based method has to date been applied to all, possibly obscuring important differences.
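The two indices can be written out as a minimal Python sketch. The function names and the example amplitudes are illustrative assumptions and are not taken from any of the cited studies; CAR is written here as MVC/(MVC + SItw).

```python
def central_activation_ratio(mvc: float, si_tw: float) -> float:
    """CAR = MVC / (MVC + SItw); both forces in the same units."""
    return mvc / (mvc + si_tw)


def va_itt_percent(si_tw: float, r_tw: float) -> float:
    """VAITT (%) = (1 - SItw / Rtw) * 100, with Rtw the potentiated resting twitch."""
    return (1.0 - si_tw / r_tw) * 100.0


# Hypothetical amplitudes expressed in %MVC, for illustration only.
mvc, si_tw, r_tw = 100.0, 3.0, 30.0

print(round(central_activation_ratio(mvc, si_tw), 3))  # 0.971
print(round(va_itt_percent(si_tw, r_tw), 1))           # 90.0
```

Note that the same hypothetical contraction yields different numbers under the two indices, since only VAITT scales the superimposed twitch against the resting twitch.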

VA’s determination validity

Aside from MVC, the ITT method credited to Merton employs only the SItw to deliver a measure typically referred to as CAR (Kent-Braun and Le Blanc 1996). To the best of our knowledge, the VAITT formula was first introduced by Thomas et al. (1989), based on a concept brought forth by Bigland-Ritchie et al. (1986), who also tied it to peripheral and central fatigue. Substantive criticism has been leveled against ITT’s validity for VA determination. For example, De Haan et al. (2009) highlighted VAITT’s low sensitivity at high intensities and the issue of non-linearity of the SItw–%MVC relationship. Herzog (2009), on the other hand, stressed ITT’s high variability and the large discrepancy between the artificially induced and the volitional, biologically limited stimulation frequencies and intensities.

Reliability of the interpolated twitch technique

Investigating VAITT’s reliability, Allen et al. (1995) performed VA determinations in each of ten elbow-flexion MVCs, repeated on five separate days by five healthy individuals (50 tests per person). They found VAITT’s 95% confidence interval to cover a wide range of 0.61–0.98, as well as large differences in the extent of individual variability. Also, while SItw variability was low, large inter-individual Rtw variability was reported. Todd et al. (2004) reported poor VAITT reproducibility (reliability) for the average VAITT across multiple testing sessions (ICC = 0.60), which the authors attributed to the small range of VAITT values (95% confidence interval ≤ 6.3%). Cooper et al. (2013) tested VAITT reliability in 10%MVC contraction increments. Their test–retest reliability ICCs rose from 0.51 at 10%MVC to 0.84 at 100%MVC, explaining only 26–71% of the observed variance. This is highly consequential because it means, first, that even at ~ 100%MVC, which is the typical, recommended testing intensity (e.g., Taylor and Gandevia 2008), VAITT and the derived central fatigue determinations are not very reliable. Second, during fatiguing tasks of sustained MVCs, actual contraction intensities may be allowed to fall to 50%MVC and below (Schillings et al. 2003; Taylor et al. 1999), where the VAITT reliability may be considerably lower. Rozand et al. (2017) have demonstrated an ageing effect as well. They compared old vs. young adults and found that although mean VAITT values were similar, they were significantly more variable in older adults, at both isometric and maximal dynamic contractions.
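The 26–71% range quoted above is consistent with squaring the ICCs, as one would a correlation coefficient, to obtain a proportion of shared variance. Assuming that reading (the derivation is not spelled out in the text), a short Python sketch reproduces the figures; the function name is ours.

```python
def shared_variance_percent(icc: float) -> float:
    """Treat the ICC like a correlation coefficient: shared variance = ICC^2 (assumption)."""
    return icc ** 2 * 100.0


# Test-retest ICCs reported for 10 %MVC and 100 %MVC (Cooper et al. 2013, as cited above).
for icc in (0.51, 0.84):
    print(f"ICC = {icc:.2f} -> {shared_variance_percent(icc):.1f}% of variance")
```

Under this assumption, even the best-case ICC of 0.84 leaves roughly 30% of the test–retest variance unaccounted for.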

As expected, calculated VA typically increases with the tested contraction’s intensity, not only under typical isometric conditions but in dynamic contractions as well (Malloggi et al. 2019). Isometric data, reported by Cooper et al. (2013) and depicted in Fig. 1, demonstrate this relationship particularly clearly, as the VA–%MVC relationship hugs the identity line up to > 80%MVC. However, at 90%MVC the corresponding VA is only 87.6%, and it then declines further to only 84.2% at 100%MVC. This general phenomenon has been reported by many (Belanger and McComas 1981; Herbert and Gandevia 1999; Folland and Williams 2007; Kooistra et al. 2007; Todd et al. 2004) and reflects the greatly diminished VA reliability at or near the 100%MVC intensity, where VAITT is typically tested. With a somewhat different perspective on the same issue, Behm et al. (1996) investigated the SItw/Rtw ratio in a polynomial relationship with %MVC at different contractile intensity ranges. The authors used these data to predict the actual MVC. As expected, they found the prediction error to be highest when partial testing ranges were used (e.g., 0–60%MVC). However, even at the full 0–100%MVC testing range, the predicted MVC still under-estimated the directly determined MVC by 5.8% (Fig. 2).

Fig. 1

High correlation and agreement between measured contraction intensity (%MVC) and calculated VA (dashed line = line of identity). Depicted from data reported by Cooper et al. (2013)

Fig. 2

VA-predicted MVC from ITT-based VA estimations performed at different ranges of contraction intensities. Note the non-linearity of the predicted-MVC response and, particularly, the prediction error (5.8%) even when testing the entire intensity range. Depicted from data reported by Behm et al. (1996)

Kooistra et al. (2007) investigated VAITT and SItw in relation to torque production and showed both day-to-day variation and SItw–torque incongruence at the higher contraction intensities, due to the aforementioned low VAITT reliability at the higher intensities and the relationship’s non-linearity. The authors concluded that the VA estimate “probably represents a considerable overestimation of the subjects’ ability to maximally drive their quadriceps muscles”.

Suter et al. (1996) analyzed the SItw/Rtw ratio in nearly 800 contractions between 5 and 100%MVC. The variability was so high that at ~ 60%MVC, for example, the ratio ranged between zero and nearly 100% (1:1 ratio), whereas according to the ITT method’s linearity premise, it should have been ~ 40%.
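The ~ 40% expectation follows directly from the linearity premise, under which the SItw/Rtw ratio falls from 100% at rest to 0% at a fully activated MVC. A minimal sketch, assuming that idealized straight-line model (the function name is hypothetical):

```python
def expected_twitch_ratio_percent(percent_mvc: float) -> float:
    """Expected SItw/Rtw (%) if the ratio declines linearly from 100% at rest to 0% at MVC."""
    return 100.0 - percent_mvc


for intensity in (20, 60, 100):
    print(f"{intensity} %MVC -> expected SItw/Rtw = {expected_twitch_ratio_percent(intensity):.0f}%")
```

Against this single predicted value of 40% at 60%MVC, an observed spread from zero to nearly 100% spans almost the whole possible range of the ratio.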

Delivering electrical stimulations at maximal efforts and accurately determining the force increments is a non-trivial challenge that may introduce an additional level of uncertainty to VA assessment. Herbert and Gandevia (1999) noted that “at near-maximal contraction intensities increases in excitation produced only small changes in interpolated twitch amplitude” and, therefore, “twitch interpolation may not provide a sensitive measure of motoneuronal excitation at near-maximal forces”, i.e., where VA is typically determined. In his landmark review of spinal and supraspinal factors in muscle fatigue, Gandevia (2001) cited this as interfering with the SItw–%MVC relationship but did not fault VAITT as a valid tool for central fatigue determination.

Effects of stimuli number and frequency on the superimposed twitch

Various combinations of the number, frequency, and duration of stimulations have been used, adding yet another level of variability and confounding the interpretation and comparison of VAITT findings.

Allen et al. (1998) showed a > twofold increase in SItw amplitudes using doublet (100 Hz) stimulation compared with a single stimulus. A quadruplet (50 Hz) stimulation increased that value to ~ 2.7-fold. Triplets (Kooistra et al. 2007), quintuplets (Behm et al. 1996), and even octets (Bigland-Ritchie et al. 1986; Thomas et al. 1989) have also been employed. While the signal-to-noise ratio tends to increase with the number of stimulations, Allen et al. (1998) showed only a small and statistically non-significant increase in VA-test sensitivity, Behm et al. (1996) could find none, while Suter and Herzog (2001) found SItw amplitude variability to decrease with increasing number of stimulations. Kooistra et al. (2007) claimed triplet stimulation at 300 Hz to increase test sensitivity and not only the signal-to-noise ratio, but made no comparisons to other stimulation types.

The question of the optimal frequency arises whenever multiple stimulations are used (e.g., Allen et al. 1998). Stimulation frequencies of 100 Hz (10-ms intervals, typically in doublet form) are generally regarded as optimizing SItw size and signal-to-noise ratios and are the most widely used (e.g., Behm et al. 1996; Neyroud et al. 2016; Suter and Herzog 2001). Reservations about the use of supra-physiological stimulation frequencies have, however, been raised. Herzog (2009) argued that the use of supramaximal stimulations, in terms of both frequency and intensity, markedly exceeds the physiological force-generating capacity because the volitional “driving frequencies of motoneurons are well below those that provide maximal force”. This discrepancy likely negatively affects the reliability of VA determination.

Herzog’s view appears to have been supported by Gandevia et al. (2013). On a background of 60-s sustained tetanus induced by the physiologically more-congruent 30-Hz stimulations, additional stimulations were superimposed every 5 s at 5 to 30 ms intervals (200–33 Hz) following the last 30-Hz background stimulus. All the higher-frequency (200–50 Hz) stimulations were found to induce greater twitch (SItw) amplitudes than those produced by the 33-Hz stimulations (more on this under Unexpected Increments and Decrements below).
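The frequencies quoted in parentheses are simply the reciprocals of the inter-stimulus intervals. As a quick check, the following sketch converts intervals to frequencies; the 10- and 20-ms entries are included only for illustration of the intermediate 100- and 50-Hz values.

```python
def interval_ms_to_hz(interval_ms: float) -> float:
    """Convert an inter-stimulus interval in milliseconds to a frequency in Hz."""
    return 1000.0 / interval_ms


for interval_ms in (5, 10, 20, 30):
    print(f"{interval_ms} ms -> {interval_ms_to_hz(interval_ms):.0f} Hz")
```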

VA-determination reliability is not synonymous with its sensitivity and it cannot be expected to vary if SItw and Rtw are affected proportionally (i.e., unchanged SItw/Rtw ratio) under different number and frequency of stimulations. This, however, is apparently not the case. Behm et al. (1996), for example, reported that in many of their participants, at 20% of MVC, “the amplitude of the superimposed doublet was equal to or greater than the potentiated doublet”, a phenomenon that is not compatible with the rationale behind VA’s determination method and which they did not observe with single or quintuplet stimulations. Similar findings were reported by Oskouei et al. (2003), using 125-Hz doublet stimulations.

Thus, as long as different stimulation numbers and frequencies cannot be shown to maintain similar SItw/Rtw ratios, they have the potential to affect not only VAITT’s determination reliability but possibly its validity as well.

The resting (Control) twitch

Although all MUs are presumably activated in eliciting Rtw, the evoked force is typically only 25–40%MVC (e.g., Bigland-Ritchie et al. 1986; Garcia-Vicencio et al. 2015; Lampropoulou et al. 2012; Ratel et al. 2015). The reasons for this are several and include incomplete summation of individual MU twitches from a single elicited contraction (e.g., Bawa and Stein 1976; Fuglevand et al. 1993), force absorbed in overcoming the muscle’s series elastic component (Wilkie 1956), possible tendon slack (Muraoka et al. 2004), and shorter than optimal muscle length (Babault et al. 2003; Hansen et al. 2003; Place et al. 2004b; Rassier 2000) which reduces intramuscular pre-stretch and increases muscle laxity. For calculating VAITT, a potentiated Rtw (i.e., measured after one or more conditioning MVCs to induce Post-Activation Potentiation; PAP) is typically used as a control twitch to provide a presumably more sensitive peripheral control measure in assessing VA than an unpotentiated Rtw (Kufel et al. 2002; Folland & Williams 2007). Methodological differences, however, can introduce considerable variability since the potentiated Rtw amplitude, and consequently VA estimation, are known to vary depending on force level and duration of the conditioning contraction (Vandervoort 1983), number or frequency of preceding contractions or stimulations (e.g., Close and Hoh 1968; Taylor-Burt et al. 2020), and stimulation timing following the conditioning contraction (Madrid et al. 2020). As PAP is greater in muscles of higher type-II fibre composition (e.g., Hamada et al. 2000a), Rtw would also be expected to be relatively higher in those muscles. This may have been borne out by Bigland-Ritchie et al. (1986), who found quadriceps (Quad) Rtw (28.6%MVC) to be threefold larger than the corresponding 9.4%MVC recorded in the soleus (SOL). Belanger et al. (1983), on the other hand, found that the SOL’s MVC-normalized Rtw was 2.2- to 5.2-fold larger than that of tibialis anterior (TA) (Fig. 7). 
Also, Hamada et al. (2000a, 2003) found Rtw in their low-type-II groups to be 27–30%MVC, but only 15–16%MVC in their high-type-II groups. In testing endurance- and power-trained women, however, Pääsuke et al. (2007) found knee-extension (Quad) Rtw values of only 14 and 16.2%MVC, respectively. The Rtw of untrained women was slightly higher, rather than lower, at 17.9%MVC. Thus, Rtw’s relationship to muscle composition appears inconsistent and not well understood (see also below under ‘Muscle Type and Fibre Composition’).

Rtw decreases during fatiguing muscular exercise (Bigland-Ritchie et al. 1986; Bontemps et al. 2019; Hamada et al. 2003; Nordlund et al. 2004; Ratel et al. 2015; Schillings et al. 2003; Thomas et al. 2015). This largely parallels the corresponding MVC decline and is likely due to the same peripheral factors. This is probably also why Rtw is generally regarded as a robust measure in the unfatigued muscle (Allen et al. 1995; Morton et al. 2005; Todd et al. 2004). Testing torque and fatigue of both MVC and Rtw at different knee angles, Rassier (2000) found that at the optimal knee angle (60° flexion), Rtw fatigue (56%) matched that of MVC (55%) and demonstrated a negative correlation between Rtw fatigue and its potentiation. Dekerle et al. (2019) found moderate Rtw reliability (ICC = 0.71) under rested conditions, which was practically unchanged following fatiguing exercise (ICC = 0.69). For comparison, the corresponding ICC values for SItw were much lower, at only 0.39 and 0.14.

While fatigue may affect MUs of all types, it is more pronounced in the more fatigable type-II MUs. Hamada et al. (2003), for example, conducted a 16-MVC fatigue test and reported 17.4% Rtw fatigue in their predominant-type-I group and twice that, 33.7%, in the predominant-type-II group. Similarly, children, who are thought to have lower type-II fibre composition than adults (Jansson 1996; Lexell et al. 1992) and to activate less of their type-II MU pool (Dotan et al. 2012; Pitt et al. 2015; Woods et al. 2019), have also been shown to have lesser Rtw fatigue (Bontemps et al. 2019; Hatzikotoulas et al. 2014; Piponnier et al. 2019; Ratel et al. 2015; Streckis et al. 2007).

Thus, although Rtw’s change with ensuing fatigue may be relatively predictable, its dependence on PAP and its relationship to muscle composition may not be fully resolved and are a source of variability that can confound the prediction of Rtw behavior in different individuals and conditions. Rtw is thus subject to peripherally induced changes that could affect VA and central fatigue determinations to an extent that may be difficult to predict and control.

Transcranial magnetic stimulation (TMS)

TMS has seen increasing popularity in assessing cortical volitional activation and its fatigue, and many researchers have chosen to employ it in lieu of the more direct nerve or muscle stimulation for VA assessment (Brasil-Neto et al. 1994; Cadigan et al. 2017; Temesi et al. 2014; Todd et al. 2003). The difference between the two approaches is not limited to modality choice. Rtw is considered greatly under-estimated by TMS (e.g., Todd et al. 2003). Consequently, two different Rtw substitutions have been employed. One is the backward extrapolation of the TMS SItw values, typically via linear regression of TMS-obtained SItw values at 100, 75, and 50%MVC contractions (e.g., Dekerle et al. 2019). The obvious drawback is that the obtained value is an estimate rather than a direct measure of Rtw (Dekerle et al. 2019). The other approach involves replacing TMS with direct muscle or nerve electrical stimulation (e.g., Cadigan et al. 2017), thereby using two different-response modalities within a single framework (i.e., the VA formula).
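The backward-extrapolation approach can be illustrated with a minimal Python sketch: a straight line is fitted to SItw amplitudes at 50, 75, and 100%MVC, and its intercept at 0%MVC is taken as the estimated resting twitch. The amplitudes below are hypothetical values chosen for illustration, and the helper `fit_line` is our own; this is not the procedure or data of any cited study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical TMS-evoked SItw amplitudes (arbitrary force units).
percent_mvc = [50.0, 75.0, 100.0]
si_tw = [12.0, 6.5, 1.5]

# The intercept is the extrapolated twitch at 0 %MVC, i.e. the estimated resting twitch.
slope, estimated_r_tw = fit_line(percent_mvc, si_tw)
va_at_mvc = (1.0 - si_tw[-1] / estimated_r_tw) * 100.0

print(f"estimated resting twitch: {estimated_r_tw:.2f}")
print(f"VA at 100 %MVC: {va_at_mvc:.1f}%")
```

Because the denominator of the VA formula is itself a three-point regression estimate, any scatter or curvature in the SItw values propagates directly into the calculated VA.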

Another significant issue with TMS is its lower target specificity and stimulation intensity. Increasing stimulation intensity to maximize the SItw response also increases antagonist activation, thereby compromising the reliability of VA determination (Todd et al. 2016). It is not clear whether TMS is inferior or potentially superior to direct nerve or muscle stimulation in terms of validity and reliability of VA and central fatigue determinations, but problems such as those pointed to above could further confound TMS-based VA assessment, particularly when TMS and nerve/muscle data are compared or used interchangeably.

One issue is outcome similarity. For example, Cadigan et al. (2017) showed TMS to under-estimate SItw and maximal VA relative to motor-point electrical stimulation, while Nuzzo et al. (2021) found knee extensors’ activation deficit (AD = 100–VA%), determined by direct nerve stimulation, to be 22% higher than the TMS-determined AD. It could not, of course, be determined which VA estimation was more valid. Temesi et al. (2014) tried to quantify the difference between the two methods in a different way and found TMS-determined VA to account for only 29% of the observed VA variance as determined by electrical nerve stimulation. In a similar study, Zghal et al. (2015) found the post-exercise TMS-determined VA to decrease ~ threefold more than that derived by nerve stimulation. Calculated from data reported by Todd et al. (2003), TMS-obtained SItw values in the fresh biceps brachii were ~ 1–3.5 times larger than the corresponding nerve-stimulated ones, a range that narrowed to ~ 1.7–3.1 in the fatigued muscle (Fig. 3). Moreover, the TMS/nerve-stimulated SItw ratio was higher in the fatigued than in the fresh muscle below 60%MVC, but lower beyond that point.

Fig. 3

Changes in the ratio of the biceps-brachii superimposed twitch obtained via TMS relative to that attained by direct motor nerve stimulation (TMS/nerve), as a function of contraction intensity in fresh and fatigued muscle. Note higher TMS-derived SItw values < 60%MVC but lower values > 60%MVC in the fatigued vs. fresh muscle. Derived from data reported by Todd et al. (2003)

Another issue, beyond differences in absolute or relative twitch values, is that of reliability. Dekerle et al. (2019) challenged the face validity of the TMS-determined VAITT in an intermittent contraction protocol. They did not compare TMS to nerve or muscle stimulation but showed that while SItw, Rtw, and VA reliabilities were acceptable pre-exercise (in the fresh, non-fatigued muscle), they were much weaker post-exercise, under fatigue conditions, particularly for SItw. Héroux et al. (2015) addressed TMS’s alleged low reproducibility by conducting a comprehensive survey among 153 researchers who had used TMS and found that approximately 50% of them had been unable to replicate published TMS findings.

VA assessment, whether via VAITT or CAR, produces different and inconsistent values already in the fresh muscle (e.g., Behm et al. 1996; Bilodeau 2006; Boerio et al. 2005). This, and the fact that VA’s reliability and validity can be challenged on numerous grounds, means that its accuracy cannot be ascertained and is never quite known even in the fresh, unfatigued state. This has led De Haan et al. (2009) to declare that “The interpolated twitch does not provide a valid measure of the voluntary activation of muscle”. Under fatigue conditions, the problem is only further confounded, as will be described below.

VAITT as a means for assessing central fatigue

The validity of VAITT as the means for assessing central fatigue is based on the premise that during a fatiguing task, the SItw/Rtw ratio would increase only, or mostly, due to central fatigue. If no central fatigue develops in an otherwise fatiguing exertion, the SItw/Rtw ratio ought to remain constant under the current method so as to maintain the initial VA (Fig. 4a). This condition, however, is not met in the bulk of the existing evidence, as will be shown below.

Fig. 4

A Schematic representation of the fixed SItw/Rtw ratio needed for keeping volitional activation (a) from changing during a fatiguing exercise (assuming only peripheral fatigue). In this example, a constant 1:10 ratio (6:60 to 1:10) produces a fixed 90% VA throughout the fatiguing exertion. B Same as 4A, except that the SItw amplitude declines from 6 to 2, rather than from 6 to 1%MVC. This small, 1%MVC change is sufficient for VA (b) to decline by 10 percentage points (from 90 to 80%). C SItw is the same as in A, but Rtw changes from 60 to 4 rather than to 10%MVC (just a 6%MVC additional decline). This small Rtw change is enough for a 15-percentage-point decline in the final VA, from 90 to 75% (c)
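The arithmetic behind the three panels can be verified with a short Python sketch that applies the VAITT formula to the start and end amplitudes given in the caption (all in %MVC). The function and variable names are ours.

```python
def va_percent(si_tw: float, r_tw: float) -> float:
    """VAITT (%) = (1 - SItw / Rtw) * 100."""
    return (1.0 - si_tw / r_tw) * 100.0


# (SItw, Rtw) at the start and at the end of the fatiguing task, in %MVC.
scenarios = {
    "A: fixed 1:10 ratio": ((6, 60), (1, 10)),
    "B: SItw ends at 2": ((6, 60), (2, 10)),
    "C: Rtw ends at 4": ((6, 60), (1, 4)),
}

for label, (start, end) in scenarios.items():
    print(f"{label}: VA {va_percent(*start):.0f}% -> {va_percent(*end):.0f}%")
# A: 90% -> 90%, B: 90% -> 80%, C: 90% -> 75%
```

Shifts of only 1%MVC in SItw or 6%MVC in Rtw, both of strictly peripheral origin in this schematic, are thus enough to register as a 10- or 15-percentage-point loss of VA.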

VAITT variability under fatigue conditions

Studies comparing VAITT under fresh and fatigued conditions have typically found VAITT to be much more variable (and presumably less reliable) in the fatigued state. For example, Dekerle et al. (2019) found VAITT’s typical error of measurement (as % of the mean) to be only 2.5% in the fresh state, but 4.8-fold greater (11.9%) under fatigue. Similarly, Goodall et al. (2017) found VAITT’s CV to be only 2.2% prior to a 120-min soccer game, increasing to 5.2, 14.6, and 17.8% at half-time, full time, and at the end of extra-time, respectively, a more than eightfold increase overall. The same group found VAITT’s CV to increase from ~ 6.4% before to 16.2% following 2 min of sustained MVC (Goodall et al. 2009). Thomas et al. (2015) tested VAITT (and central fatigue) prior to and following 4, 20, and 40 km cycling time trials, and reported a mean CV of ~ 8.0% prior to each of the efforts, which increased to 15.3% after the 4-km effort and to > 18% after the two longer-distance efforts. On the other hand, Place et al. (2007) found similar moderate to high MVC, SItw, Rtw, and VA reliabilities both before and following fatiguing exertion (2-min sustained MVC), but random error was ~ 1.5–twofold greater following fatigue.
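For a side-by-side view, the fold increases in variability can be computed directly from the fresh and fatigued values quoted above. The sketch below uses only those reported figures; the dictionary labels are our shorthand, and the Thomas et al. entry uses the 4-km value only.

```python
# (fresh, fatigued) variability in %, as quoted in the text above.
variability = {
    "Dekerle et al. 2019 (typical error)": (2.5, 11.9),
    "Goodall et al. 2017 (CV, pre-game vs. end of extra-time)": (2.2, 17.8),
    "Goodall et al. 2009 (CV)": (6.4, 16.2),
    "Thomas et al. 2015 (CV, 4-km trial)": (8.0, 15.3),
}

fold_changes = {study: fatigued / fresh for study, (fresh, fatigued) in variability.items()}

for study, fold in fold_changes.items():
    print(f"{study}: {fold:.1f}-fold")
```

On these figures, variability rises roughly two- to eightfold with fatigue, depending on the study and protocol.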

It appears likely that much of the fatigue-related VA variability, as well as possibly the associated central fatigue or part thereof, were due to strictly peripheral factors. Neyroud et al. (2016), for example, manipulated strictly peripheral fatigue variables that significantly changed VAITT determinations and prompted the authors to conclude that “muscle fatigue can confound the quantification of central fatigue using the ITT”.

Variability of the superimposed twitch

For SItw to reliably serve in assessing VAITT and central fatigue, it must change in a systematic, predictable manner. However, while it is generally known to decrease with increasing contraction intensity, there is no way of knowing a priori precisely how the SItw would change with or without fatigue. Methodological issues add to this uncertainty, as the execution of true maximal efforts, where SItw is typically measured, is challenging and requires extensive training on the part of the researcher as well as participant motivation, particularly during fatigue and increased perceived effort. Also, as mentioned earlier, any delay in performing MVC after a fatiguing task allows for partial recovery that can lead to additional measurement variability (Place and Millet 2020).

As previously mentioned, Dekerle et al. (2019) found reliability, particularly that of SItw, to be low already under pre-exercise conditions (ICC = 0.39) and much lower (ICC = 0.14) following fatiguing exercise (the corresponding Rtw values were 0.71 and 0.69, respectively). Based on a simulation model, Contessa et al. (2016) reported SItw variability in an intermittent fatiguing MVC protocol, leading to increasing VA in some contractions and decreasing VA in others. Suter & Herzog (2001) speculated that SItw variability may stem from the stochastic nature of the process, where the timing of the elicited contraction, relative to the ongoing volitional MU impulse trains, may significantly change the observed SItw force. In a highly instructive study from the same laboratory, Oskouei et al. (2003) explored SItw variability in determining AD in maximal and submaximal contractions. Due to SItw’s large variability, a 6-%MVC range (94–100%MVC) produced AD values that ranged from 1 to 70%. The authors also found that the negative AD–%MVC correlation was statistically significant in only 5 of the 12 participants whose AD exceeded 20%. In conclusion they stated that “The variability of the STF [SItw force] during maximal effort contractions remains a mystery. This result is disappointing because maximal effort contractions are the most frequently used contractions”.

An illustration of SItw’s unpredictability can be gleaned from Todd et al.’s (2003) data. They measured elbow-flexor SItw in relation to %MVC, via both TMS and electrical nerve stimulation, in two groups, one tested on a fresh and one on a fatigued muscle, with several individuals participating in both groups. The TMS results are depicted in Fig. 5. SItw expectedly decreased with increasing contraction intensity under both muscle conditions, but the relationships were irregular, conforming neither to the Merton (1954) linear model, nor to any curvilinear model, such as the 2nd order polynomial model suggested by Folland and Williams (2007) and others. Also, the SItw–%MVC regression lines, of the fresh and fatigued muscles, descended in parallel to each other. This contrasts with the expectation, under the prevailing central fatigue methodology, that with increasing %MVC, SItw of a fatigued muscle would increase more relative to that of a fresh muscle. Consequently, one would expect the SItw–%MVC regression lines of the two muscle conditions to be converging toward the higher intensities, as shown, for example, by Dekerle et al. (2019) and Todd et al.’s (2003) own nerve-stimulation data. Moreover, as the curves depict group means, one can confidently assume that individual variance and irregularity were markedly greater than that shown.

Fig. 5

Non-linear, irregular SItw decline with increasing contraction intensity (%MVC), in both the fresh and fatigued biceps brachii muscle. Depicted from TMS data reported by Todd et al. (2003)

Another example comes from Thomas et al. (1989). Eight-stimuli trains of 50 Hz were used to evoke max force immediately before and during 5 min of sustained MVC contractions of the TA and the first dorsal interosseous (FDI) muscles (SItw being the difference between MVC and the evoked max force). Figure 6 illustrates changes in the ratio between MVC and evoked max force values (given as % of their respective control values) along the 5-min fatiguing task, in each of the two muscles. These ratio changes presumably reflect SItw fluctuations with advancing fatigue. There are three observations to note in this representation: a. As with Todd et al.’s data (Fig. 5), the relationships are irregular and undulating; b. There is a distinct slope difference between the two muscles, and c. The curve changes in the two plots mirror each other rather than following a similar trend, possibly suggesting different and contrasting effects on SItw in the two muscles. As with Todd et al.’s (2003) data (Fig. 5), Fig. 6 data (Thomas et al. 1989) are group means. Since the original data, from which the depicted ratios were calculated, had mean variance of ~ 12% (not shown), individual irregularity of this ratio’s behavior must have been considerably larger than shown by the mean curves.

Fig. 6

Irregularly changing ratio between MVC and evoked max force (each given as % of its control value) for the tibialis anterior (TA) and the first dorsal interosseous (FDI) muscles in 5 min of sustained MVC contraction. Eight-stimuli trains at 50 Hz were used to evoke true maximal force. Note also the distinct slope difference between the two muscles, and that the respective changes appear to mirror rather than parallel each other. Derived from grouped data reported by Thomas et al. (1989)

Although these results may not be universal and could be representative of only Todd et al.’s (2003) and Thomas et al.’s (1989) particular cohorts, this is exactly the point to stress. Namely, SItw behavior is hard to predict. Figure 4B,b shows how a mere 1%MVC difference in the final SItw amplitude (from 1%MVC in Fig. 4A to 2%MVC in Fig. 4B) can change central fatigue determination from zero (as in Fig. 4a) to 10% (Fig. 4b), namely, a VAITT decrease from the initial 90% to 80% at the end of the fatiguing exertion. Similarly, a 6%MVC greater decrease in the final Rtw (from 10%MVC in Fig. 4A to 4%MVC in Fig. 4C) changes the determined central fatigue from zero (as in Fig. 4a) to 15% (from 90 to 75%), as depicted in Fig. 4c. Thus, under the conventional methodology, just a 1%MVC deviation of the SItw amplitude that could occur independently of central fatigue (due to peripheral factors such as muscle fatigue or differential potentiation) can lead to an erroneous observation of significant central fatigue development.
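
The arithmetic behind these Fig. 4 scenarios can be sketched as follows. The sketch assumes the conventional ITT formula, VA = (1 − SItw/Rtw) × 100, with both twitches expressed in %MVC; the function name and scenario labels are illustrative only.

```python
def voluntary_activation(sitw, rtw):
    """Conventional ITT estimate of voluntary activation (%).

    Assumes VA = (1 - SItw/Rtw) * 100, with both twitch amplitudes
    given in the same units (here, %MVC).
    """
    if rtw <= 0:
        raise ValueError("Rtw must be positive")
    return 100.0 * (rtw - sitw) / rtw


# End-of-exercise twitch amplitudes (%MVC) for the three scenarios in the text.
scenarios = {
    "A (SItw 1, Rtw 10)": (1.0, 10.0),
    "B (SItw 2, Rtw 10)": (2.0, 10.0),
    "C (SItw 1, Rtw 4)": (1.0, 4.0),
}

initial_va = 90.0  # pre-fatigue VA (%) as stated in the text

results = {}
for label, (sitw, rtw) in scenarios.items():
    va = voluntary_activation(sitw, rtw)
    # "Central fatigue" here is simply the drop in VA, in percentage points.
    results[label] = (va, initial_va - va)
    print(f"{label}: VA = {va:.0f}%, determined central fatigue = {initial_va - va:.0f}%")
```

Under these assumptions the three scenarios give VA values of 90, 80 and 75%, so the determined central fatigue is driven entirely by small shifts in the twitch amplitudes.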

Unexpected increments and decrements

As just 1%MVC changes in SItw can cause large (e.g., 10%) differences in the determined central fatigue, it is important to note that SItw changes that are just as large, or larger, may be caused by peripheral factors not normally considered in VA assessment.

Place et al. (2008) applied electrical stimulation to mouse single muscle fibres (i.e., isolated from any neural factors, or other fibres) at different contraction intensities. They observed SItw force to increase from 9%MVC in the fresh muscle to 19% at the end of fatiguing isometric contraction (note that since those were single fibres, the increase could not be assigned to the activation of additional MUs or fibres). The SItw increase was associated with increased intramuscular Ca2+ ion concentration. As contractile force depends on the degree of Ca2+ release in a sigmoidal relationship, the authors interpreted the increased force as being due to the extra Ca2+ released by the interpolated stimulation, causing a transient increase in the evoked SItw, as long as the sigmoidal relationship is in its steep segment. The authors concluded that “the extent of central fatigue is difficult to assess and it may be overestimated when using the twitch interpolation technique”.
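
The steep-segment argument can be illustrated with a minimal sketch that models the force–Ca2+ relationship as a Hill-type sigmoid. The Hill form, the parameter values (`ca50`, `hill_n`), the size of the extra Ca2+ release, and the background Ca2+ levels are all hypothetical choices made for illustration; none are taken from Place et al. (2008).

```python
def relative_force(ca, ca50=1.0, hill_n=4.0):
    """Hill-type sigmoid: fraction of maximal force at Ca2+ level `ca`.

    ca50 (Ca2+ giving half-maximal force) and hill_n (steepness) are
    hypothetical values in arbitrary units, used only for illustration.
    """
    return ca**hill_n / (ca50**hill_n + ca**hill_n)


def twitch_increment(ca_background, extra_ca=0.2):
    """Force added by the same extra Ca2+ release on top of a background level."""
    return relative_force(ca_background + extra_ca) - relative_force(ca_background)


# Hypothetical background Ca2+ levels (arbitrary units).
plateau = twitch_increment(3.0)  # high Ca2+: flat upper part of the curve
steep = twitch_increment(1.0)    # lower Ca2+: steep middle part of the curve

print(f"force increment on the plateau:       {plateau:.3f}")
print(f"force increment on the steep segment: {steep:.3f}")
```

With these assumed values, the same extra Ca2+ release adds far more force on the steep segment than on the plateau, which is the sense in which a shift along the curve could enlarge SItw without any change in central drive.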

Gandevia et al. (2013), on the other hand, showed that, in humans, SItw could actually decrease rather than increase with ensuing fatigue. In line with Herzog’s (2009) criticism of the discrepancy between the biologically occurring firing rates and those typically used in artificial stimulation (e.g., 100 Hz doublets), Gandevia et al. measured force and SItw amplitude changes in 60 s of 30-Hz tetanic adductor pollicis contractions. Although muscle fatigue exceeded 40%, the authors reported SItw to have declined to near zero. By virtue of a faster decline of SItw, compared with Rtw, these data imply that a stimulation paradigm which maintained a constant drive to the muscle (no central reduction or increase), can effectively lead to an increase in VA, despite the development of peripherally induced fatigue. This is yet another demonstration of the inadequacy of the ITT-based assessment of central fatigue. The authors concluded that “unlike when the muscle is fresh, attempts to relate the size of interpolated twitches to the exact level of voluntary activation of the [fatigued] muscle should be made with caution”. Nevertheless, the authors’ findings and conclusion did not cause them to denounce or question the validity of the central-fatigue determination method. Also using the 30-Hz stimulation rate in fatiguing intermittent tetanic contractions in humans, Neyroud et al. (2016) similarly found decreasing SItw with ensuing fatigue and a faster decline in SItw than in Rtw, which would indicate increasing VA with developing muscle fatigue. They concluded that “the extent of central fatigue is difficult to assess and it may be overestimated when using the twitch interpolation technique” and that “muscle fatigue can confound the quantification of central fatigue using the ITT”.

Muscle temperature

Within the normal physiological range encountered in exercising muscles, muscle force and power have repeatedly been shown to increase with increasing muscle temperature above normal resting temperature (e.g., Bergh and Ekblom 1979; Sargeant 1987). Sargeant (1987), for example, reported ~ 11% increase, in both maximal knee-extension force and short-term power, when the Quad muscle’s temperature was raised from its 36.6 °C baseline to 39.3 °C. Since muscles heat up as a function of contraction intensity and duration, it stands to reason that twitch forces would also be affected, which in turn means that SItw amplitude may increase at the end of fatiguing exercise because of increasing muscle temperature rather than due to decreasing VA. Support for this claim comes from Lloyd et al. (2015) who examined muscle-temperature effect on numerous performance and neuromuscular variables. A criterion task of 120-s sustained knee-extension MVC was performed at baseline Quad temperature of 35.1 °C and at 38.5 °C, achieved via hot-water immersion. While endurance was negatively correlated with increasing muscle temperatures, end-of-exercise SItw was highest at 38.5 °C, ~ 7% higher than at 35.1 °C. The corresponding increase in Rtw was ~ 10% and VAITT was consequently decreased by ~ 5% (from ~ 56 to ~ 51%). It is thus likely that central fatigue determinations are often exaggerated to some extent due to intramuscular heating during the fatiguing exercise.

Hypoxia

Central fatigue is commonly rationalized as a protective mechanism against excessive peripheral fatigue. It is, therefore, reasonable to expect greater central fatigue under a threatening condition such as hypoxia.

Amann et al. (2006) studied performance and neural drive changes throughout a 5-km cycling time-trial under hyperoxia (100%O2), normoxia (21%O2), and hypoxia (15% O2, 17.6 mL/dL arterial O2 content, 81.4% O2 saturation). Neural drive (akin to VA) was estimated via the Quad’s integrated electromyogram. Expectedly, relative to normoxia, mean power output decreased ~ 13% under hypoxia with a similar decline in neural drive. The authors concluded that “the rate of peripheral locomotor muscle fatigue development—as affected by CaO2 [arterial O2 content]—is a significant determinant of the magnitude of central motor output during exercise to prevent ‘excessive’ development of peripheral muscle fatigue beyond a critical threshold or sensory tolerance limit”.

On the other hand, different findings and conclusions were reached in a recent study by Girard et al. (2021). Well-trained cyclists performed intermittent 15-s efforts at 30% anaerobic power reserve, interspersed with 45-s passive rest, until failure, under normoxic as well as moderate (~ 90% O2 saturation) and severe hypoxic (~ 79% O2 saturation) conditions. The number of repetitions sharply decreased with increasing hypoxic severity, as expected. However, in both brief and sustained MVC tests, performed at baseline and immediately following the intermittent test, a clear tendency was evident for both MVC and VA to increase rather than decline in the severe, compared with the moderate hypoxic condition. This led the authors to conclude that “the magnitude of central fatigue might not be dependent upon hypoxic exposure”. Most pointedly, however, this finding questions the very nature of what is measured by ‘central fatigue’, as well as its claimed protective role against muscle or systemic damage (Amann et al. 2006; Gandevia 2001).

Muscle type and fibre composition

Part of the observed variability in both Rtw and SItw, and, therefore, also in the determined VA and central fatigue, may be ascribed to differences in the tested muscles and their fibre compositions, independently of any fatigue. Belanger et al. (1983) compared, in both men and women, the plantar-flexors (largely SOL), typically ~ 75–90% type-I fibres (Gollnick et al. 1974; Johnson et al. 1973) with their antagonist dorsi-flexors (TA) of somewhat lower, ~ 70% type-I composition (Henriksson-Larsen et al. 1983; Johnson et al. 1973). Although the muscle-composition difference was small, and possibly not a major factor, the men’s TA Rtw amplitude of 5.7%MVC was less than half that of SOL’s 12.8% (i.e., MVC-normalized values). The women’s corresponding values were 2.7 and 14.1%MVC, respectively. That is, corrected for the muscles’ MVC differences, SOL’s Rtw was still 2.2 times as large as that of TA in the men, and 5.2 times as large in the women, a striking difference between the sexes as well (Fig. 7). The authors further reported a different activation pattern for each of the two muscles but could not say whether they stemmed from differing usage patterns or fibre compositions. However, since the muscles’ compositional differences were not large, it is hard to ascribe to them the entire extent of the large cross-muscle Rtw differences. In a later study, Belanger and McComas (1989) compared the same muscles in children vs. adolescents and obtained similar values. Large inter-muscular differences were also reported by Behm et al. (2002), who tested the Quad as well as the plantar-, dorsi-, and elbow-flexors. They showed AD levels to range widely from only ~ 1.5% in the dorsi-flexors to > 15% in the Quad. Again, such differences cannot be explained by compositional differences alone. It appears, therefore, that factors such as muscle usage and activation patterns may be as or more important, and these are factors which, at least at present, cannot be accurately anticipated.

Fig. 7

MVC-normalized resting-twitch amplitudes of the soleus vs. tibialis anterior muscles of men vs. women (Belanger et al. 1983). Note the large discrepancy between the muscles and that it is much greater in the women than in the men
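
The SOL-to-TA ratios quoted above for Belanger et al. (1983) follow directly from the MVC-normalized twitch amplitudes. A minimal check, with illustrative variable and function names:

```python
# MVC-normalized resting-twitch amplitudes (%MVC) as quoted in the text
# from Belanger et al. (1983).
rtw_pct_mvc = {
    "men": {"SOL": 12.8, "TA": 5.7},
    "women": {"SOL": 14.1, "TA": 2.7},
}


def sol_to_ta_ratio(values):
    """Ratio of soleus to tibialis anterior resting-twitch amplitude."""
    return values["SOL"] / values["TA"]


ratios = {group: sol_to_ta_ratio(v) for group, v in rtw_pct_mvc.items()}
for group, ratio in ratios.items():
    print(f"{group}: SOL/TA Rtw ratio = {ratio:.1f}")
```

The ratios come to roughly 2.2 in the men and 5.2 in the women.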

By contrast, however, Bigland-Ritchie et al. (1986) investigated SItw and Rtw in both the SOL and the Quad muscle group (~ 45–55% type-I; Aagaard and Andersen 1998; Johnson et al. 1973). The SOL’s control Rtw was 9.4%MVC, or only 73% of Belanger et al.’s corresponding value (12.8%). The corresponding value for the Quad was threefold higher (28.6%MVC). The fatiguing test consisted of 6-s, 50%MVC contractions with 4-s rest, until target force level could no longer be maintained. At the test’s limit, the single-stimulus SItw could no longer be detected in the Quad and almost disappeared in the SOL. Consequently, the authors concluded that no central fatigue was apparent in the Quad but that some had developed in the SOL. This, however, could have been due to the SOL’s greater endurance and eightfold longer exercise duration (35.1 vs. 4.4 min), rather than to being inherently more centrally fatigable than the Quad. An Rtw variant (50 Hz 8-pulse train) decreased ~ 60% in the Quad but only 30% in SOL. The data by Thomas et al. (1989), mentioned earlier, also show large inter-muscular response differences (TA vs. FDI). Again, the data could not resolve the question of whether those inter-muscular differences were due to fibre-compositional differences [TA ~ 70% type-I, FDI ~ 50% (Johnson et al. 1973)], or to inter-muscle differences in usage or activation patterns.

Boerio et al. (2005) measured ITT-based VA and central fatigue, as well as M-wave amplitude before and after 13 min of electro-stimulated resistance exercise (no volitional contractions). Both the SOL and medial gastrocnemius muscles (high type-I and balanced composition, respectively) were tested. The authors regarded the post-exertion decline of the maximal M wave as indicating neuromuscular transmission–propagation failure. Observing a more dominant decline in the medial gastrocnemius than in the SOL muscle, the authors concluded that “neuromuscular propagation failure” was “more evident for the muscle with the higher percentage of Type II fibers”. That is, type-II MUs tended to fatigue and consequently display propagation failure, more than their type-I counterparts. Nevertheless, due to the manner by which VAITT and central fatigue are determined, the less fatigable SOL was paradoxically determined as being the more centrally fatigable.

Schillings et al. (2003) looked at muscle fibre conduction velocity (MFCV) in 2-min sustained MVC contractions, examining the influence of peripheral factors on central fatigue determination. Intriguingly, the authors noted a seeming paradox, where MFCV decline, presumably reflecting peripheral type-II MU fatigue (Boerio et al. 2005), was evident primarily during the first half of the contraction and then leveled off, while the force decline, attributed to central fatigue, occurred predominantly in the contraction’s second half.

Differences in how different muscles are activated and their relative fibre-type composition could markedly and unpredictably affect SItw, Rtw, and their interrelationships during fatiguing muscular exertions. Thus, the use of these variables for VA and central fatigue assessments, particularly under fatigue conditions, appears highly unreliable and, at times, likely invalid.

Post-activation potentiation (PAP) and peripheral fatigue

PAP is a peripheral/muscular phenomenon of augmented contractility following one or more preceding contractions. A 10-s MVC is considered the most effective means of potentiation (Vandervoort et al. 1983), which is most strongly manifested in higher-threshold, type-II MUs (Hamada et al. 2000a; Vandervoort et al. 1983). With ensuing fatigue, the opposite occurs, namely, decline in force generation capacity (Dolmage & Cafarelli 1991; MacIntosh et al. 1994). For instance, Rtw has been reported to decrease during fatiguing muscular exercise (Bigland-Ritchie et al. 1986; Bontemps et al. 2019; Hamada et al. 2003; Ratel et al. 2015; Schillings et al. 2003; Thomas et al. 2015). Again, it is suggested that it is the type-II MUs whose twitch force diminishes most, or completely disappears if and when they cease contracting altogether (Hamada et al. 2003; Nordlund et al. 2004; Vandervoort et al. 1983). Thus, during or following muscular contractions, PAP and peripheral fatigue will have direct effects on the observed amplitudes of both SItw and Rtw. However, the relative effect on these two twitches is likely not similar, being determined by the muscle involved, its composition, and by the level of training (Hamada et al. 2000b; Mettler and Griffin 2012; Pääsuke et al. 2007). For instance, Pääsuke et al. (1998) showed the unpotentiated plantarflexor (SOL) resting twitch of male power athletes to be ~ 17% larger than that of their endurance-trained counterparts. There could have been some difference in muscle composition but, regardless, SOL is of high type-I composition (Gollnick et al. 1974; Johnson et al. 1973). Potentiation increased the power athletes’ twitch by ~ 55% and the endurance athletes’ by ~ 25%, a difference of ~ 30 percentage points (values estimated off a graph). Thus, a major reason for the difference likely was the type of training of each of the two groups. Similar observations were reported in women (Pääsuke et al. 2007), where the corresponding Rtw potentiation values were 51 and 30% for power-trained and endurance-trained women, respectively, although the difference (21 percentage points) was somewhat smaller. The corresponding potentiation effect in untrained women was intermediate, 44%.

As discussed earlier [under The Resting (Control) Twitch], the muscle’s entire MU spectrum is presumably activated and reflected in Rtw’s amplitude. SItw, however, being applied at or near 100%MVC intensity, is made up solely or largely of the highest-threshold (type-II) MUs, since all other MUs have already been volitionally activated. Since SItw-activated MUs are unrecruitable volitionally, they cannot be directly potentiated or fatigued by preceding volitional contractions unless residual potentiation or fatigue effects persist from preceding stimulations or contractions, or due to some unknown form of induction by adjacent, volitionally activated MUs. For this reason, it is tempting to expect greater PAP effect on SItw than on Rtw. However, the limited available data suggest the opposite: smaller SItw than Rtw potentiation. For instance, in the Quad, Folland and Williams (2007) found a ~ 44% SItw PAP vs. 53% Rtw potentiation. In the same muscle, Morton et al. (2005), Bigland-Ritchie et al. (1986), and Hamada et al. (2000a) reported still-higher Rtw PAP of 63, 64, and 70%, respectively. In low- vs. high-type-II subgroups Rtw PAP was 43 and 104%, respectively (Hamada et al. 2000a).

The examples above are meant to demonstrate that, due to the compositional differences and the likely different degrees of potentiation and fatigue, Rtw and SItw cannot be presumed to be similarly affected by the two factors. This presumed difference, therefore, would mean varying SItw/Rtw ratios at different contraction intensities and fatigue conditions. As this ratio determines VA calculation and hence also central fatigue, it constitutes yet another example of how peripheral factors can affect the assessment of central fatigue, defined as being determined solely by central factors. The SItw/Rtw ratio is thus rather specific to the person, to the particular muscle, and to that muscle’s composition. Consequently, this makes the SItw/Rtw ratio unpredictable as to how it would be affected in any given individual and fatigue conditions.
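
As a rough illustration of how unequal potentiation alone shifts the SItw/Rtw ratio, the sketch below applies the ~ 44% SItw and ~ 53% Rtw potentiation reported by Folland and Williams (2007) to a hypothetical unpotentiated twitch pair. The baseline amplitudes (1 and 10%MVC), the helper names, and the conventional formula VA = (1 − SItw/Rtw) × 100 are assumptions made for illustration; central drive is held constant by construction.

```python
def voluntary_activation(sitw, rtw):
    """Conventional ITT estimate (%), assuming VA = (1 - SItw/Rtw) * 100."""
    return 100.0 * (rtw - sitw) / rtw


def potentiate(amplitude, pap_percent):
    """Scale a twitch amplitude by a given potentiation (in %)."""
    return amplitude * (1.0 + pap_percent / 100.0)


# Hypothetical unpotentiated twitch amplitudes (%MVC), chosen for illustration.
sitw_base, rtw_base = 1.0, 10.0

# Potentiation values quoted in the text from Folland and Williams (2007).
sitw_pap, rtw_pap = 44.0, 53.0

sitw_pot = potentiate(sitw_base, sitw_pap)
rtw_pot = potentiate(rtw_base, rtw_pap)

va_before = voluntary_activation(sitw_base, rtw_base)
va_after = voluntary_activation(sitw_pot, rtw_pot)

print(f"SItw/Rtw before potentiation: {sitw_base / rtw_base:.3f}")
print(f"SItw/Rtw after potentiation:  {sitw_pot / rtw_pot:.3f}")
print(f"VA before: {va_before:.1f}%, VA after: {va_after:.1f}%")
```

With these inputs the ratio and the derived VA both change although nothing central was altered. The size and direction of the shift depend entirely on the assumed baseline and on which potentiation values apply to a given muscle and person.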

In the aforementioned study by Belanger et al. (1983), MVC PAP was similar in men and women. It averaged ~ 145% in the dorsi-flexors, but only ~ 22% in the plantar-flexors (SOL), likely due to the latter’s low type-II composition. In their later study of children and adolescents (Belanger and McComas 1989), the authors reported strikingly similar findings of ~ 145 and 20% MVC PAP, respectively. Investigating twitch PAP, Bigland-Ritchie et al. (1986) found Quad SItw PAP to be 64%, less than half the dorsiflexor MVC PAP in the above studies by Belanger et al. The SOL’s SItw PAP, however, was 20% and practically identical to the corresponding MVC PAP values in those studies (Belanger et al. 1983; Belanger and McComas 1989).

Based on the presented data, factors such as muscle type and composition, PAP, and MFCV changes, all peripheral in nature, can affect SItw and Rtw changes throughout a fatiguing exercise in an unpredictable manner that could wrongly be ascribed to central factors. This calls into question the very validity of the tools used for central fatigue assessment. Indeed, based on a simulation model (Contessa and De Luca 2013) involving only peripheral variables (volitional and elicited impulse trains, twitch torques, and time-dependent changes thereof), Contessa et al. (2016) were able to replicate central fatigue findings with no need to rely on any central factors.

Endurance and central fatigue

The well-known attributes of aerobic endurance and endurance training include higher aerobic power, lactate/anaerobic thresholds, and fat metabolism; as well as lower heart rate, glycolytic capacity, lactatemia, and acidosis (see Ratel and Blazevich 2017 for an overview). Endurance training also elevates the EMG threshold (Chwalbinska-Moneta et al. 1998; Hug et al. 2004), and endurance athletes often pre-selectively possess higher type-I/lower type-II muscle fibre composition (Costill et al. 1976; Gollnick et al. 1972). All the above would be expected to facilitate lower muscular and systemic fatigue in both submaximal and extreme exertions, compared with untrained, or power-trained individuals.

In a particularly elegant study, Triscott et al. (2008) used motor-evoked potentials (MEPs) as proxies for the onset and recovery of central fatigue. The authors compared exercise-fatigued biceps to its non-fatigued contra-lateral counterpart in strength- and endurance-trained athletes, as well as untrained controls. The fatiguing task was ~ 13%MVC biceps curls to exhaustion. Changes in MEP and endurance time in the contra-lateral arm were taken to reflect the central fatigue component in isolation from confounding peripheral fatigue in the exercising primary arm. Following the exhaustive task in the primary arm, MEP was depressed, in both arms, in all three groups, suggesting central fatigue development in all of them. However, relative to a control test, endurance times in the contra-lateral arm were reduced only in the non-endurance groups and stayed the same, or slightly increased, in the endurance group. Post-test MEP recovery of the contra-lateral biceps also conformed to expected endurance-training effects and was markedly faster in the endurance group. Since the endurance group’s performance was not compromised, it is difficult to ascribe its MEP decrement to central or any kind of fatigue. While this remains an enigma, the unequivocal message from this study, as could have reasonably been expected, is that endurance training facilitated better coping with central fatigue if not its outright elimination.

As noted earlier, several studies have suggested that central fatigue constitutes some portion of the observed fatigue in prolonged endurance exercise (Martin et al. 2010; Saldanha et al. 2008; Temesi et al. 2014; Thomas et al. 2015; Zghal et al. 2015). However, that kind of fatigue, if indeed true, is not at play when endurance athletes are compared to untrained individuals in few-minute-long fatiguing exertions.

Such a test was at the core of a remarkable recent study by Bontemps et al. (2019), who compared normally active, untrained men (and boys; discussed later) with highly trained endurance athletes in an intermittent knee-extension MVC protocol (5-s MVC every 10 s, carried to 60% of initial MVC). Although endurance capacities were very different (15.9 ± 3.9 vs. 51.7 ± 19.6 repetitions, respectively), VA and central fatigue were not only compared at the respective endurance limits, but also at 15 repetitions (2.5 min), which constituted only 29% of the athletes’ endurance capacity (15/51.7 × 100) and ~ 94% (15/15.9 × 100) of the capacity of the untrained. Using the ITT-based method, the untrained men, despite being at their endurance limit, incurred minimal central fatigue (i.e., minimal VA decrease) with large Rtw decrement. The endurance athletes, on the other hand, despite still having > 70% of their potential untapped and Rtw that barely decreased, had a steeper VA decline, interpreted as greater incurred central fatigue (see Fig. 8 for schematics). Due to the large variabilities of VA and its components (discussed earlier), at 15 repetitions these differences amounted only to clear trends which turned statistically significant at the respective endurance limits. Based on both the VAITT and CAR, Boerio et al. (2005) drew similarly paradoxical conclusions in claiming that the high-endurance, high type-I-composed SOL was more centrally fatigable than the relatively low-endurance, lower-type-I medial gastrocnemius.

Fig. 8

Schematic depiction of the Rtw and VA data of untrained men vs. highly trained endurance athletes after 15 MVC contractions, reported by Bontemps et al. (2019)
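
The capacity fractions and exercise durations quoted for the Bontemps et al. (2019) comparison follow from simple arithmetic on the mean repetition counts and the 10-s contraction cycle. A minimal sketch, with illustrative function and variable names:

```python
SECONDS_PER_CYCLE = 10  # one 5-s MVC every 10 s, per the protocol described


def fraction_of_capacity(reps_done, reps_to_limit):
    """Share of endurance capacity used, as a percentage."""
    return 100.0 * reps_done / reps_to_limit


def elapsed_minutes(reps):
    """Exercise time for a given number of contraction cycles."""
    return reps * SECONDS_PER_CYCLE / 60.0


common_point = 15  # repetitions at which the groups were compared
mean_reps_to_limit = {"untrained men": 15.9, "endurance athletes": 51.7}

for group, limit in mean_reps_to_limit.items():
    used = fraction_of_capacity(common_point, limit)
    print(
        f"{group}: {used:.0f}% of capacity used at {common_point} repetitions "
        f"({elapsed_minutes(common_point):.1f} min); "
        f"limit reached at about {elapsed_minutes(limit):.1f} min"
    )
```

On these mean values the untrained men were near their limit at the common comparison point, whereas the athletes had used under a third of theirs, and the athletes' own limit fell at more than three times the common 2.5-min duration.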

Greater fatigue of any kind in highly trained endurance athletes is counter-intuitive, defies physiological expectations as outlined above, and stands in contrast to the expected endurance-training effects as demonstrated by Triscott et al. (2008) via MEP. The paradoxical conclusion could be somewhat more plausible if Bontemps et al.’s athletes and untrained men were only compared at their respective endurance limits, where the > threefold duration difference (> 8 vs. 2.5 min) could have been argued as a reason for the athletes’ higher central fatigue. However, the difference in trend was clearly evident already at the common 15-repetition point. Zghal et al. (2015) partly addressed this issue in conducting an 8-week endurance-training study, testing VA changes from before to after a sustained 15%MVC test to exhaustion. Using the ITT-based method they too claimed greater central fatigue for the trained group. However, as endurance time was more than twice as long for the trained than for the control group (1807 vs. 887 s, respectively), changes were also compared at the groups’ respective pre-training endurance times. The lower VA and greater central fatigue were thus unchanged for the control group, but corresponding changes could no longer be observed in the trained group. So, while some central fatigue could conceivably be ascribed to the longer post-training test duration, an important message here is that endurance training should diminish rather than increase central fatigue, if any.

Thus, Bontemps et al.’s apparent paradox of highly trained endurance athletes being more centrally fatigable than their untrained counterparts in identical-duration exertions, is difficult to reconcile with Gandevia’s (2001) suggestion that central fatigue “will act to ‘protect’ the muscle from further peripheral fatigue” and that it “confers some evolutionary advantage. Perhaps [central] drive is limited because… [it] would put the neuromuscular junction or… excitation–contraction coupling… into a catastrophic state, one from which recovery was delayed or impossible”. Clearly, exercising at 29% of their capacity, the endurance athletes were faced with no impending catastrophe for which they would need extra protection, nor would their recovery be impossible or delayed. By contrast then, why were the untrained men not in need of any such protection when exercising at their very limits?

Are children more centrally fatigable than adults?

Compared with untrained adults and similar to mature endurance athletes, children have long been shown to possess greater endurance and lesser fatigability (e.g., Ratel et al. 2004), higher ventilatory threshold (Klentrou et al. 2006) along with higher EMG threshold (Pitt et al. 2015; Long et al. 2017; Woods et al. 2019, 2020), lower normalized power output, lower lactate response to high-intensity exercise (Dotan et al. 2003), and faster recovery from high-intensity exercise (Falk and Dotan 2006). Owing to their smaller dimensions and hence shorter diffusion distances, children’s muscles are also advantaged by better blood supply (e.g., Mathieu-Costello 1993). Furthermore, there is evidence to suggest that children’s muscles have higher composition of the more oxidative, less fatigable, type-I muscle fibres (Jansson 1996; Lexell et al. 1992) and that children are less capable of activating their fast-twitch, glycolytic, and more fatigable type-II motor-units than adults (Dotan et al. 2012; Long et al. 2017; Pitt et al. 2015; Woods et al. 2019, 2020). For these reasons and others, children’s physiological responses to exercise have been likened to those of well-trained adult endurance athletes (Pitt et al. 2015; Ratel and Blazevich 2017). Thus, as with the endurance athletes discussed above, it is counter-intuitive that five of the six studies that compared central fatigue in children vs. adults have found children to be more centrally fatigable than adults (Bontemps et al. 2019; Piponnier et al. 2018, 2019; Ratel et al. 2015; Streckis et al. 2007).

Bontemps et al. (2019), who included prepubertal boys in their aforementioned ‘endurance athletes’ study, found the boys’ Rtw and their calculated VA and central fatigue to be more comparable to those of the endurance athletes than those of untrained men. Congruent with children’s aforementioned similarities to endurance athletes, the pattern shown in Bontemps et al.’s endurance athletes (stable Rtw, declining VA; Fig. 8) has also been demonstrated in children (Bontemps et al. 2019; Piponnier et al. 2018, 2019; Ratel et al. 2015). However, in the only comparison with endurance athletes of which we are aware (Bontemps et al. 2019), the boys’ VA decrease and calculated central fatigue increase were roughly twice those of the athletes.

Being untrained, the boys’ similarity to endurance athletes is likely attributable mostly to their lower VA and greater AD (O’Brien et al. 2010), suggested to stem from their relative inability to activate their higher-threshold, type-II MUs (Dotan et al. 2012; Long et al. 2017; Pitt et al. 2015; Woods et al. 2019, 2020), as well as to their greater proportion of type-I fibres, compared with untrained adults (Jansson 1996; Lexell et al. 1992). Taken together, these factors put the boys’ functional muscle composition roughly on par with that of the best endurance athletes. The boys’ greater endurance capacity is evidenced by their 2.5-fold more repetitions to exhaustion, compared with the untrained adults (40.4/15.9). This explains why, like the athletes, the boys’ Rtw demonstrated very little fatigue and decreased minimally.

Still, despite their somewhat lower endurance capacity, relative to the trained athletes (40.4 vs. 51.7 repetitions, respectively), the boys’ Rtw decreased more slowly, and their derived VA decrease and central fatigue increase were nearly twice the athletes’ corresponding values. Being peripheral phenomena, Rtw amplitude and changes thereof cannot be explained by central factors, and the reason for the observed Rtw differences may involve PAP. Children’s PAP is typically lower than that of untrained adults, apparently due to their lower functional type-II composition. Accordingly, in a muscle of extremely low type-II content (SOL), Pääsuke et al. (2000) found no PAP difference between boys and men. However, when the Quad, with its much higher type-II composition, was involved as the prime mover, squat jump height and peak rate of torque development were significantly increased by PAP in both men and women, but not in prepubertal boys or girls (Arabatzi et al. 2014). We have corroborated children’s markedly lower Quad PAP in yet-unpublished findings from our lab. The athletes’ greater PAP, compared with the boys, could be due not only to the athletes being mature adults, but also, as mentioned earlier, to the training effect on PAP (Hamada et al. 2000b; Mettler and Griffin 2012). Thus, relative to the boys, PAP adds more to the athletes’ Rtw, which is already larger owing to their being adults. It may therefore be suggested that the limited peripheral fatigue, encountered by the 15th contraction, affected the athletes’ Rtw more than it did that of the boys.

The dependency of calculated central fatigue on the functional type-II fibre content (via greater peripheral fatigue and PAP) is inversely illustrated in the SOL muscle, characterized by high type-I fibre composition in both children and adults. Piponnier et al. (2018) showed the derived central fatigue to be much smaller in SOL than in the Quad and, apparently for the same reason, Hatzikotoulas et al. (2014), who tested only SOL, were the only researchers not to find child–adult differences in central fatigue.

With all the above in mind, it is difficult to endorse the rationalization that “the greater central fatigue observed in boys could reflect a strategy of the CNS aimed at limiting the recruitment of motor units to prevent any extensive peripheral fatigue” (Ratel et al. 2015), or that, contrary to adults, children’s “central nervous system could not tolerate the development of an extensive peripheral fatigue” (Piponnier et al. 2019). Why should children need more protection against peripheral conditions that are much less stressful to them, even under the most extreme exertions?

Conclusions

For ITT-based central fatigue assessment to be valid, changes in VAITT during fatiguing muscular exertions must be affected largely, if not solely, by central fatigue. In other words, the validity of the prevailing central fatigue assessment method hangs on the premise that VAITT is as valid in fatigued muscles as it is in fresh ones. We have provided various lines of evidence to suggest that the VAITT variables behave differently under fatigue and are markedly affected by factors that are either independent of fatigue or entirely peripheral. For a conceptual summary of this review, see Fig. 9.
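Why a peripheral variable can carry over into VAITT is easiest to see from the conventional interpolated-twitch expression. The form below is the one commonly used in the ITT literature and is assumed here rather than quoted from this section; SIT (the superimposed twitch amplitude elicited during the voluntary contraction) is assumed notation, while Rtw is the resting twitch discussed above.

```latex
\mathrm{VA}_{\mathrm{ITT}}\ (\%) \;=\; \left(1 - \frac{\mathrm{SIT}}{R_{\mathrm{tw}}}\right) \times 100
```

Because Rtw sits in the denominator, any change in Rtw that is not matched proportionally by a change in SIT (e.g., through PAP or peripheral fatigue) shifts the computed VA even if central drive itself is unchanged.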

Fig. 9

Conceptual review summary. Graphically depicted are the confounding factors impinging on the accuracy and reliability of ITT-based VA determination in the fresh muscle and, additionally, under muscle fatigue. These directly compromise the reliability of central fatigue determination and, in light of a conceptual paradox and observed discrepancies, call into question the very validity of the central fatigue determination method.

The incapacity of the ITT-based method to properly gauge central fatigue is hard to appreciate when results are scrutinized in a non-comparative context. It is when untrained adults are compared with children or endurance athletes, and found seemingly less centrally fatigable, that the inexplicable discrepancy embodied in the central-fatigue determination method becomes evident.

It remains to be elucidated whether central fatigue exists in all fatiguing muscular contractions, in what situations, and to what extent. The prevailing ITT-based determination method identifies central fatigue in nearly all types and durations of peripherally fatiguing muscular exertions. The presented evidence renders questionable both the identification and the extent of the determined central fatigue. On their own, these two aspects of central fatigue assessment could have been regarded as merely reflecting low sensitivity and accuracy of a conceptually sound method. However, the paradoxical determination of higher central fatigability in both children and endurance athletes questions the very foundation on which the method rests. Regrettably, there is presently no way to unequivocally prove or refute the existence or extent of central fatigue in different fatiguing contractions, and it will continue to be a speculative topic for the foreseeable future.