Key Points

Current research is equivocal regarding the use of heavy or light loads for optimal strength and hypertrophic adaptations.

Misinterpretation of electromyography amplitude, differing methods of assessing hypertrophy (e.g. in vivo and in vitro) and overlooked motor schema research may explain the differing adaptations reported.

1 Introduction

The role of load in resistance training is currently a much-debated topic in exercise science. Recent reviews have examined existing studies comparing the effects of different loads on muscle function (e.g. strength and endurance) and hypertrophy. In these reviews, some authors have suggested that essentially the same adaptations are possible with either heavy loads (HLs) or light loads (LLs) when resistance training is continued to momentary failure [1, 2]. In contrast, others suggest that specifically including LLs or HLs may be necessary to optimise certain adaptations [3–6]. We propose that ‘heavy’ and ‘light’ loads exist on a spectrum and are individual and subjective; however, for clarity, HL and LL have been operationally defined as >65 % and <60 % of 1 repetition maximum (1RM), respectively [6]. A number of recent studies have been published, some examining acute mechanistic differences resulting from differences in load and others comparing chronic changes in muscle function and hypertrophy. Unfortunately, we believe that some researchers may have interpreted the data produced in these studies inappropriately, largely because of incorrect inferences regarding motor unit (MU) recruitment in acute electromyography (EMG) studies and because of the different methods used to measure both muscle function and hypertrophy. With this in mind, in the present piece we aim to explain why exercise scientists might have given contrasting recommendations by discussing the factors that should be considered when interpreting research in this area.
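The operational definitions above can be written as a simple classification rule. This is a minimal sketch, assuming intensity is expressed as a percentage of 1RM; the function names and the 'unclassified' label for loads of 60–65 % 1RM (which meet neither definition) are hypothetical and used only for illustration.

```python
def percent_1rm(load_kg: float, one_rm_kg: float) -> float:
    """Express an external load as a percentage of the lifter's 1RM."""
    if one_rm_kg <= 0:
        raise ValueError("1RM must be positive")
    return 100.0 * load_kg / one_rm_kg


def classify_load(pct_1rm: float) -> str:
    """Classify intensity using the operational cut-offs HL > 65 % 1RM, LL < 60 % 1RM.

    Loads from 60 to 65 % 1RM (inclusive) meet neither definition; the label
    'unclassified' is a hypothetical placeholder, not a term from the source.
    """
    if pct_1rm > 65.0:
        return "HL"
    if pct_1rm < 60.0:
        return "LL"
    return "unclassified"


# Hypothetical lifter with a 100 kg 1RM
for load in (30.0, 62.5, 80.0):
    pct = percent_1rm(load, 100.0)
    print(f"{load} kg = {pct:.1f} % 1RM -> {classify_load(pct)}")
```

Note that the gap between the two cut-offs is a property of the operational definitions themselves, which is one reason the authors describe load as a spectrum.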

2 Acute Electromyography Amplitude and the Size Principle

It is commonly accepted in the resistance training literature that recruitment of an MU is necessary for subsequent adaptation to occur [7]. Since discussions around optimal load for muscular adaptations are predicated on the belief that complete recruitment of MUs, and thus muscle fibres, is required for optimal adaptations, it is essential to consider acute EMG research in this area and to briefly reconsider the size principle of MU recruitment. Recent studies have reported higher peak EMG amplitude for HLs than for LLs [7, 8], with one recent study showing EMG amplitude increasing from 50 to 70 to 90 % 1RM [9]. From this, the authors of these studies have inferred that LLs do not maximally recruit all MUs and that HLs are therefore favourable for developing strength and hypertrophy. However, such recommendations may be founded on an incorrect use and interpretation of EMG data relating to MU recruitment, as well as a misapplication of the size principle.

For clarity, the size principle states that “when the central nervous system recruits motor units for a specific activity it begins with the smallest, more easily excited, least powerful motor units and progresses to the larger, more difficult to excite, most powerful motor units to maintain or increase force” [10, 11]. However, as recently noted by Enoka and Duchateau [12], whilst EMG amplitude is influenced by MU recruitment strategies, many researchers continue to mistakenly infer MU recruitment from amplitude data. For example, during a maximal voluntary contraction, more MUs, including both low- and high-threshold MUs, are activated, and at higher discharge frequencies, in order to produce maximal force; this extensive recruitment results in a higher EMG amplitude. In comparison, a sustained submaximal contraction recruits only enough MUs to produce the necessary force; however, as those MUs fatigue, other MUs are recruited to replace them in sustaining the desired force. Indeed, during fatiguing contractions the recruitment threshold of higher-threshold MUs is reduced, permitting their subsequent recruitment [13], and MUs may ‘cycle’ (momentary de-recruitment of some MUs and recruitment of others) during submaximal fatiguing contractions to reduce fatigue and maintain force [14]. Furthermore, the ‘muscle wisdom hypothesis’ suggests that during sustained contractions the MU discharge rate might decrease in order to optimise the force output of MUs and protect against peripheral conduction failure [15, 16]. Should this decrease in discharge rate occur, signal amplitude would decrease as a result [17]. As such, whilst HLs would require more synchronous MU recruitment at greater frequencies (resulting in higher EMG amplitudes), sustained contractions to muscular failure with LLs might ultimately recruit all MUs, albeit sequentially rather than synchronously (resulting in lower EMG amplitudes).
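The distinction between peak amplitude and cumulative recruitment can be illustrated with a deliberately crude toy model. It assumes, purely for illustration, that each MU contributes a fixed amplitude that grows with its recruitment order, that amplitude scales with discharge rate, and that during an LL set a small window of active MUs slides up the recruitment order as units fatigue. All names and numbers are hypothetical; this is not a physiological EMG model and it does not test the hypothesis. It only shows how a lower peak amplitude can coexist with recruitment of every MU.

```python
from dataclasses import dataclass


@dataclass
class MotorUnit:
    index: int        # position in the recruitment order (size principle)
    amplitude: float  # toy signal contribution per discharge; larger for later MUs


def make_pool(n: int) -> list[MotorUnit]:
    """Create n hypothetical MUs whose contribution grows with recruitment order."""
    return [MotorUnit(i, 1.0 + i) for i in range(n)]


def simulate(pool: list[MotorUnit], active_per_step: int, steps: int, rate_hz: float):
    """Return (peak amplitude proxy, indices of every MU recruited at any step).

    At each step a contiguous window of `active_per_step` MUs is active. The
    window slides one unit up the recruitment order per step, a crude stand-in
    for fatigued units being replaced by higher-threshold units.
    """
    recruited: set[int] = set()
    peak = 0.0
    max_start = len(pool) - active_per_step
    for t in range(steps):
        start = min(t, max_start)
        active = pool[start:start + active_per_step]
        recruited.update(mu.index for mu in active)
        # Amplitude proxy: summed contributions of concurrently active MUs x discharge rate
        peak = max(peak, rate_hz * sum(mu.amplitude for mu in active))
    return peak, recruited


pool = make_pool(10)
# HL: every MU active at once at a higher (hypothetical) discharge rate
hl_peak, hl_units = simulate(pool, active_per_step=10, steps=1, rate_hz=30.0)
# LL to failure: 4 MUs at a time, rotating up the pool, at a lower (hypothetical) rate
ll_peak, ll_units = simulate(pool, active_per_step=4, steps=7, rate_hz=15.0)
print(f"HL: peak {hl_peak:.0f}, MUs recruited {len(hl_units)}")
print(f"LL: peak {ll_peak:.0f}, MUs recruited {len(ll_units)}")
```

Raising `rate_hz` in the LL call to 30.0 still leaves the LL peak below the HL peak in this toy setup, so the gap here is driven by how many MUs are active concurrently rather than only by discharge rate.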

It should be noted that whether MU recruitment is ultimately similar between HLs and LLs remains a hypothesis that needs to be tested empirically. Examining this would require more advanced handling of EMG data, such as spike-triggered averaging [18] or initial wavelet analysis followed by principal component classification of major frequency properties and optimisation to tune wavelets to these frequencies [19]. Although acute mechanistic data cannot be used to infer chronic adaptations, studies such as these recent EMG amplitude comparisons of HLs and LLs are useful for generating hypotheses to examine in chronic training interventions. However, the hypotheses presented by the authors of these recent studies, suggesting that HLs may produce greater adaptations, appear to stem from an inappropriate interpretation of EMG amplitude and a misapplication of the size principle.

3 Hypertrophic Adaptations

Common methods of measuring hypertrophy are in vivo (e.g. computed tomography [CT], magnetic resonance imaging [MRI] and ultrasound) and in vitro (e.g. muscle biopsy). Recent reviews have differed in their inclusion of studies using these methods, with some examining only in vivo measures of whole-muscle hypertrophy [1] and others considering both in vivo and in vitro measures [5, 6]. Indeed, the methods used to measure hypertrophy, the information they can provide, and their respective strengths and weaknesses have been discussed in light of these publications [20, 21]. We acknowledge that whilst both in vivo and in vitro methods provide useful information, they offer very different information and should be interpreted separately and carefully.

In both a recent review [5] and a meta-analysis [6] of hypertrophy in response to HLs and LLs, resistance training studies using muscle biopsy and those using in vivo methods were considered, and in the meta-analysis they were combined for analysis. However, combining in vivo and in vitro measures in this meta-analysis might have confounded its overall conclusions relative to those of other publications [1]. In support of this concern, a study by Mitchell et al. [22] that was included in the meta-analysis measured hypertrophy with both MRI and biopsy in response to different resistance training loads and reported that relative increases appeared to be greater for biopsy measures (mean = ~17–30 % type I and ~16–18 % type II; favouring the LL and HL conditions, respectively, in terms of effect size [ES]) than for MRI (~7 %; favouring the HL condition in terms of ES). McCall et al. [23] have also reported differences between muscle biopsy and MRI in the magnitude of the mean cross-sectional area (CSA) increase (biopsy = 10 % type I fibre and 17.1 % type II fibre vs. 12.6 % from MRI). It is not clear from the method section of the meta-analysis how the authors handled the different hypertrophy outcome measures used by Mitchell et al. [22], i.e. whether they were treated separately or combined. Indeed, it has been noted [20] that in the earlier review [5], studies using in vivo measures of whole-muscle hypertrophy consistently showed no difference between HLs and LLs, whereas the two in vitro studies using biopsies showed significantly greater gains for HLs. Although the difference was ultimately not statistically significant (p = 0.076), it is unclear to what degree the combination of methods influenced the results of this meta-analysis in favour of greater ESs for HLs than for LLs (mean ± standard deviation [SD] LL = 0.39 ± 0.17; HL = 0.82 ± 0.17). In the aforementioned meta-analysis by Schoenfeld et al. [6], a forest plot of the ESs showed the impact of load on hypertrophy; this has been adapted and included as Fig. 1. When compared with the overall ES for all studies evaluated, it is noteworthy that studies with a higher ES than the overall value (i.e. right of the broken line in Fig. 1; Campos et al. [24] and Schuenke et al. [25]) used in vitro methods of measuring hypertrophy, whereas studies with a lower ES than the overall value (i.e. left of the broken line in Fig. 1; Mitchell et al. [22], Ogasawara et al. [26], Popov et al. [27], Tanimoto and Ishii [28], Tanimoto et al. [29], Van Roie et al. [30]) used in vivo methods. This suggests that combining these methods of measurement might have contaminated the analyses and overall outcome.

Fig. 1

Adapted from Schoenfeld et al. [6], with permission

Forest plot showing the impact of load on hypertrophy by study. The broken red line represents the overall effect size. Studies to the right of the broken red line used in vitro methods to measure hypertrophy, whereas studies to the left of this line used in vivo methods. Plotted values represent mean muscle hypertrophy effect size difference between high- and low-load groups ± confidence interval
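One way to avoid mixing measurement methods is to pool ESs separately for each method. The sketch below uses a fixed-effect, inverse-variance weighted mean; this is a swapped-in textbook model chosen for brevity, not necessarily the model used by Schoenfeld et al. [6], and the example ESs and variances are hypothetical rather than values read from Fig. 1.

```python
import math
from collections import defaultdict


def fixed_effect_pool(effects: list[float], variances: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) mean ES and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * es for w, es in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


def pool_by_method(studies):
    """Pool ESs separately for each measurement method.

    `studies` is an iterable of (method, effect_size, variance) tuples,
    where method is a label such as "in vivo" or "in vitro".
    """
    groups = defaultdict(lambda: ([], []))
    for method, es, var in studies:
        groups[method][0].append(es)
        groups[method][1].append(var)
    return {m: fixed_effect_pool(es, var) for m, (es, var) in groups.items()}


# Hypothetical studies (not taken from Fig. 1): (method, ES, variance of ES)
studies = [
    ("in vivo", 0.2, 0.25),
    ("in vivo", 0.4, 0.25),
    ("in vitro", 1.0, 0.25),
]
for method, (mean, se) in pool_by_method(studies).items():
    print(f"{method}: pooled ES = {mean:.2f} (SE {se:.2f})")

combined, _ = fixed_effect_pool([s[1] for s in studies], [s[2] for s in studies])
print(f"all methods combined: pooled ES = {combined:.2f}")
```

In practice a random-effects model or a formal subgroup (moderator) analysis would be more usual; the sketch only shows that method-specific estimates keep in vivo and in vitro evidence distinguishable, whereas the combined estimate sits between them.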

The use of in vitro measures such as muscle biopsy permits the examination of many important aspects of muscular adaptation, including fibre type, individual fibre area, mitochondrial content, enzyme expression and capillarisation. Indeed, it has been suggested that fibre-type-specific adaptations may occur in response to HL or LL training [31] and, although evidence as to whether this occurs is presently mixed [22, 32, 33], biopsy would be necessary to test this hypothesis further. Regarding hypertrophy as an outcome, it has been argued that biopsy may provide the most relevant information, because individual fibre area can be determined, allowing contractile and non-contractile components to be differentiated [5]. However, evidence is equivocal regarding agreement between whole-muscle CSA changes and biopsy-determined changes in myofibril CSA, with some studies suggesting a similar magnitude of relative change [34, 35] and others not [36, 37]. Indeed, authors have conceded that “it might be true…that single fiber CSA data over-estimate whole muscle CSA” [38, 39]. Although methods exist to ensure that sufficient tissue is obtained for analysis by biopsy, only a limited number of fibres is assessed regardless of method. Consequently, variation in fibre characteristics and non-uniform growth along the length of a muscle [40] are notable limitations when attempting to extrapolate biopsy results to whole-muscle change [41]. However, measuring muscular adaptation using in vivo methods is not without issues: different methods (MRI, CT, ultrasound) can offer different information for both individual muscles and whole muscle groups, including CSA, muscle thickness, muscle density, architectural changes such as pennation angle and changes in non-contractile components such as intramuscular adipose tissue.
Again, with respect to hypertrophy, even whole-muscle changes in CSA or muscle thickness may not fully reflect morphological adaptation: CSA may include non-contractile components, so increases may not entirely reflect muscular adaptation. Conversely, previous studies have reported no change in CSA despite significant increases in muscle density [42], as well as disproportionate gains in strength and CSA that may have been influenced by changes in muscle density [43].

In our opinion, the confounding factors discussed limit the integrity of any outcome data where analyses have combined these methods of measurement of hypertrophy. Furthermore, from a practical perspective, different outcomes may hold different value for persons with different goals. For example, those with aesthetic goals may have greater interest in whole-muscle changes irrespective of whether changes occur as a result of contractile or non-contractile components increasing, whereas those with more performance-specific goals may have greater interest in fibre-specific adaptations or changes in muscle density. As such, we believe that the different outcome methods, though both providing important information, ultimately provide different information and should be considered as such in interpretation.

4 Muscle Function Adaptations

Muscle function is often measured as strength, relative endurance (repetitions performed with a load set at a submaximal percentage of 1RM) or absolute endurance (repetitions performed with a fixed, absolute submaximal load). Testing modes can vary considerably, including free weights, resistance machines, and isokinetic or isometric dynamometers. Publications from the American College of Sports Medicine (ACSM) have suggested that HLs promote greater strength adaptations, whereas LLs may promote greater endurance adaptations (though it is not specified whether relative or absolute endurance is meant) [3, 4]. However, these claims have been criticised [44, 45], and authors of more recent reviews have reported similar increases in strength and absolute endurance irrespective of training load [2, 10, 46]. It has been suggested that these similar changes in strength and absolute endurance may be due to the inherent relationship between the two outcomes [47, 48]. With this in mind, it is important to consider the nature of the muscle function measures employed in studies comparing HL and LL training.
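The link between strength and absolute endurance can be seen with simple arithmetic. Assuming, hypothetically, a fixed 60 kg absolute endurance test load and a 1RM that rises from 100 to 120 kg, the same absolute load represents a lower percentage of 1RM after training, so more repetitions would generally be expected even if relative endurance were unchanged; a relative endurance test, by contrast, rescales the load to the new 1RM. The function names and values are illustrative only.

```python
def relative_intensity(absolute_load_kg: float, one_rm_kg: float) -> float:
    """Express a fixed (absolute) test load as a percentage of the current 1RM."""
    return 100.0 * absolute_load_kg / one_rm_kg


def relative_test_load(one_rm_kg: float, pct_1rm: float) -> float:
    """Load for a relative endurance test, rescaled to the current 1RM."""
    return one_rm_kg * pct_1rm / 100.0


# Hypothetical lifter: 1RM rises from 100 kg to 120 kg after training.
ABSOLUTE_TEST_LOAD = 60.0
for label, one_rm in (("pre", 100.0), ("post", 120.0)):
    print(
        f"{label}: absolute test = {relative_intensity(ABSOLUTE_TEST_LOAD, one_rm):.0f} % 1RM, "
        f"relative test (50 % 1RM) = {relative_test_load(one_rm, 50.0):.0f} kg"
    )
```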

The recent meta-analysis by Schoenfeld et al. [6] referred to in Sect. 3 also examined a muscle function outcome (strength), again reporting no significant difference between HLs and LLs but a greater ES in the HL condition (mean ± SD LL = 1.23 ± 0.43; HL = 2.30 ± 0.43). However, again, some included studies used different methods of measuring muscle function within their designs. For example, Mitchell et al. [22] reported a number of muscle function outcomes, including strength (1RM and isometric maximal voluntary contraction) and relative endurance (repetitions to failure with both 30 and 80 % 1RM loads, in addition to total work). These outcomes differed in whether changes significantly favoured the HL group (1RM and total work with 80 % 1RM) or the LL group (number of repetitions with 30 % 1RM). The authors of a more recent publication reported significantly greater 1RM strength gains for the back squat, but not the bench press, when training with 70–80 % 1RM compared with 30–50 % 1RM (although larger ESs for the bench press were noted for the HL group) [49]. Further, changes in relative endurance (repetitions to failure using 50 % 1RM) were significantly greater for the LL group. Interestingly, there were no significant between-group (HL vs. LL) differences for hypertrophy of the elbow flexors, elbow extensors and quadriceps. In contrast, the same group of authors reported significantly greater increases in 1RM for the bench press, but not the back squat, when training with a 3RM compared with a 10RM load [50]. Another paper included in the meta-analysis, by Ogasawara et al. [26], found no difference in isokinetic elbow extension strength between HL and LL groups but did find a difference for bench press 1RM. As with the studies included in the hypertrophy component of this meta-analysis, it is not clear how the different outcomes were handled for these studies [22, 26] and, for the reasons described below, this may similarly have shifted the ESs in favour of HL conditions.

It is interesting to consider the reasons for the divergent results within these studies, and the testing modes employed. We propose that one reason for differing strength and hypertrophic adaptations might be skill specificity in motor recruitment [51]. Motor control research suggests that a motor schema is highly specific to the task being practised [52, 53]; although it could be argued that the higher number of repetitions associated with LL training provides a greater volume of practice favouring those conditions, motor schemata have also been reported to be load- or force-specific [54]. With this in mind, lifting a heavier load in a particular movement might serve to practise and refine that schema as a skill, which would include the maximal synchronous recruitment of MUs and muscle fibres. This is a key reason why most maximal testing protocols in exercise science research include some form of familiarisation or practice [55]. Indeed, the results of Mitchell et al. [22] support this contention: although the HL group had a greater increase in 1RM, possibly due to the motor schema refinement that likely occurred from training closer to their maximal load, there were no differences between the HL and LL groups for peak isometric maximal voluntary contraction, maximal power output or rate of force development. The tendency towards greater strength gains in the HL groups in the studies by Schoenfeld et al. [49, 50] may also be due to this specificity of motor schema refinement. Further, the 1RM tasks measured were compound free-weight movements (squat and bench press), which have been shown to require multiple (~3 to 5) familiarisation sessions even in moderately trained persons because 1RM continues to increase across sessions [56], and improvements in these tasks are likely attributable to neural and learning effects [57]. In support of this, Ogasawara et al. [26] reported significantly greater gains in bench press 1RM for the HL group but found no differences between groups for elbow extension strength. Thus, in the studies mentioned, the apparent superiority of HLs in enhancing strength may simply reflect better learning of the specific skills involved in the testing. In contrast, simpler strength tasks such as dynamometry of isolated joint movements require less refinement of motor schemata, as evidenced by the need for only a single familiarisation session to achieve reliable results [58, 59]. However, that single familiarisation session is still essential to achieve valid results, so even with such simple tasks there is clearly a skill learning element to testing results. In our opinion, researchers should therefore bear the specificity principle in mind when comparing the results of different training protocols, as the similarity of training and testing protocols is likely a key factor.

5 Exertion and Discomfort

We also speculate that a secondary reason for the differing results in these studies [22, 49, 50], particularly with respect to changes in relative endurance, may relate to exertion and associated discomfort. The distinction between perceptions of effort and discomfort has recently been highlighted as important [60], particularly within resistance training [61], and for good reason.

Shimano et al. [62] examined ratings of perceived exertion (RPE) in trained and untrained persons performing a single set to momentary failure at 60, 80 and 90 % 1RM for the back squat, bench press and arm curl. The authors reported no significant differences in RPE between loads or exercises, with the exception of a significantly higher exertion for the back squat at 60 % 1RM in trained persons (mean ± SD 8.8 ± 0.7 vs. 6.9 ± 1.9). This might suggest that the greater number of repetitions preceding momentary failure produced more discomfort, resulting in a higher RPE value. Indeed, further research has shown that when performing multiple sets to momentary failure, mean (± SD) RPE increases significantly from set one (50 % 1RM = 7.40 ± 1.96 vs. 70 % 1RM = 7.73 ± 1.44) to set two (50 % 1RM = 8.60 ± 0.99 vs. 70 % 1RM = 8.73 ± 0.80) to set three (50 % 1RM = 9.33 ± 0.82 vs. 70 % 1RM = 9.47 ± 0.74), with no difference between loads [63]. We have deliberately termed this discomfort rather than exertion for the following reason. The authors of these studies reported that participants exercised to momentary failure with verbal encouragement to ensure adequate motivation and effort, and RPE was measured using a Borg CR10 scale [64], on which a value of 10 indicates maximal effort. In this case, each trial, irrespective of exercise, load or training status, should have produced a maximal value for effort, since participants were exercising to momentary failure. However, as participants did not report maximal values, we can only assume that they were unclear about how to report their perception of effort and, as such, potentially expressed their feelings of discomfort. Similarly, despite using the Borg CR10 RPE scale and having participants train to momentary failure, Pritchett et al. [65] reported RPE values of less than 10 for both acute and session RPE.
However, RPE was significantly higher for the 60 % 1RM condition than for 90 % 1RM, suggesting that the LL condition, with its higher number of repetitions, incurred greater discomfort than training with an HL. Based on this, we hypothesise that people might find it more difficult to reach momentary failure with an LL because of greater discomfort. As such, studies comparing HL and LL training in which participants are said to have trained to momentary failure might be limited by high discomfort in the LL group, preventing participants from reaching true momentary failure. We propose that, in comparisons of HL and LL groups, actually reaching momentary failure becomes all the more important in the LL group in order to sequentially recruit all possible MUs. However, we acknowledge that there are presently insufficient studies comparing LL training to momentary failure with LL training not to momentary failure to determine how meaningful a difference a final repetition (e.g. reaching ‘true’ momentary failure) might make to chronic adaptations.
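The argument that effort at momentary failure should be rated maximally can be made concrete with the mean values from [63] quoted above: on the Borg CR10 scale every one of these means would be expected to equal 10, and the sketch below simply computes how far each falls short. Only the reported means are used; the helper name `shortfall_from_max` is hypothetical.

```python
CR10_MAX = 10.0  # Borg CR10: 10 indicates maximal effort

# Mean RPE reported in [63] for sets to momentary failure at 50 % and 70 % 1RM
MEAN_RPE = {
    "set 1": {"50 % 1RM": 7.40, "70 % 1RM": 7.73},
    "set 2": {"50 % 1RM": 8.60, "70 % 1RM": 8.73},
    "set 3": {"50 % 1RM": 9.33, "70 % 1RM": 9.47},
}


def shortfall_from_max(rpe: float, scale_max: float = CR10_MAX) -> float:
    """Distance (scale points, rounded to 0.01) between a reported RPE and the scale maximum."""
    return round(scale_max - rpe, 2)


for set_name, by_load in MEAN_RPE.items():
    gaps = {load: shortfall_from_max(rpe) for load, rpe in by_load.items()}
    print(set_name, gaps)
```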

6 Conclusion

When considering the findings of studies comparing the effects of HLs and LLs, there are a number of important factors to consider. These include the differing information provided by different measures of morphological change, the skill associated with the testing mode chosen (both load and task), and psychosocial factors such as discomfort. We contend that different testing modes evidently reflect different outcomes and, indeed, these outcomes may hold different value for persons with different goals. It is also possible that HLs or LLs may favour certain outcomes without affecting others. For example, for someone wishing solely to improve maximal strength in a specific task (such as a powerlifter wishing to improve the back squat, deadlift or bench press), a recommendation might be to perform these specific exercises using HLs in an attempt to catalyse both morphological and neural adaptations [50]. In contrast, those more interested in improving muscular force production for health, or in a way that might be widely transferable, may be able to use a variety of loading schemes [22].

We hope that the present piece has encouraged a more open mindset towards some of the factors that must be considered when interpreting studies examining HLs and LLs in resistance training. The discussion of resistance training load is pertinent since most strength coaches first test maximal strength and then make training recommendations based on percentage 1RM. The purpose of this piece is not necessarily to challenge others’ recommendations on this topic; rather, we hope to provide practitioners with the understanding necessary to interpret existing research on the topic, and the recommendations surrounding it, which may on the surface seem contradictory. Load in resistance training may produce differential adaptations in different aspects of morphology or function. Thus, persons should first consider their desired training goals and then decide whether the evidence suggests that manipulating load might affect those goals differentially. If the effect of load is presently equivocal for a particular outcome, being able to self-select an external load has potentially numerous practical implications. These include reducing the need for specific facility memberships (e.g. where HLs specifically are available), motivating older persons or those who might be less confident using HLs, and allowing people to undertake home- or field-based resistance training interventions. Ultimately, these might serve to improve exercise adherence. As a final caveat, we recognise that there is very likely a threshold load that, if not exceeded, might produce suboptimal adaptations: below it, the recovery capacity of the MUs and muscle fibres in use would prevent continued recruitment, so true momentary failure would never be reached. However, such a threshold has not been identified empirically and is likely highly individual, possibly depending on individual mechanics and muscle fibre type.