Key Points

In exercise science there are numerous definitions and measurement instruments of perceived effort. This leads to confusion and hinders measurement validity.

To solve this problem, there is a need to clarify and narrow the number of definitions and terms included within the perceived effort construct.

There is a need to measure other perceptions and emotions more frequently to improve the field’s precision and comprehensiveness.

1 Introduction

Ratings of perceived effort (RPE) are the most widely used single-item scales in exercise and sport sciences (Footnote 1). RPE scales have a range of benefits, including relationships with physiological and performance measures within and between training sessions [1,2,3,4,5]. Hence, RPE scales can serve as a surrogate measure of physiological indices, and assist in quantifying exercise intensity and load. RPE scales can also be used to regulate exercise intensity [6,7,8,9]. By pre-setting specific RPE targets, and instructing people to remain within their boundaries during exercise, people can effectively reach or avoid certain intensities. These benefits of RPE scales persist across different populations and exercise modalities [5, 6, 8, 9].

While the advantages of RPE scales are clear, they suffer from certain methodological limitations, most of which stem from multiple definitions of perceived effort (PE). Indeed, debates about the definitions of PE and its neurophysiological underpinnings are still ongoing [10,11,12,13]. The literature is flooded with definitions that fall under the same umbrella of PE, although they may be addressing different perceptions [10, 11, 13,14,15,16]. Moreover, a growing number of RPE scales and instructions are being used by different laboratories [2, 4, 17,18,19]. RPE scales are also applied to rate effort in specific body parts or in the body as a whole [20,21,22,23]. The number of interactions between definitions, scales, instructions and administration strategies, all attempting to capture the same perception, is exceptionally large. This poses a threat to the measurement validity of RPE scales. By measurement validity we refer to the degree to which the observed scores (RPE) meaningfully capture the ideas contained in the corresponding concepts (PE construct) [24].

Here we build and expand on recent RPE articles [10, 12, 15], and cover (1) common PE definitions, (2) how the terms included within these definitions can hinder the scales’ validity, (3) the problems that can arise from using different effort scales and instructions, and (4) possible shortcomings of measuring PE in specific body parts and in the whole body. We conclude with recommendations on how to improve PE measurements, and discuss the benefits of incorporating measurements of other constructs more often.

2 Definitions

We present two common definitions of PE (Footnote 2). The differences between them partly stem from different perspectives on PE’s neurophysiological underpinnings. Since this article focuses on methodological aspects of PE measurement, we only briefly discuss these mechanistic underpinnings as they pertain to PE definitions. We refer the reader to other articles covering the neurophysiological basis of PE [10, 12, 26].

A prominent figure in the RPE literature is Gunnar Borg. Borg developed several scales, including the 6–20 RPE Scale [27], the Borg category-ratio (CR) 10 Scale [28], and the Borg centiMax Scale (CR100) [29]. In his book, Borg [30] viewed effort as physiological and psychological signals integrated into a single “Gestalt” perception during physical activity (p. 3). Borg defined PE as “… the feeling of how heavy and strenuous a physical task is” (p. 8). The following is also presented to subjects when they are instructed on how to rate the scale: “The perception of exertion depends mainly on the strain and fatigue in your muscles and on your feeling of breathlessness or aches in the chest” (p. 47). Other figures in the RPE field, Noble and Robertson [31, 32], followed Borg’s general view of PE and defined it as “The subjective intensity of effort, strain, discomfort and/or fatigue that is felt during exercise”. To date, the work by Borg, Noble and Robertson on PE has been cited thousands of times. Other notable researchers have embraced the viewpoint that an integration of inputs from several bodily systems produces PE [1, 13, 33, 34].

A different perspective of PE was put forth by Marcora [11, 35]. According to Marcora, PE is solely a product of corollary discharge occurring in the brain. This perspective leans on the theory of corollary discharge, which postulates that when the motor cortex delivers signals to activate muscles, copies of these signals are simultaneously delivered to the sensory cortex. These copies are then interpreted as PE [36, 37]. In contrast to the perspective advocated by Borg and others, Marcora views PE to be independent of afferent feedback. For example, heart transplantation patients reported normal RPE values despite their denervated organ [38]. Marcora defines PE as the “Conscious sensation of how hard, heavy, and strenuous a physical task is” [35]. While this definition resembles Borg’s, it is important to note that the terms fatigue and discomfort are absent from it, and from the subsequent instructions of the scale. This is because they represent different constructs that have distinct neurophysiological pathways [11]. Marcora and colleagues use Borg’s RPE or CR-10 scale, and instruct participants to report how hard they are driving their working limbs [39], how heavily they are breathing, and the overall sensation of how strenuous the exercise is [40]. The precise instructions are modified to fit the exercise modality and research question.

Abbiss and colleagues [10] recently proposed a distinction between effort and exertion, two terms that are commonly used synonymously. They defined effort as “The amount of mental or physical energy being given to a task”. This emphasizes effort as a process in which people invest a certain resource. Exertion was defined using Borg’s definition “The feeling of how heavy and strenuous a physical task is”. This emphasizes exertion as a process associated with the physical and physiological stress induced by the exercise. In a sense, this distinction bridges the two viewpoints: effort is more aligned with Marcora’s and the corollary discharge perspective, whereas exertion is more aligned with Borg’s and the afferent feedback perspective. The requirement to narrow the definition of effort/exertion and to differentiate between perceptual experiences is generally agreed upon [12, 41]. However, it is debatable whether effort and exertion should be distinguished, precisely because the terms are used synonymously. Drawing this distinction could have the opposite of the intended effect and increase confusion in the RPE field. We offer possible solutions to these issues in Sect. 6.

3 Differentiating Perceptions

The perspective adopted by Borg and others on the underpinnings of PE likely led to the inclusion of the following terms within its definition: fatigue, discomfort, and heaviness. This is problematic, as there is evidence that people can differentiate PE from such perceptions [12]. We now turn to discuss these terms in relation to PE. Importantly, we only discuss these three constructs, because they are included in Borg’s or Marcora’s definitions of PE. From a broader perspective, other constructs such as exercise induced pain, enjoyment, or even hunger can be of interest to the study of PE. However, such constructs require a thorough discussion in and of themselves and are beyond the scope of the present review.

3.1 Fatigue

Although some PE definitions include the term fatigue [2, 31], Borg [42] wrote that “the concept of fatigue should be distinguished from the concept of perceived exertion even though these two concepts have very much in common”. Recently, in line with Borg’s distinction, Micklewright et al. [43] developed a new single-item scale called rating of fatigue (ROF). The authors defined perceived fatigue as “… a feeling of diminishing capacity to cope with physical or mental stressors, either imagined or real”. ROF and Borg’s RPE were strongly correlated during an incremental physical task. Yet, people rated RPE as zero at the point of exercise termination, whereas ROF was still relatively high. This finding highlights the discriminant validity of fatigue during recovery. The strong relationship between the two scales (r = 0.99) could partly stem from the fact that Borg’s RPE instructions include the term fatigue. It is possible that when required to rate the two scales during exercise, participants rate them in a similar manner.

During incremental physical activities, fatigue and effort are expected to correlate strongly, as both tend to increase simultaneously. This is because once muscular fatigue accumulates, it becomes more difficult to produce similar levels of force, and producing such force will require greater effort [44]. In contrast, short, maximal-effort physical tasks completed when people are fresh will likely lead to different perceptions. To illustrate, a single, three-second maximal isometric elbow flexion contraction should not lead to a meaningful perception of fatigue, but to maximal PE [45]. Also, in cases in which exercise difficulty decreases over time, different and even inverse relationships between perceptions may occur. For example, perception of fatigue may increase while PE decreases. This proposal is supported by studies observing different RPE values in protocols where exercise difficulty increased or decreased over time, even though similar work was completed in both conditions [46, 47]. Hence, exercise configuration can interact with the ratings of PE and fatigue.

3.2 Discomfort

With proper instructions, people can discriminate between PE and discomfort during exercise [48, 49]. To further establish this division, Steele and colleagues recently developed modified CR-10 Borg scales that measure effort (Footnote 3) and discomfort [19]. Discomfort was defined as “the physiological and unpleasant sensations associated with exercise” [19]. The authors observed that participants can differentiate between effort and discomfort, and that exercise duration has a strong impact on perception of discomfort [50, 51]. A longer set, composed of more repetitions and/or more time with the muscles under tension, led to higher perception of discomfort compared to effort when the sets were taken to momentary failure [50, 51]. To illustrate, Stuart et al. [51] had participants complete dynamic back extensions to momentary failure while lifting either lighter (50%) or heavier (80%) loads that were calculated based on their maximal voluntary contractions. Whereas both conditions led to maximal RPE values, perception of discomfort did not reach maximal values in either condition, and was greater in the low force condition.

3.3 Heavy

Both Borg and Marcora include the term heavy in their definition of effort. This term is problematic, especially as it pertains to resistance training [52, 53]. Lifting a heavier load, absent information on the number of repetitions completed relative to the repetition maximum (RM), is not necessarily indicative of effort [52, 53]. For example, it can be argued that completing one repetition of a 5RM load (1/5 RM = 20% of maximum) is easier compared to completing nine repetitions of a lighter 10RM load (9/10 RM = 90% of maximum). When completing an exercise to momentary failure, lifting lighter loads usually leads to higher RPE values compared to heavier loads [54, 55] (Footnote 4). Hence, the total number of repetitions completed relative to maximum is a more comprehensive way to estimate effort. The problem with including the term “heavy” as part of the PE definition can be illustrated with the OMNI scales in resistance exercise [56]. Attached to the scale is a figure of a person holding a barbell above his head, which represents PE. The increments from 0 to 10 are associated with heavier loads on the bar the person is holding. This figure could mislead people into rating the perceived heaviness of the object they are lifting, rather than their PE.
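The arithmetic in the 5RM versus 10RM example can be sketched in a few lines. This is only an illustration of the "repetitions completed relative to maximum" idea; the function name `relative_effort` and its arguments are hypothetical and do not come from any cited scale.

```python
def relative_effort(reps_completed: int, rep_max: int) -> float:
    """Fraction of the repetition maximum (RM) completed in a set.

    Hypothetical helper for illustration only: it ignores load, tempo and
    every perceptual factor, and simply divides repetitions by the RM.
    """
    if rep_max <= 0:
        raise ValueError("rep_max must be a positive integer")
    if not 0 <= reps_completed <= rep_max:
        raise ValueError("reps_completed must lie between 0 and rep_max")
    return reps_completed / rep_max


heavy_set = relative_effort(1, 5)    # one repetition with a 5RM load -> 0.2
light_set = relative_effort(9, 10)   # nine repetitions with a 10RM load -> 0.9
print(f"heavier load: {heavy_set:.0%}, lighter load: {light_set:.0%}")
```

Under this simple ratio, the set with the lighter load sits much closer to maximum than the set with the heavier load, which is the point the example makes about "heavy" being a poor proxy for effort.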

4 Scales and Instructions

It is common for scientists to use altered versions of the Borg scales. In his book, Borg [30] highlights some errors and possible misuse of his scale: copying alterations, changes in format, verbal anchors, scale design, and shortening or modifying the instructions (p. 15). For example, the Borg CR 10 Scale includes a dot (•) below the digit 10. Whereas 10 is defined as “extremely strong” (explained as the strongest effort one has ever experienced), the dot is defined as the “absolute maximum”. The purpose of the dot is to avoid a ceiling effect by allowing people to rate higher values than 10 in case they should perceive such effort. However, in some studies, the dot is excluded from the scale [48, 57, 58]. The Borg CR 10 Scale also went through minor modifications over the years [59], yet the exact version used in studies is not always reported. It is unclear if—and to what extent—altering the scales or using different versions of the same scale affects ratings. Other effort scales are also widely used. For example, nine different OMNI scales were developed for distinct exercises and populations [2]. While many scales were validated against Borg’s RPE, the purpose of having multiple scales attempting to capture the same construct is unclear [1].

Distinct instructions and explanations are provided alongside RPE scales [2, 14, 48, 60]. Multiple or varying instructions and explanations can hinder measurement validity. Consider the importance of explaining what the upper limits of scales stand for (also known as anchoring, e.g., 10 is “maximal effort”). Without explicitly associating this number and term with a relevant event that can occur in the to-be-completed exercises, its actual meaning can be interpreted in various ways. For example, during a 10-km running time trial, when asked to rate RPE, some may interpret the upper limit in relation to the fastest they can run at a given point in time, whereas others may interpret it as the fastest they can run in view of task completion. Unless instructed otherwise, both are legitimate interpretations, and they could yield different ratings. In resistance training studies, the scales’ upper limit is normally associated with the terms “momentary failure” and “repetition maximum”. But inconsistencies in the way these two terms are defined and explained could lead people to rate their efforts differently [61, 62]. Some understand RM as the inability to complete another repetition, leading them to cease the set without trying to complete the next one. Others may understand RM as reaching the actual point during a repetition that they are unable to complete. These subtle differences, which are at least partly a function of the provided instructions, have the potential to influence subjects’ ratings.

5 Part vs. Whole-Body RPE

Scientists developed a division of RPE measurement between perceptions of effort occurring in (1) a specific muscle group (termed peripheral, local or differentiated RPE), (2) the cardiorespiratory system (termed central or respiratory-metabolic RPE), and (3) the body as a whole (termed undifferentiated RPE) [20,21,22,23, 63]. When asked to rate two or three of these RPE measurements, subjects rate specific muscle group effort the highest [22, 23]. While this approach to measuring effort is thought to enhance precision [23], it raises questions regarding the nature of the PE construct and its measurement.

If PE resembles “a social psychophysiological phenomenon” [64] and is presented as a “Gestalt” occurrence [65], then the meaning of its attribution to a specific body part is ambiguous. For example, one can rate how excited she feels, yet might be confused if also asked to rate how excited she feels in her quadriceps. Alternatively, if PE resembles, or is presented as a sensation that is attributed to a specific body part, then the meaning of its general “Gestalt” attribution is ambiguous. For example, when suffering from a toe injury, one can rate the amount of pain she feels in her toe, yet might be confused if also asked to rate the amount of pain she feels in general. Although people rate part and whole-body RPE, it is unclear if the reported values represent the same construct.

In view of the above, it is debatable if applying the same PE construct to measure both part and whole-body RPE is a sound administration strategy. The provided examples also emphasize the need to measure other perceptions in addition to, or instead of, RPE. For instance, it may be more suitable to collect whole-body RPE and perception of discomfort from specific body parts within a given study, rather than RPE from both. By doing so, we anticipate that it will be easier for people to grasp the notion of PE, which will thereby increase measurement validity. If one chooses to embrace part and whole-body viewpoints of PE, then it may be better to avoid simultaneously measuring RPE from both, as is commonly done. By asking people to rate both, there is a greater risk of receiving ratings to questions not posed (e.g., one rating addressing PE or affect and the other addressing pain or discomfort). However, such response inconsistencies are less likely to occur if only one of the two administration strategies is used in a given study. Taken together, there is a need to evaluate if PE should be treated and administered as part or as a whole. Until this issue is resolved, we suggest considering the points put forth in this section when measuring and interpreting part and whole-body RPE values.

6 Solutions and Future Directions

Our proposed solutions are composed of two complementary approaches. The first is to narrow the scope of PE measurements by committing to fewer definitions, terms, and instructions. The second is to widen the scope of exercise and sports science by including measurements of other perceptions more frequently. We provide examples of useful single-item scales and when they can be used. We focus on single-item scales due to their similarities to RPE and their ease of administration during exercise. It is important to keep in mind that the scales to be introduced in the following sections may also suffer from methodological shortcomings, but these go beyond the scope of this article. Rather than following step-by-step solutions, our goals are to stimulate thought, explore new possibilities, and encourage new research directions.

6.1 Narrowing the Scope

A simple solution to the multiple definitions problem is to reduce the number of terms included within the PE definition. As illustrated above, people can differentiate between effort and other perceptions included in the popular PE definitions (fatigue, discomfort and heaviness). Of the existing prevalent definitions of PE, Marcora’s is preferable, as it is simpler and includes fewer terms. But we propose narrowing this definition further. As mentioned previously, the term “heavy”, which is part of Marcora’s definition, can be misleading. Additionally, the added value of the phrase “… the conscious sensation of …” within the definition is debatable. Since people are asked to explicitly report their perceptions, a conscious state is mandatory.

The definition of effort offered by Abbiss and colleagues [10], i.e., “The amount of mental or physical energy being given to a task”, is a step in the right direction. Psychologists Gendolla and Wright define effort as “Mobilization of resources to carry out instrumental behavior” [66]. We embrace the characterization of effort as a process of investing certain resources, and view PE as the perception of their investment, irrespective of the actual relationship between the two. We thus propose that PE can be defined as: “The process of investing a given amount of one’s perceived physical or mental resources out of the perceived maximum to perform a specific task”. This can be simplified to perceived invested resources divided by perceived maximal resources. The addition of a reference point, the perceived maximum, can translate this definition into actionable measurement units. For example, the perceived maximum to complete a given task corresponds to the upper range of effort scales (e.g., 10 in a 0–10 scale). However, the usefulness of this definition remains to be established.
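The simplified form of the proposed definition, perceived invested resources divided by perceived maximal resources, can be sketched as a ratio placed on a scale. The linear mapping onto a 0–10 scale is an assumption made here for illustration; the article only states that the perceived maximum corresponds to the top of the scale. The function and parameter names are hypothetical.

```python
def pe_rating(perceived_invested: float,
              perceived_maximum: float,
              scale_max: float = 10.0) -> float:
    """Place the ratio invested / maximum on a scale from 0 to scale_max.

    Assumption (not from the source): ratings scale linearly with the
    ratio, so investing the perceived maximum yields scale_max.
    Units of the two inputs are arbitrary but must match.
    """
    if perceived_maximum <= 0:
        raise ValueError("perceived_maximum must be positive")
    if not 0 <= perceived_invested <= perceived_maximum:
        raise ValueError("perceived_invested must lie between 0 and the maximum")
    return scale_max * perceived_invested / perceived_maximum


# Investing three quarters of the perceived maximum on a 0-10 scale.
print(pe_rating(perceived_invested=3, perceived_maximum=4))  # 7.5
```

Both inputs are perceptions rather than measured quantities, so the sketch says nothing about how closely a rating tracks the resources actually invested.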

Qualitative methods can help in coming to terms with the PE definitions. The Delphi technique is an iterative multistage process designed to transform experts’ opinions into group consensus via questionnaires and interviews [67]. A Delphi study incorporating psychophysicists, exercise scientists and psychologists focused on the PE definitions can contribute to this goal. Due to the multiple views on PE, reaching a consensus among experts may be challenging. Alternatively, reaching an agreement about specific study designs that will resolve in favor of one definition of PE is a feasible task. Another qualitative avenue includes semi-structured interviews and focus groups of non-experts. In these designs, people are first provided with explanations and definitions of constructs and measurement instruments, and then asked to report if the explanations match their understanding of them. Such studies have been successfully conducted in pain sciences, leading to interesting and actionable insights about the mismatches between pain definitions and measurement scales, and the meanings patients attribute to them [68,69,70]. Using a similar approach to investigate people’s understanding of PE should lead to interesting insights. These qualitative designs, coupled with the ongoing quantitative applied and basic research, will hopefully lead to a deeper understanding of the PE construct, and to an agreed-upon PE definition.

6.2 Widening the Scope

It is our impression that exercise and sport scientists, at times, measure RPE even when attempting to answer different questions. The relative absence of scales that measure other perceptions is speculated to stem from the establishment of PE as the single most recognized subjective variable in exercise and sport science since the late 1960s [71]. We echo other scientists’ recommendations [41, 71], and propose that as a field, we should measure other perceptual experiences more often. This will allow us to be more certain that we are receiving answers to the questions being asked. This recommendation is also in line with Abbiss and colleagues’ [10] call to distinguish between effort and exertion, and treat them as separate perceptual experiences. Yet, as argued previously, it may be better to treat these two terms synonymously and generate other terms that are clearly distinct from effort/exertion to avoid misconceptions.

To illustrate the overreliance on PE, consider the following: some studies measured RPE post exercise as a tool to assess the abatement of PE [72, 73]. Part of the instructions in a study by Robertson and colleagues [72] included “… perception of effort is defined as the intensity of effort, stress, discomfort, and/or fatigue that you feel during recovery from dynamic exercise”. Three minutes post exercise, the reported RPE values were as high as 18 on the Borg’s 6–20 scale [72]. While PE occurs during exercise, or can be reported as a memory of the exercise (e.g., session RPE [3]), it should not be experienced during rest. The high RPE scores recorded three minutes post exercise were likely perception of fatigue, stress or discomfort, but not effort. In contrast to RPE, the new ROF scale developed by Micklewright and colleagues [43] might be better suited to answer a recovery related question during rest.

Affective valence is a construct of potential interest. It refers to the positive and negative qualities of emotions and moods [74, 75]. During exercise, heterogeneous inputs, including fatigue, pain, reward and pride, are processed and acted upon [76, 77]. Since they reside on different continuums, they need to be converted to a common scale for them to be assessed and compared. This process, reflected in affective valence, allows for a decision to take place in real time (e.g., reduce running speed) [71, 76, 77]. Affective valence is thought to act as a ‘common currency’ in the tradeoffs between opposing motivations regulating exercise [71, 76, 77]. Affective valence is commonly measured by the Feeling Scale (FS): an 11-point bipolar scale ranging from +5 (very good) to −5 (very bad) [64]. Exercise psychologists have extensively studied the FS for the past 30 years, mostly among sedentary populations [74, 78]. A key finding of these studies is that people reporting lower FS ratings during exercise are less likely to continue exercising over time [74, 79, 80]. This finding is consistent with the fact that people prefer to avoid unpleasant experiences [74, 76]. To date, however, the FS remains relatively unexplored by exercise and sport scientists. Measuring the perception of discomfort is another option. Discomfort is in many ways similar to negative affect measured by the FS. However, the Discomfort Scale [19], which ranges from 0 (no discomfort) to 10 (maximal discomfort), places greater emphasis on negative affect compared to the FS. When greater resolution is sought, and the exercise is expected to cause considerable negative affect, the Discomfort Scale may be a better alternative.

From an applied perspective, lower FS scores reported by athletes during a physical task that is completed on a regular basis could indicate the beginning of a burnout or insufficient recovery. Another example is when two athletes are exercising at a similar RPE, but one rates the exercise as pleasurable while the other as not. This insight could lead coaches to adopt different training strategies. The observation that lifting heavier loads to task failure leads to lower perception of discomfort compared to lighter loads is also of practical value [51]. For example, lifting heavier loads may be a preferred option when a training program includes sets completed to task failure. This is especially the case when the goal is to encourage exercise adherence, mostly among untrained populations.

Guiding participants to complete a number of repetitions within a particular distance from their estimated RM is a training strategy growing in popularity [4, 60, 81, 82]. This strategy allows prescribing and monitoring resistance training by achieving or avoiding certain intensity zones. For example, by following the estimated Repetitions to Failure Scale [60], subjects can be asked to terminate a set two repetitions away from RM. The similar Repetitions in Reserve (RIR) Scale translates the former into an RPE rating which ranges from 1 to 10 [4]. For example, subjects can be asked to reach an RPE score of 8, indicating that only two repetitions are left in reserve (i.e., an RPE of 8 equals an RIR of 2). While these scales can be viewed as versions of RPE scales, they are included in this section for two reasons. First, they offer a unique aspect that is absent from most scales: a perfect match between the measurement unit and the construct of interest, namely the number of repetitions left to task failure. This makes rating these scales straightforward, which reduces the likelihood of misunderstandings, and increases measurement validity. Second, perception of repetitions left in reserve can be influenced by a wide array of perceptions, including, but not limited to, PE.
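The RPE 8 and RIR 2 example can be sketched as a conversion between the two ratings. The rule RPE = 10 − RIR is an assumption extrapolated here from that single example; the published scale may not follow this one-to-one mapping at its lower end, so the sketch should be read as illustrative only. The function names are hypothetical.

```python
def rir_to_rpe(rir: int) -> int:
    """Convert repetitions in reserve (RIR) to an RPE rating on a 1-10 scale.

    Assumption (extrapolated from the 'RPE 8 = 2 RIR' example): each extra
    repetition left in reserve lowers the rating by one point.
    """
    if not 0 <= rir <= 9:
        raise ValueError("under this assumption RIR must be between 0 and 9")
    return 10 - rir


def rpe_to_rir(rpe: int) -> int:
    """Inverse conversion under the same linear assumption."""
    if not 1 <= rpe <= 10:
        raise ValueError("RPE must be between 1 and 10")
    return 10 - rpe


print(rir_to_rpe(2))  # 8
print(rpe_to_rir(8))  # 2
```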

RIR scales are effective in monitoring training sessions, as they are correlated with bar velocity across exercises [83, 84], and are reasonably accurate at capturing RM [60, 81, 82]. They also allow for effective exercise regulation, as sets, repetitions, and loads can be adjusted in view of one’s perception of RIR. For example, two recent studies of 8 and 12 weeks’ duration found that an RIR-based method produced greater strength gains compared to percentage-based training among trained participants [6, 7]. Despite their clear benefits, RIR scores were found to be less accurate in untrained subjects [85] (although this finding is not consistent [81]), and with sets including a high number of repetitions [81, 82]. Hence, RIR scales seem to be better suited for participants with a resistance training background who perform a relatively low number of repetitions.

7 Putting Scales into Practice

We conclude with practical recommendations on how single-item scales can be used to reduce within- and between-laboratory variability, as well as confusion among participants and practitioners.

  • It is vital to carefully consider which specific construct would be most suited to answer a given question, and subsequently, which scale is best suited for its measurement. These decisions should be justified before data collection, and reported in the subsequent manuscript.

  • All laboratory members involved in data collection procedures should be aligned as to which scale, version of scale, instructions and explanations are to be used. Written scripts can assist this process. Adhering to this process helps ensure within-laboratory reliability and validity of the scales.

  • Authors of manuscripts should report which scale and version were used, coupled with the instructions and anchoring procedures. This information can be presented within the article, or as a supplementary document. This will allow for a richer interpretation of the results, replication attempts, and meta-analytic procedures, and will generally increase between-laboratory reliability and validity of scale measurement.

  • Whenever possible, modifying validated scales and their instructions should be avoided, as even small changes can impact scale ratings. In cases in which scale development or modification is justified, it is important to use construct, concurrent and discriminant validation procedures to ensure the scale’s validity.

  • An explicit, written question that subjects are expected to answer should be added to scales, e.g., “How effortful is the task?” for measuring PE. As single-item scales are used simultaneously (e.g., ROF and RPE), the addition of this question will remind the physically active subjects what exactly they are required to rate.
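One way to act on the reporting recommendations above is to keep the scale details in a single structured record that can be shared among laboratory members and attached to a manuscript as supplementary material. The sketch below is a hypothetical layout, not a published reporting standard; the class name, field names and placeholder strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScaleProtocol:
    """Hypothetical record of how a single-item scale was administered."""
    construct: str      # construct the scale is meant to capture
    scale: str          # name of the scale
    version: str        # edition or year of the scale actually used
    question: str       # explicit written question shown with the scale
    instructions: str   # verbatim script read to participants
    upper_anchor: str   # what the top of the scale refers to in this task

    def missing_fields(self) -> list[str]:
        """Names of fields left blank, i.e. details still to be reported."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


protocol = ScaleProtocol(
    construct="perceived effort",
    scale="Borg CR 10 Scale",
    version="",  # left blank on purpose to demonstrate the check
    question="How effortful is the task?",
    instructions="(verbatim script goes here)",
    upper_anchor="(task-specific anchor goes here)",
)
print(protocol.missing_fields())  # ['version']
```

Running the completeness check before data collection flags details, such as the scale version, that would otherwise go unreported.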

8 Conclusion

Despite the benefits of RPE scales, they suffer from limitations that warrant attention and action. These limitations include multiple PE definitions, scales, instructions, and the division of PE to body parts, all of which risk measurement validity. To overcome these problems, we proposed two conceptual solutions. The first is to narrow the number of PE definitions, the terms included within the PE definitions, the implemented scales, and instructions. The second is to incorporate other single-item scales that measure other perceptions more often. Following these recommendations will enhance measurement precision and expand measurement breadth.