Introduction

The classic 15-point rating of perceived exertion (RPE) scale (Borg 1998) is a widely accepted tool that is commonly used to complement objective physiological markers of the exercise response and to prescribe exercise intensity in healthy adults and some patient groups (BASES 1997; ACSM 1998, 2000). Ratings of perceived exertion are perhaps most widely used as a response measure (passive estimation) during graded exercise tests, as they correlate very closely with increments in physiological (e.g., heart rate, oxygen uptake) and physical markers (e.g., power output, speed) of exercise intensity. A logical but less commonly used application of the RPE is to use it, in conjunction with other measures, to regulate exercise intensity (active production) in healthy adults and for rehabilitation in some clinical populations (for reviews see: Borg 1998; Dishman 1994; Eston and Connolly 1996; Eston and Williams 2001; Hampson et al. 2001; Noble and Robertson 1996; Watt and Grove 1993; Williams and Eston 1989). A key motivation for this development has been the desire to use perceived exertion to self-regulate exercise intensity, particularly where the goal is to sustain adherence to an exercise program and to achieve intensities that are considered safe and beneficial to health and fitness (Williams and Eston 1989; Dishman 1994). Applying ratings of perceived exertion in this so-called production mode requires participants to adjust (and maintain) their exercise intensity to match pre-set (or prescribed) ratings, or pictorial markers of various states of exertion, on a given perceptual scale.

Numerous studies have confirmed the validity of self-regulation of exercise intensity guided by ratings of perceived exertion. This has been demonstrated for treadmill running (Dunbar et al. 1992; Eston et al. 1987; Glass et al. 1992; Kang et al. 2003; Smutok et al. 1980), cycling (Buckley et al. 2000; Dunbar et al. 1992; Eston and Williams 1988; Kang et al. 2003), arm and leg ergometry (Kang et al. 1998), rowing ergometry (Marriott and Lamb 1996), wheelchair exercise (Ward et al. 1995) and cycling in children and adolescents (Eston et al. 2000; Eston and Williams 1986; Williams et al. 1991; Robertson et al. 2002). Research also shows that practice tends to improve the ability to reproduce a given exercise intensity when it is regulated by ratings of perceived exertion (Buckley et al. 2000; Eston and Williams 1988; Eston et al. 2000).

One of the primary purposes of a graded exercise test (GXT) is to establish a range of exercise intensities that are appropriate for the exerciser. The exertional ratings collected during the GXT provide a highly practical way of prescribing an appropriate exercise intensity range. Thus, a typical application of the RPE production mode is to prescribe an RPE that is anchored to a given heart rate (e.g. 70% HRmax), oxygen uptake or blood lactate response derived from a prior GXT (passive estimation protocol), on the assumption that the estimation–production paradigm is valid. Kang et al. (2003) have recently demonstrated the validity of this approach using the OMNI Scale, which comprises both pictorial and verbal descriptors. However, several laboratory-based studies using the more commonly applied Borg RPE Scale have reported a mismatch between the exercise intensities produced at a given RPE in the estimation and production modes. In these studies, exercise intensity (\(\dot V{\text{O}}_{\text{2}}\), HR) was underproduced (Byrne and Eston 1998; Dunbar et al. 1992, 1994; Kang et al. 1998; Marriott and Lamb 1996). This mismatch is generally attributed to the limitation of assuming that physiological responses are similar in these passive and active processes. As an alternative, a GXT that uses a range of effort production levels may provide an appropriate means of prescribing exercise intensity, provided that such an application leads to an acceptably accurate prediction of maximal aerobic power.

A number of models have been proposed to account for the relationship between the symptoms of exertion during dynamic exercise and the overall perceptual experience during physical exertion, for example, Weiser and Stamper’s (1977) hierarchical model of fatigue and Noble and Robertson’s (1996) global explanatory model of perceived exertion. These models incorporate the interplay between sensory processes arising from physiological responses and the psychological factors that contribute to an individual’s perceptions. As a consequence, on an individual basis, the association between perceived effort and exercise intensity (oxygen uptake and heart rate) observed during a GXT is likely to be strong, as indicated by a high correlation coefficient (r>0.90) (Borg 1970, 1973; Skinner et al. 1973), and this association is likely to be strengthened by practice. This raises the possibility of exploiting the relationship to predict maximal oxygen uptake \(\left( {\dot V{\text{O}}_{{\text{2max}}} } \right)\) from perceptually regulated, sub-maximal values. Apart from not requiring an exhaustive effort on the part of the participant, which reduces discomfort and clinical risk, the merit of this approach is that the prediction would be based upon individual sub-maximal \(\dot V{\text{O}}_{\text{2}}\) responses rather than sample-based regression equations (with their inherent error), as is commonplace in many other predictive methods. If this process could be applied with a high degree of reproducibility (reliability), it could be used with confidence to monitor physiological changes due to intervention or training programs.

To the best of our knowledge, no previous research of this kind has been reported, though some attempts have been made to extrapolate the RPE (as the dependent variable) from sub-maximal estimation trials to predict maximal functional capacity (Eston and Thompson 1997) and peak or maximal \(\dot V{\text{O}}_{\text{2}}\) (Dunbar and Bursztyn 1996; Okura and Tanaka 2001). Therefore, the purpose of this study was to assess the validity and repeatability of predicting maximal aerobic power from sub-maximal \(\dot V{\text{O}}_{\text{2}}\) values obtained during a perceptually regulated GXT. We hypothesised that the strong relationship between RPE and oxygen uptake, produced during a series of continuous, sub-maximal increments of similar short duration, would enable maximal oxygen uptake to be predicted with acceptable accuracy, and that this accuracy would improve with practice.

Methods

Participants

Ten male volunteers (23.6±2.1 years, 79.6±10.2 kg, 1.81±0.05 m) were recruited from a university population to take part in the study. The participants were physically active, apparently healthy and free of illness and pre-existing injury. All provided written informed consent before participating. The study was approved by the Ethics Committee of the School of Sport, Health and Exercise Sciences, University of Wales, Bangor, which conforms to the ethical standards of the 1964 Declaration of Helsinki.

Procedures

Participants took part in four laboratory-based exercise sessions over a period of 8 days: one GXT to establish maximal aerobic power \(\left( {\dot V{\text{O}}_{{\text{2max}}} } \right)\), followed by three repeated sub-maximal RPE production protocols, each separated by 48 h. The three production trials were conducted on an electromagnetically braked cycle ergometer (Lode Excalibur Sport, Groningen, Netherlands). The resistance on the cycle was manipulated using the Lode Workload programmer, calibrated to an accuracy of ±1 W, independent of pedal speed. The cycle’s display screen was masked from the participant’s view at all times. On-line respiratory gas analysis was carried out throughout each of the four sessions via a breath-by-breath system (Cortex Metalyser 3B, Biophysik, Leipzig, Germany). The system was calibrated prior to every test in accordance with the manufacturer’s guidelines against known concentrations of cylinder gases and a 3-l syringe (for flow volume). In addition, heart rate was monitored using a wireless chest strap telemetry system (Polar Electro T31, Kempele, Finland) and recorded continuously via a link to the Cortex gas analysis system. Heart rate values were concealed from the participant.

The GXT session consisted of a 5-min warm-up against a light resistance on a Monark 814E cycle ergometer, followed by a continuous, incremental protocol starting at 50 W and increasing by 50 W every 3 min until volitional exhaustion. Attainment of \(\dot V{\text{O}}_{{\text{2max}}}\) (ml·kg−1·min−1) was verified for each participant against the criteria outlined by the ACSM (2000).
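As a minimal sketch of the incremental phase described above, the function below returns the work rate at a given time after the warm-up. It assumes that stage changes occur exactly every 180 s and that the clock starts at the beginning of the first 50 W stage; the function name and parameters are illustrative and not part of the study's equipment or software.

```python
def gxt_work_rate(elapsed_s: float, start_w: int = 50,
                  step_w: int = 50, stage_s: int = 180) -> int:
    """Work rate (W) during the incremental phase of the GXT.

    elapsed_s is measured from the start of the first 50 W stage,
    i.e. the 5-min warm-up is excluded (an assumption of this sketch).
    """
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    completed_stages = int(elapsed_s // stage_s)  # whole 3-min stages finished
    return start_w + step_w * completed_stages


# Work rate at the start of each of the first four stages
for minute in (0, 3, 6, 9):
    print(minute, "min:", gxt_work_rate(minute * 60), "W")
```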

In each of the three production trials, participants were required to exercise at five self-regulated RPE levels (9, 11, 13, 15 and 17), prescribed in an incremental fashion. Prior to each trial, participants were introduced to the Borg 6–20 RPE scale and given standardised instructions on how to use the scale in the forthcoming exercise tasks (Borg 1998). There was no prior familiarisation period in which participants practised the perceptual self-regulation of exercise intensity. It was emphasised that their overall perception of exertion should dictate their exercise responses. During all three trials the scale was mounted in full view of the participant. Four minutes of cycling against a light resistance (50 W) was followed by the participant being instructed to produce an exercise intensity equivalent to an effort rating of 9. This was achieved during the initial 2–3 min by the participant instructing the experimenter to adjust the resistance until he was satisfied that it equated to an RPE of 9. From this point, expired air was analysed continuously for the next 4 min, during which no further intensity adjustments were made. Thereafter, participants continued to cycle at 50 W (recovery) for a further 4 min before commencing the second production level (RPE 11). This routine was repeated for the three remaining RPE levels. Cycling cadence was kept between 50 and 80 rpm throughout.
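The production-trial timeline can be laid out programmatically as a rough check on session length. This sketch assumes that the adjustment phase lasted the upper bound of 3 min (the text gives 2–3 min) and that no 50 W recovery bout followed the final RPE 17 level; the `Phase` class and `production_trial` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    label: str
    minutes: float
    analysed: bool  # True for the 4-min hold whose expired air feeds the analysis


def production_trial(rpe_levels=(9, 11, 13, 15, 17), warmup_min=4.0,
                     adjust_min=3.0, hold_min=4.0, recovery_min=4.0):
    """Return the ordered phases of one perceptually regulated production trial."""
    phases = [Phase("warm-up at 50 W", warmup_min, False)]
    for i, rpe in enumerate(rpe_levels):
        phases.append(Phase(f"adjust resistance to RPE {rpe}", adjust_min, False))
        phases.append(Phase(f"hold RPE {rpe}, no adjustments", hold_min, True))
        if i < len(rpe_levels) - 1:  # assumed: no recovery bout after the last level
            phases.append(Phase("recovery at 50 W", recovery_min, False))
    return phases


trial = production_trial()
for p in trial:
    print(f"{p.minutes:>4.1f} min  {p.label}")
print("total:", sum(p.minutes for p in trial), "min")
```

Under these assumptions one trial lasts 55 min (50 min if each adjustment took only 2 min), of which only the final 30 s of each 4-min hold enters the analysis.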

Data analysis

The mean oxygen uptake (the dependent variable) over the final 30 s of each exercise intensity was calculated and used in the analyses described hereafter. The distribution of the dependent variable at each trial and level was checked for normality with the Shapiro-Wilk statistic, and appropriate descriptive statistics were then generated. The variability in these measures was analysed with two-way repeated measures ANOVAs (RPE Levels × Trials). Mauchly’s test of sphericity was applied and, if sphericity was violated, the Greenhouse-Geisser correction was used to control the Type I error risk. Where appropriate, significant effects were followed up with paired t-tests, using a Bonferroni adjustment to offset the inflated Type I error risk that accompanies multiple comparisons. The level of agreement (bias and random error) for oxygen uptake at each RPE level across the three trials was quantified via the 95% Limits of Agreement (LoA) technique (Bland and Altman 1986) to assess the reproducibility of \(\dot V{\text{O}}_{\text{2}}\) at a given RPE. Linear regression analysis was performed on each participant’s five RPE–\(\dot V{\text{O}}_{\text{2}}\) pairs from each production trial in order to predict his \(\dot V{\text{O}}_{{\text{2max}}}\) at RPE 20, using the equation \(\dot V{\text{O}}_{{\text{2max}}} = a + b \times 20\), where \(a\) and \(b\) are the intercept and slope of that participant’s regression of \(\dot V{\text{O}}_{\text{2}}\) on RPE. These three predicted scores were compared with the \(\dot V{\text{O}}_{{\text{2max}}}\) scores obtained from the GXT via a one-way repeated measures ANOVA. In addition, \(\dot V{\text{O}}_{{\text{2max}}}\) scores were derived separately from the \(\dot V{\text{O}}_{\text{2}}\) values corresponding to RPE levels 9, 11, 13 and 15 and to RPE levels 11, 13, 15 and 17, in order to assess the impact of narrowing the perceptual range on the accuracy of the prediction. The LoA between the predicted and actual values were also quantified.
In this way, it was possible to assess the validity of the perceptually derived \(\dot V{\text{O}}_{{\text{2max}}}\) predictions over the three repeated trials, and whether agreement improved with practice. The consistency of these predictions from trial 1 to trial 2 and from trial 2 to trial 3 was quantified in the same way, along with the more conventional intraclass correlation coefficient (ICC), calculated using a two-way mixed effects model for absolute agreement. All data analyses were performed with SPSS for Windows (version 11.5) and alpha was set at 0.05.
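The individual extrapolation described above amounts to an ordinary least-squares fit of \(\dot V{\text{O}}_{\text{2}}\) on RPE, evaluated at RPE 20. The following is a minimal sketch in plain Python, not the SPSS procedure the authors used; the function names and the sample participant values are hypothetical. The optional `levels` argument reproduces the reduced-range models (RPE 9–15 and RPE 11–17).

```python
def fit_line(x, y):
    """Ordinary least-squares intercept a, slope b and Pearson r for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r


def predict_vo2max(rpe, vo2, levels=None, rpe_max=20):
    """Extrapolate one participant's RPE-VO2 line to rpe_max (VO2max = a + b * 20).

    levels optionally restricts the fit to a subset of RPE levels,
    e.g. (9, 11, 13, 15) or (11, 13, 15, 17).
    """
    pairs = [(r, v) for r, v in zip(rpe, vo2) if levels is None or r in levels]
    x = [r for r, _ in pairs]
    y = [v for _, v in pairs]
    a, b, r = fit_line(x, y)
    return a + b * rpe_max, r


# Hypothetical participant: mean VO2 (ml/kg/min) over the final 30 s at each RPE
rpe = [9, 11, 13, 15, 17]
vo2 = [19.0, 24.5, 28.0, 34.0, 38.5]
for subset in (None, (9, 11, 13, 15), (11, 13, 15, 17)):
    pred, r = predict_vo2max(rpe, vo2, levels=subset)
    print(subset or "9-17", round(pred, 1), round(r, 3))
```

Because each prediction uses only that participant's own responses, no sample-based regression equation is involved; the price is that the result depends on extrapolating beyond the highest RPE actually produced.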

Results

The normality of all the distributions of \(\dot V{\text{O}}_{\text{2}}\) was confirmed via the Shapiro-Wilk statistic (P>0.05), justifying the use of means and standard deviations. The mean \(\dot V{\text{O}}_{{\text{2max}}}\) from the GXT was 48.8 (±7.1) ml·kg−1·min−1. Table 1 shows absolute and relative \(\left( {\% \dot V{\text{O}}_{{\text{2max}}} } \right)\) oxygen uptake values at each RPE level across the three production trials. ANOVA revealed significant main effects of RPE level (F=124.1, P<0.001) and trial (F=7.1, P=0.015) on \(\dot V{\text{O}}_{\text{2}}\) values, and a non-significant RPE level × trial interaction (P=0.579). Post-hoc analyses revealed significant (P<0.001) linear increases in \(\dot V{\text{O}}_{\text{2}}\) values between successive RPE levels, and a small (1.9 ml·kg−1·min−1) but significant increase between trial 1 and trial 3 only (P=0.022). At individual RPE levels, the bias between trials 2 and 3 was always less than 1 ml·kg−1·min−1 (ranging from 0.1 to −0.8 ml·kg−1·min−1), whilst the within-subject variation ranged from 1.7 (RPE 13) to 4.7 (RPE 15) ml·kg−1·min−1. These statistics represented an improvement in agreement of more than 50% over the trial 1 to trial 2 comparisons. In addition, sizeable increases in intraclass correlations (for example, from 0.20 to 0.82 at RPE 11, 0.59 to 0.94 at RPE 13, and 0.89 to 0.94 at RPE 17) reinforce the occurrence of this apparent practice or learning effect.
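The agreement statistics reported above can be sketched as follows: 95% LoA as the mean difference ± 1.96 × the sample SD of the differences (Bland and Altman 1986), and an ICC for absolute agreement computed from two-way ANOVA mean squares. This is an illustration under stated assumptions, not a reproduction of the SPSS output: it assumes the single-measures form of the absolute-agreement ICC (the text does not say whether single or average measures were reported) and complete data with no missing trials, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Bland-Altman 95% LoA for paired measures: bias +/- 1.96 * SD of (a - b)."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - half_width, bias + half_width


def icc_absolute_agreement(data):
    """Single-measures ICC for absolute agreement from an n-subject x k-trial table.

    Uses two-way ANOVA mean squares: rows = subjects, columns = trials.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical VO2 (ml/kg/min) at one RPE level for five participants
trial2 = [24.1, 30.3, 27.8, 21.6, 33.0]
trial3 = [25.0, 29.7, 29.1, 22.8, 32.2]
print(limits_of_agreement(trial3, trial2))
print(icc_absolute_agreement([list(p) for p in zip(trial2, trial3)]))
```

A constant offset between trials lowers this absolute-agreement ICC even when participants keep the same rank order, which makes it a more demanding index than a consistency ICC.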

Table 1 Oxygen uptakes (ml·kg−1·min−1 and \(\% \dot V{\text{O}}_{{\text{2max}}}\)) at each RPE level across three production trials

The linear regression analyses of individual participants’ \(\dot V{\text{O}}_{\text{2}}\) values (RPE (x) against \(\dot V{\text{O}}_{\text{2}}\) (y)) yielded correlations ranging from 0.92 to 0.99 across the three production trials. The means of the resulting predicted \(\dot V{\text{O}}_{{\text{2max}}}\) values were 47.3, 48.6 and 49.9 ml·kg−1·min−1 for trials 1, 2 and 3, respectively. The difference between these values approached significance (F=3.4, P=0.056). Post-hoc analysis revealed that this was due to the slightly lower value for trial 1. There was no significant effect of trials when actual \(\dot V{\text{O}}_{{\text{2max}}}\) values were included in the comparison (F=2.4, P=0.093).

Within-subject analyses of the variations between actual and predicted \(\dot V{\text{O}}_{{\text{2max}}}\) values are displayed in Table 2. These 95% LoA statistics were calculated on the basis that the errors (differences) were normally distributed and homoscedastic. Also included in Table 2 are the LoA based on predictions made from \(\dot V{\text{O}}_{\text{2}}\) values recorded at RPE levels 9–15 (excluding RPE 17) and at RPE levels 11–17 (excluding RPE 9). Agreement is noticeably poorer (larger bias and random error) when either the RPE 9 or the RPE 17 \(\dot V{\text{O}}_{\text{2}}\) data are excluded from the prediction analysis. Moreover, the within-subject variability (random error) is reduced in all three models after the first production trial.

Table 2 95% LoA (bias ± 1.96 × SD of the differences, in ml·kg−1·min−1) for comparisons of actual and predicted \(\dot V{\text{O}}_{{\text{2max}}}\) values

In Table 3 both measures of the consistency (repeatability) of the predictions are superior (higher ICCs and narrower LoA) in the two models that included data from RPE 17. Additionally, the LoA between the second and third production trials reflect better agreement than between the first and second trials in these models.

Table 3 Analysis of the consistency of \(\dot V{\text{O}}_{{\text{2max}}}\) predictions over three production trials using the full range of RPE: \(\dot V{\text{O}}_{\text{2}}\) values (9–17), and limited perceptual ranges of RPE 9–15 and RPE 11–17

Discussion

The current data provide encouraging support for the validity of estimates of \(\dot V{\text{O}}_{{\text{2max}}}\) generated from a purely perceptually regulated, sub-maximal GXT. Following an initial trial (which could be viewed as a practice trial), the \(\dot V{\text{O}}_{\text{2}}\) values produced at five RPE intensities predicted the criterion \(\dot V{\text{O}}_{{\text{2max}}}\) scores for most participants to within ±5–7 ml·kg−1·min−1. In addition, the repeatability of the predictions indicates the potential of this approach for monitoring cardiovascular fitness during exercise or lifestyle interventions.

The success of the participants at regulating their exercise efforts during the incremental, discontinuous production trials is in keeping with previous findings on this theme among adults (Buckley et al. 2000; Ceci and Hassmen 1991; Dunbar et al. 1992; Eston et al. 1987; Eston and Williams 1988; Kang et al. 1998; Marriott and Lamb 1996; Robertson et al. 2002). It is noteworthy, however, that the perceptual intensities set in the three production trials were chosen simply because they represented a wide range of efforts and had verbal descriptors accompanying the numbers. They were not derived from target RPEs anchored to a given exercise intensity established from an initial estimation procedure, as in some earlier (Chow and Wilmore 1984; Smutok et al. 1980) and more recent studies (Dunbar et al. 1992, 1994; Robertson et al. 2002). The physiological responses to the range of perceptually controlled intensities were sufficiently linear that extrapolating the data to the limit of the perceived exertion scale (RPE 20) enabled individual predictions of maximal oxygen uptake that were impressively close to the measured values. The findings lend some support to the theory that the RPE may be a measure of the duration of exercise that can be maintained (Noakes 2004). Given that the participants were fully apprised that they would be expected to exercise for a given duration (4 min) once the assigned target RPE had been achieved, and given that they understood the short-term, incremental nature of the RPE production paradigm, it is feasible to consider the RPE as a determinant of when exercise would terminate as a result of volitional exhaustion beyond RPE 17.

These are the first published data of their kind. How robust they are relative to other established sub-maximal predictive methods (which typically use heart rate values), such as step and cycle ergometer tests, is difficult to establish, since past validation studies have tended to rely upon the size of bivariate correlations or the percentage mean difference between measured and estimated values, rather than the range of within-subject variability. One recent study by Buckley et al. (2004) quantified the validity (against a treadmill criterion) of a popular step test used in occupational and corporate health and fitness settings in the same manner as the present study. Among university participants of a similar age and \(\dot V{\text{O}}_{{\text{2max}}}\), the 95% LoA for predicted \(\dot V{\text{O}}_{{\text{2max}}}\) values were −2.8±6.1 and −1.9±7.4 ml·kg−1·min−1 for trials 1 and 2, respectively, of their study. Taking the better LoA (trial 2), this meant that, in a worst-case scenario, a predicted value might be 5.5 ml·kg−1·min−1 above or 9.3 ml·kg−1·min−1 below the measured value. The equivalent analysis of our data compares favourably, indicating estimates of up to 5.1 ml·kg−1·min−1 above and 4.7 ml·kg−1·min−1 below measured values.
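The worst-case figures quoted above follow directly from bias ± 1.96 × SD. The short sketch below reproduces them for the Buckley et al. (2004) trial 2 LoA and back-calculates the bias and half-width implied by the present study's figures of 5.1 above and 4.7 below. It assumes that differences are expressed as predicted minus measured, and the helper names are illustrative only.

```python
def loa_bounds(bias, half_width):
    """Lower and upper 95% limits (predicted - measured) from bias +/- half-width."""
    return bias - half_width, bias + half_width


def bias_and_half_width(lower, upper):
    """Inverse: recover bias and half-width (1.96 x SD) from the two limits."""
    return (upper + lower) / 2, (upper - lower) / 2


# Buckley et al. (2004), trial 2: -1.9 +/- 7.4 ml/kg/min
lower, upper = loa_bounds(-1.9, 7.4)
print(f"over-prediction up to {upper:.1f}, under-prediction up to {-lower:.1f}")

# Present study: estimates up to 5.1 above and 4.7 below measured values
bias, hw = bias_and_half_width(-4.7, 5.1)
print(f"implied bias {bias:.1f}, implied half-width {hw:.1f}")
```

On these inputs the present figures imply a bias of about +0.2 ml·kg−1·min−1 with a half-width of about 4.9 ml·kg−1·min−1, which is why the worst-case errors in both directions are smaller than those of the step test.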

In terms of the repeatability of the predictions, the LoA (for each predictive model) improved with protocol familiarity, the agreement between the second and third trials being better than that between the first and second trials. This reinforces the impressions gained from several other studies that have considered this important factor in the utility of perceived exertion scales (Buckley et al. 2000; Eston and Williams 1988; Eston et al. 2000). Indeed, the repeatability of the \(\dot V{\text{O}}_{\text{2}}\) values generated at each RPE level improved markedly by trial 3, and at all levels the statistics were considerably better than those reported by Buckley et al. (2000) and Hartshorn and Lamb (2004). It should be noted, however, that whilst these studies incorporated at least three repeated cycle ergometry trials, they used a different production protocol in terms of the range of RPE levels employed and the order in which they were presented. Such methodological differences have frequently confounded the interpretation of research involving perceived exertion scales.

In practical terms, the LoA calculated between trials 2 and 3 for two of the three prediction models suggest that, in order to demonstrate that a non-random change in maximal aerobic power has occurred (following a period of appropriate training or lifestyle intervention), an increase of at least 4–5 ml·kg−1·min−1 would have to be observed. This magnitude of agreement is similar to that in the aforementioned study of the step test (Buckley et al. 2004) and considerably better than that reported for the Astrand cycle test (Nevill and Atkinson 1997). Depending on the fitness status of the individual, a change greater than this would be likely in most cases after training.
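The decision rule implied above, treating an individual's change as non-random only when it falls outside the test–retest LoA, can be written as a simple check. This is a sketch assuming that the trial 2 to trial 3 LoA serve as the reference and that any systematic retest bias is taken into account before comparison; the function name and the example values are hypothetical.

```python
def exceeds_measurement_error(pre, post, bias, half_width):
    """True if post - pre lies outside the retest 95% LoA (bias +/- half_width)."""
    change = post - pre
    return change > bias + half_width or change < bias - half_width


# Hypothetical LoA half-width of 4.5 ml/kg/min with negligible bias
print(exceeds_measurement_error(45.0, 51.0, bias=0.0, half_width=4.5))  # True
print(exceeds_measurement_error(45.0, 48.0, bias=0.0, half_width=4.5))  # False
```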

The predictions of \(\dot V{\text{O}}_{{\text{2max}}}\) used oxygen uptake values collected at sub-maximal intensities across much of the range of the Borg RPE scale (from “Very light” to “Very hard”). Expressed relative to maximal values, these RPE levels corresponded to mean oxygen uptakes of approximately 38–86% of \(\dot V{\text{O}}_{{\text{2max}}}\). Whilst the inclusion of RPE 17 (Very hard) in the protocol was important to the overall accuracy and consistency of the predictions, it is noteworthy that the prediction of \(\dot V{\text{O}}_{{\text{2max}}}\) using the 9–15 range, although less precise, was within 12–20% of the actual \(\dot V{\text{O}}_{{\text{2max}}}\). The choice of perceptual range for predicting \(\dot V{\text{O}}_{{\text{2max}}}\) may best be guided by the nature of the population and the accuracy required. For example, if the protocol is shown to be reliable in a sedentary population, omitting an exertion equivalent to RPE 17 may be preferable, particularly as this may be associated with negative affect in this population (Parfitt et al. 1996). Eliminating a bout (of \(\dot V{\text{O}}_{\text{2}}\) data) at the lower end of the scale (RPE 9) also impacted negatively upon the success of the prediction model. Interestingly, however, the trial 2 to trial 3 consistency of the \(\dot V{\text{O}}_{{\text{2max}}}\) predictions without RPE 9 was on a par with that of the five-level model that included it.
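One plausible contributor to the poorer reduced-range predictions, offered here as a standard regression result rather than the authors' explanation, is extrapolation leverage. Under ordinary least squares with independent, equal-variance errors of SD \(\sigma\), the standard error of the fitted value at RPE 20 is \(\sigma\sqrt{1/n + (20-\bar{x})^2/S_{xx}}\), where \(\bar{x}\) and \(S_{xx}\) are the mean and sum of squared deviations of the RPE levels used. The sketch below compares this multiplier for the three RPE ranges; the function name is hypothetical, and the calculation ignores practice effects and any non-linearity.

```python
import math


def extrapolation_se_factor(levels, x0=20):
    """Multiplier on sigma for the SE of the OLS fitted value at x0 (here RPE 20)."""
    n = len(levels)
    x_bar = sum(levels) / n
    sxx = sum((x - x_bar) ** 2 for x in levels)
    return math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


for label, levels in [("9-17", (9, 11, 13, 15, 17)),
                      ("9-15", (9, 11, 13, 15)),
                      ("11-17", (11, 13, 15, 17))]:
    print(label, round(extrapolation_se_factor(levels), 2))
```

The multipliers are about 1.19 (RPE 9–17), 1.86 (RPE 9–15) and 1.43 (RPE 11–17). On this reasoning alone, dropping RPE 17 should cost more precision than dropping RPE 9, which is consistent with the better consistency of the two models that retained RPE 17.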

The present study has provided a novel application of the concept of perceived exertion and generated data suggesting that a sub-maximal, perceptually guided, graded exercise protocol can provide estimates of maximal aerobic power that are as valid and reliable (given practice) as, if not more so than, established methods based upon heart rate responses to externally prescribed exercise intensities. It is recognized that the testing protocol adopted in the current study may have introduced a time effect that could alter the accuracy of the predictions of \(\dot V{\text{O}}_{{\text{2max}}}\) or the point of volitional exhaustion. In this regard, as the RPEs were fixed, physiological work rates may have been slightly lower as the point of exercise termination approached, since the exercise protocol was continuous and had been under way for some time. If this were the case, it would lead to an underestimation of maximal aerobic power. Although the predicted \(\dot V{\text{O}}_{{\text{2max}}}\) values from trial 1 were a little lower, this difference was not statistically significant when compared with the actual \(\dot V{\text{O}}_{{\text{2max}}}\) from the initial GXT. Nevertheless, future studies may consider adopting an intermittent production paradigm to allow for recovery between the successively increasing bouts of exercise intensity at each RPE.

To conclude, we believe that the data generated from the current study may provide the basis for a perceptually guided GXT that can be used to predict cardiovascular fitness in a variety of population groups where maximal tests are inappropriate or where other physiological markers are altered in some way. It is recognized, however, that our data were gathered on a relatively small sample of active male volunteers during cycle ergometry. Follow-up investigations are warranted to explore the potential of this method with less active individuals, and to examine the effects of gender, age, exercise experience and fitness on its utility for predicting maximal functional capacity in different populations.