Introduction

Maximal oxygen uptake (\( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \)) is widely regarded as the criterion measure of cardiorespiratory fitness (ACSM 2010) and a strong and independent predictor of mortality in patients with known cardiovascular disease (Aijaz et al. 2009; Kavanagh et al. 2002; Laukkanen et al. 2004). However, the safety and cost implications of its measurement have traditionally encouraged the use of numerous methods for predicting \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) from sub-maximal exercise protocols. Typically, these have used heart rate responses to incremental exercise as the independent variable, but a persuasive case has been made in the last few years for predicting \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) from oxygen uptake values generated during a perceptually regulated exercise test (PRET) (Eston et al. 2005, 2006, 2008; Faulkner et al. 2007; Morris et al. 2009). The PRET utilises the Borg 6–20 RPE scale (Borg 1998) in its so-called production mode, whereby the participant is requested to set their own exercise intensity in response to a range of prescribed RPE levels. Whilst utilising RPE in this way has been shown to be a valid and reliable means of self-regulating safe and effective exercise in a variety of modalities, such as cycle ergometry (Buckley et al. 2000; Eston and Williams 1988; Kang et al. 1998), treadmill running (Dunbar et al. 1992; Eston et al. 1987) and rowing ergometry (Marriott and Lamb 1996), research on the predictive capability of a PRET has thus far only employed cycle ergometry.

In the first study on this theme Eston et al. (2005) predicted \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) from an incremental cycling PRET protocol consisting of 4 min bouts at RPE 9, 11, 13, 15 and 17 to within ±6 ml kg−1 min−1 of actual values in a sample of ten active young males. Subsequent studies addressed the predictive success of the cycle PRET when different lengths of exercise bouts were employed (Eston et al. 2006), amongst active versus sedentary male and female populations (Faulkner et al. 2007), when a discontinuous protocol was used (Eston et al. 2008), and when refined RPE instructions were provided (Morris et al. 2009). A common element to these studies is that the validity of the PRET was examined from \( {\dot{V}}{\text{O}}_{ 2} \) values obtained during exercise that was prescribed across a broad sub-maximal perceptual range, from RPE 9 (Very light) to RPE 17 (Very hard). However, the upper-most level is considered too strenuous for sedentary and clinical populations (ACSM 2010; AACVPR 2006; BACR 1995) and in acknowledging this limitation, the above studies tended to generate additional predictions of \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) that did not include the data from RPE 17 (i.e. RPE 9–15, or RPE 9–13) to explore if they were as accurate. Compared to the predictions based on the full data set (RPE 9–17), which at best reflect margins of error of ±17% (Eston et al. 2005; Morris et al. 2009), the errors associated with the 9–15 (at best ±23–26%, Eston et al. 2005; Morris et al. 2009) and 9–13 (±35–36%, Faulkner et al. 2007; Morris et al. 2009) predictions were found to be notably larger, and excessive on an individual basis. However, we have posited previously (Morris et al. 2009) that if data are collected from PRETs that are purposely limited in intensity to RPE 15 (or RPE 13), the accuracy of the \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) predictions might be better, on the basis that in the absence of an exercise protocol containing bouts labelled “Very hard” (or “Hard”), the participants are less likely to under-regulate their exercise efforts as part of a pacing strategy, in the manner suggested by Faulkner et al. (2007).

There are therefore two key elements that need to be addressed: (1) the agreement of the PRET with different modes of exercise, and (2) the agreement of the PRET when the exercise intensity is no more than RPE 15 (hard/heavy). In this context, there is a case for exploring the validity and reliability of a treadmill-based PRET since a treadmill is a popular mode employed in exercise testing and usually produces higher \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) values than a cycle ergometer. In addition, investigating a PRET whose upper limit is capped at RPE 15, a value that is familiar in general fitness screening situations (where it is utilised as a termination point in many sub-maximal tests), is justifiable as it lowers the health risk and ethical concerns associated with more strenuous or exhaustive exercise, and reduces testing times, costs and the degree of participant motivation required (Faulkner and Eston 2008). Accordingly, the purpose of this study was to assess the criterion validity of predicting \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) from sub-maximal \( {\dot{V}}{\text{O}}_{ 2} \) values elicited during a truncated treadmill-based PRET (maximum RPE 15) and the reliability of the predictions over three repeat trials.

Method

Participants

Eighteen (14 males and 4 females) healthy active participants (21.7 ± 2.8 years, 71.21 ± 12.81 kg, 172 ± 0.1 cm) were recruited from a University population to take part in the study. After receiving oral and written explanation of the study, all the volunteers gave their consent to participate and prior to each testing session completed a pre-test health status questionnaire. Ethics approval for the study was granted by the Faculty of Applied and Health Sciences ethics committee at the University of Chester.

Procedures

This study utilised a repeated measures design in which each participant was required to attend the laboratory on four separate occasions (48–72 h apart), three times to perform a continuous, incremental sub-maximal PRET and one further time to perform a graded exercise test to exhaustion (\( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) test). The PRETs were administered on three occasions since previous studies involving cycle ergometry have highlighted how practice improves the repeatability and accuracy of the \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) predictions (Eston et al. 2005, 2006; Faulkner et al. 2007; Morris et al. 2009). The \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) test was performed subsequent to the three PRETs as it was considered that this would avoid participants being sensitised to the full perceptual range of the RPE scale (up to RPE 20), which would not occur in environments for which sub-maximal tests might be particularly useful (e.g. clinical or sedentary). All exercise testing was conducted on a motorised treadmill (Woodway PPS55 Sport-I Treadmill), which had its display screen concealed from participants so that no external feedback relating to exercise intensity (speed, gradient and HR) was available. Oxygen uptake and heart rate were recorded continuously during each exercise testing session via breath-by-breath online gas analysis (Oxycon, Jaeger, Germany) and Polar wireless telemetry (Polar s810i, Finland). An automated gas and volume calibration was performed prior to each testing session in accordance with the manufacturer’s guidelines. Each participant was tested at a time of day similar to their previous test/s (within 2 h) to control for physiological variation caused by circadian rhythms (Reilly 2007).

Sub-maximal treadmill PRET

Before performing each PRET, participants were presented with the 6–20 RPE scale (Borg 1998) and were read a set of instructions (Morris et al. 2009) for its application during the exercise trial. These instructions were specific to the use of the RPE scale in its production rather than estimation mode. Following this, each PRET protocol required participants to regulate their intensity to match four RPE levels (9, 11, 13 and 15) prescribed by the investigator in an incremental order. Participants commenced walking at a speed of 1.3 km h−1 for 3 min before being instructed to produce an exercise intensity equivalent to RPE 9 (Very light) on the RPE scale. The exercise intensity was then adjusted by the researcher on instruction from the participant using the control panel on the treadmill. The intensity was first altered via an increment in speed of 1.3 km h−1 and secondly by an increase in gradient of 0.5% on request from the participant. This was performed to elicit a change in intensity of approximately 0.2–0.6 METs per instruction from the participant. Participants were given 3 min to adjust the exercise intensity to their satisfaction (that is, until it matched RPE level 9, Very light), after which their expired air was analysed for the following 3 min. One minute into the recording, participants were allowed a final refinement of the self-regulated intensity. This procedure was then repeated for RPE levels 11, 13 and 15. The mean oxygen uptake and heart rate during the final 30 s of each RPE level in all bouts were calculated. Upon completion of the PRET, a warm down was performed at 5 km h−1 until heart rate dropped below 100 b min−1.
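The averaging step at the end of each RPE level can be illustrated with a minimal sketch. It assumes the breath-by-breath record is available as (time, value) pairs; the function name, the data layout and the sample values are hypothetical and are not taken from the study's software.

```python
from statistics import mean


def final_window_mean(samples, bout_end_s, window_s=30.0):
    """Average the values recorded in the last `window_s` seconds of a bout.

    `samples` is a list of (time_s, value) pairs, e.g. breath-by-breath VO2.
    """
    start = bout_end_s - window_s
    window = [v for t, v in samples if start < t <= bout_end_s]
    if not window:
        raise ValueError("no samples in the averaging window")
    return mean(window)


# Made-up VO2 record (ml/kg/min) for one 3-min recording period,
# one value every 5 s; values are illustrative only.
vo2_record = [(t, 20.0 + (1.0 if t > 150 else 0.0)) for t in range(5, 185, 5)]
bout_mean = final_window_mean(vo2_record, bout_end_s=180)
```

The same helper could be applied to the heart rate record, giving one \( {\dot{V}}{\text{O}}_{ 2} \) and one HR value per RPE level.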

\( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) protocol

\( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) was determined via the Bruce protocol (Bruce et al. 1973), a graded exercise test (GXT) which employs a continuous and incremental procedure, starting at a speed of 2.74 km h−1 and a gradient of 10%, and increasing in gradient by 2% every 3 min in line with simultaneous increments in speed (2.74, 4.02, 5.47, 6.76, 8.05 and 8.85 km h−1). Expired air and HR were monitored in the manner described above, with the addition of blood lactate measured immediately after the cessation of the last stage (Lactate Pro, Arkray Japan). The establishment of a \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) for each participant was evaluated against the criteria set out by Bird and Davidson (1997) on behalf of the British Association of Sport and Exercise Sciences (subjective fatigue and volitional exhaustion, a plateau in \( {\dot{V}}{\text{O}}_{ 2} \), RPE 19 or 20, HR within ±10 beats of age-related maximum, post-exercise lactate >8 mmol l−1 and a respiratory exchange ratio >1.15) and confirmed if four of them were met. These criteria are adopted widely in laboratories across the UK and reported frequently in the applied physiology literature. Nonetheless, we note that future research might need to reappraise the use of secondary criteria as a means of validating the attainment of a maximal oxygen uptake in light of findings from a recent study of active males by Poole et al. (2008), which demonstrated the tendency of such criteria to incorrectly reject the occurrence, or under-estimate the values, of \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \).
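The confirmation rule can be sketched as a simple count over the six listed criteria. This is an illustration only: it assumes "four of them" means at least four, treats exhaustion and the \( {\dot{V}}{\text{O}}_{ 2} \) plateau as judgements supplied by the investigator, and takes the age-related maximum HR as an input because the formula used is not stated. All function and parameter names are hypothetical.

```python
def check_vo2max_criteria(exhaustion, plateau, terminal_rpe, hr_peak,
                          hr_age_max, lactate_mmol_l, rer, required=4):
    """Return (criteria, confirmed) for one participant's GXT.

    `exhaustion` and `plateau` are booleans judged by the investigator;
    `hr_age_max` is the age-related maximum HR, however it was estimated.
    """
    criteria = {
        "volitional_exhaustion": bool(exhaustion),
        "vo2_plateau": bool(plateau),
        "rpe_19_or_20": terminal_rpe in (19, 20),
        "hr_within_10_of_age_max": abs(hr_peak - hr_age_max) <= 10,
        "lactate_above_8": lactate_mmol_l > 8.0,
        "rer_above_1_15": rer > 1.15,
    }
    # Assumption: the test is confirmed when at least `required` criteria hold.
    confirmed = sum(criteria.values()) >= required
    return criteria, confirmed


# Made-up example values, not data from the study.
criteria, confirmed = check_vo2max_criteria(
    exhaustion=True, plateau=False, terminal_rpe=19,
    hr_peak=195, hr_age_max=198, lactate_mmol_l=9.2, rer=1.12,
)
```

In this made-up case four of the six criteria hold (exhaustion, RPE, HR and lactate), so the test would be treated as confirmed.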

Data analysis

Following a check on the normality of their distributions via the Shapiro–Wilk statistic, descriptive statistics (mean ± SD) were computed for \( {\dot{V}}{\text{O}}_{ 2} \) values across all four exercise trials. In the manner of Faulkner et al. (2007), individual linear regression analyses [\( {\dot{V}}{\text{O}}_{ 2} = a \, + \, b \) (RPE)] were performed on each participant’s four measured \( {\dot{V}}{\text{O}}_{ 2} \) values (from RPE levels 9, 11, 13 and 15) to predict their GXT-determined \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) at the typical (RPE19) and theoretical (RPE20) end-points. Separate one-way repeated measures ANOVAs were used for each predictive model to compare trial means to actual \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) scores, followed up, where appropriate, with Bonferroni-adjusted paired t tests to locate differences between specific means. The agreement between the criterion \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) values and those predicted from the PRETs was calculated with the 95% limits of agreement (LoA) technique, on the basis that the errors (differences) were found to be normally distributed and homoscedastic (Bland and Altman 1986). The LoA (bias ± 1.96 × SDdiff) technique was also employed to assess the reproducibility of the \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) predictions across the three trials, with the addition of the typical error (SDdiff/√2; Hopkins 2000), and the intraclass correlation coefficient, calculated via a two-way mixed effects model for absolute agreement. All data analysis was conducted using SPSS for Windows (version 16.0) and alpha was set at the 0.05 level.
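The prediction and agreement calculations described above can be sketched in a few lines. This is an illustrative re-implementation, not the SPSS procedure used in the study: the function names and example values are hypothetical, differences are taken as predicted minus measured (the sign convention is an assumption), and the ANOVA and intraclass correlation are not reproduced.

```python
import math
from statistics import mean, stdev

RPE_LEVELS = (9, 11, 13, 15)


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def predict_vo2max(vo2_values, end_point, rpe_levels=RPE_LEVELS):
    """Extrapolate one participant's VO2-RPE line to RPE 19 or RPE 20."""
    a, b = fit_line(rpe_levels, vo2_values)
    return a + b * end_point


def limits_of_agreement(predicted, measured):
    """Return (bias, lower, upper) using bias +/- 1.96 x SD of the differences."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias, half_width = mean(diffs), 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def typical_error(trial_a, trial_b):
    """SD of the between-trial differences divided by sqrt(2)."""
    diffs = [a - b for a, b in zip(trial_a, trial_b)]
    return stdev(diffs) / math.sqrt(2)


# Made-up VO2 values (ml/kg/min) at RPE 9, 11, 13 and 15 for one participant.
example_vo2 = [16.0, 22.0, 28.0, 34.0]
pred_rpe19 = predict_vo2max(example_vo2, end_point=19)
pred_rpe20 = predict_vo2max(example_vo2, end_point=20)
```

With these made-up values the fitted slope is 3 ml kg−1 min−1 per RPE unit, so the RPE20 prediction sits 3 ml kg−1 min−1 above the RPE19 prediction. Applying `limits_of_agreement` to the predictions and GXT values of a whole sample, or `typical_error` to two trials, gives statistics of the kind defined above.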

Results

All the participants satisfied the criteria for achieving \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) during the GXT and the mean \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) for the sample was 48.0 ± 6.2 ml kg−1 min−1. Of note was that everyone reported a terminal RPE of 19. In each PRET trial, increases in RPE level were accompanied by linear increases in mean \( {\dot{V}}{\text{O}}_{ 2} \) (Table 1) and all individual correlations from the regression analyses of RPE and \( {\dot{V}}{\text{O}}_{ 2} \) (levels 9–15) exceeded 0.91, except two (0.80, trial 1; 0.78, trial 3), and were typically 0.96 or higher.

Table 1 Mean (±SD) oxygen uptake values (ml kg−1 min−1) across three PRET trials

The mean differences in \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) between the measured and the PRET values were typically small (<2.5 ml kg−1 min−1) and non-significant (P > 0.05) for each predictive model (Table 2). Interestingly, the RPE19 model generated the smallest biases in trials 1 and 2, whereas the RPE20 model generated the smallest bias in trial 3 (0.6 ml kg−1 min−1). Importantly, the 95% LoA between the PRET predictions and the criterion measure were narrowest in trial 3 when RPE20 was used (Table 3). Expressed as a ‘margin of error’, this degree of variation equates to approximately ±15%. Moreover, this reflects a marked improvement from the first trial in which the error was approximately 27%.

Table 2 Mean (±SD) predicted oxygen uptake values (ml kg−1 min−1) across three PRETs
Table 3 Agreement of PRET predicted and actual \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) values

The statistics in Table 4 show the reproducibility of predicted \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) calculated via each model over three trials. Two key findings are evident; first, for both models the LoA are narrower (indicating better agreement) between trials 2 and 3 than between the first two trials, and secondly, the consistency of the RPE20 predictions is superior to those generated by the RPE19 model.

Table 4 Reliability of \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) predictions across repeated trials

Discussion

The current investigation has provided data which are commensurate with those of several recent studies dealing with the validity of predicting \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) via a PRET (Eston et al. 2005, 2006; Faulkner et al. 2007; Morris et al. 2009). Our findings have an advantage over previous studies in that they were produced on a treadmill and from a protocol that had an upper limit of RPE 15. In particular, this treadmill-based PRET generated relatively accurate predictions for most participants, despite the perceptual range being confined to RPE 9–15. Furthermore, and in keeping with previous studies, the reproducibility of these predictions was seen to improve with practice to a level that could facilitate their application in exercise interventions.

That our active participants, individually and as a sample, were generally able to adjust their exercise output from a low (RPE 9; approximately 32% \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \)) to a recommended safe and effective level (RPE 15; approximately 72% \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \)) was not an unexpected finding given the incremental and continuous nature of the PRETs employed. Previous studies have demonstrated this competency during incremental cycling (Faulkner et al. 2007; Eston et al. 2006, 2008), amongst both active and sedentary individuals, albeit involving protocols that extended exercise effort to a level of RPE 17, which for many participants might be >80% \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) and inappropriate (ACSM 2010; AACVPR 2006; BACR 1995). More impressive was the accuracy of the \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) predictions, given that oxygen uptake data were gathered from only four stages (9, 11, 13, and 15). The optimum level of agreement between predicted and measured \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) (−0.6 ± 7.1 ml kg−1 min−1, or ±15%) is almost identical to that observed in our recent study (Morris et al. 2009) that used a cycling-based PRET across the range 9–17 (±17%) and compares favourably to the findings from previously reported truncated (9–15) prediction models (up to RPE20). In those studies, the random (within-subjects) errors were ±8.4 (±23%; Eston et al. 2005), ±12.4 (±31%; Eston et al. 2008), ±11.2 (±26%; Faulkner et al. 2007), ±10.6 (±22%; Eston et al. 2006) and ±10.8 (±26%; Morris et al. 2009) ml kg−1 min−1. The lower margin of error revealed in the current study supports our notion that a PRET devoid of a “Very hard” (RPE 17) stage would benefit a participant’s compliance with the task and reduce the likelihood of his/her adopting a pacing strategy (Tucker 2009).
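As a rough check on how the ‘margin of error’ relates to the limits of agreement, and assuming (it is not stated explicitly) that the percentage expresses the random error relative to the sample mean \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) of 48.0 ml kg−1 min−1, the best trial 3 figures work out as:

\[ \frac{7.1}{48.0} \times 100 \approx 14.8\% \approx 15\% \]

The percentages quoted for the earlier cycling studies would, on the same assumption, be relative to the mean \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) of each of those samples, so the absolute and relative errors need not rank in the same order across studies.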

The accuracy of the \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) predictions was improved after the first of the three PRETs, and likewise after the second, reinforcing the merit of habituating participants to the task of governing their own exercise intensity in this way. This finding was independent of which prediction model was used, but after three trials, the RPE20 model proved to be most accurate. Based on their truncated (9–15) models, Faulkner et al. (2007) reported little difference between the RPE20 and RPE19 predictions, although based on the full range (9–17) the RPE19 model was more accurate (after 3 trials). In absolute terms, the interpretation of the agreement between our predicted and actual measures of \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) warrants attention, particularly as this has tended to be overlooked in previous investigations. In the manner exemplified by Morris et al. (2009), the best LoA from the current data imply that, in the worst case, an individual with a \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) of, say, 40 ml kg−1 min−1 could have a predicted value as high as 47 or as low as 33 ml kg−1 min−1. Whilst comparable figures for other indirect methods of predicting maximal oxygen uptake are scarce, these figures compare well to those reported for the established Astrand cycle test (Nevill and Atkinson 1997) and the Chester step test (Buckley et al. 2004), both of which are reliant on measures of heart rate. Moreover, 15 (84%) of the participants had PRET predicted values (from trial 3) that were within ±5.5 ml kg−1 min−1 (±11.5%) of their actual \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \). Given that a small proportion of adults can be expected to have difficulties in understanding and utilising the RPE scale (Borg 1998), this lower figure is probably more realistic and, depending on the goals of the individual or exercise practitioner (or clinician), deserves a favourable interpretation.

The consistency of the \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) predictions from the two models improved markedly after the first trial, and was optimum between trials 2 and 3 for the RPE20 model. Whilst slightly less impressive than the equivalent statistics we reported for cycle ergometry (0.7 ± 7.3 ml kg−1 min−1; Morris et al. 2009), this level of agreement sits well alongside those reported in a positive manner for truncated 9–15 ranges (also for cycle ergometry) by Faulkner et al. (2007; −0.6 ± 12.0 ml kg−1 min−1) and Eston et al. (2007; 1.3 ± 9.7 ml kg−1 min−1). In addition, this could be due to the absence of the relatively high RPE 17 stage from the current study and its negative impact on the validity of the participants’ task compliance. Notwithstanding this, the reliability of our predicted \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) values reflects agreement for 95% of the sample that at worst is ±8.5 ml kg−1 min−1 (±17.5%) and for those 15 participants whose predictions were most accurate (see above), their reliability was, not surprisingly, better at ±6.5 ml kg−1 min−1 (±13.5%). Moreover, if we adopt the somewhat liberal ‘typical error’ calculation of reliability, the interpretation is better still because the agreement between trials is about one-third of the LoA. However, the optimal typical error for the current data (3.1 ml kg−1 min−1) only reflects the reliability for about two-thirds of our sample, and not most (95%) as reflected in the limits of agreement.
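The ‘about one-third’ relationship follows directly from the two definitions given in the data analysis section (typical error = SDdiff/√2; LoA = bias ± 1.96 × SDdiff), as this short worked step shows:

\[ \frac{\text{typical error}}{\text{LoA half-width}} = \frac{SD_{\text{diff}}/\sqrt{2}}{1.96 \times SD_{\text{diff}}} = \frac{1}{1.96\sqrt{2}} \approx 0.36 \]

Working back from the typical error of 3.1 ml kg−1 min−1 gives \( SD_{\text{diff}} \approx 3.1 \times \sqrt{2} \approx 4.4 \) and a LoA half-width of about \( 1.96 \times 4.4 \approx 8.6 \) ml kg−1 min−1, close to the ±8.5 ml kg−1 min−1 quoted above (the small gap is presumably rounding). The ‘two-thirds’ figure reflects the fact that, for normally distributed errors, roughly 68% of values lie within ±1 SD.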

The matter of whether such agreement, as with the analysis of validity, is ‘acceptable’ or not has typically received inadequate attention in research of this kind (Morris et al. 2009). One useful approach in dealing with this is to consider whether the extent of the trial-to-trial reliability observed would mask any ‘real’ variability (change) in \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) that might be expected to occur due to an intervention or training programme. Previously we have argued (Morris et al. 2009) that changes in excess of the magnitude of those reported (between 3.1 and 8.5 ml kg−1 min−1 in this study, depending on the reliability statistic used) would be necessary. Therefore, since such changes are likely to occur amongst healthy populations following suitable training, the reliability of the current predictions can be evaluated optimistically.

In conclusion, the validity of a treadmill-based sub-maximal PRET as a predictor of \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) has been found to be comparable to that reported previously in studies utilising a cycle-based PRET. This is notable given that the protocol did not extend participants beyond RPE 15 (“Hard”) and is more realistic for individuals for whom intense exercise is not recommended. It is evident that, given practice, such a protocol can yield predictions of \( {\dot{V}}{\text{O}}_{ 2\hbox{max} } \) that have satisfactory reliability and are more accurate than other heart rate-related predictive methods.