Key Points

Maximal oxygen uptake (\( \dot{V}{\text{O}}_{2\;\hbox{max} } \)) may be predicted from the linear relationship between overall ratings of perceived exertion (RPE) ≤15 and oxygen uptake (\( \dot{V}{\text{O}}_{2} \)) from a perceptually regulated exercise test (PRET) in different populations (i.e., young, old, active, sedentary, healthy, and some clinical populations) and in various PRET modalities (i.e., cycling, running, and arm-cranking).

Greater accuracy of predictions may be obtained when \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) is extrapolated to RPE20 (rather than RPE19).

Predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) accuracy is improved during the second PRET.

1 Introduction

Maximal oxygen uptake (\( \dot{V}{\text{O}}_{2\;\hbox{max} } \)) corresponds to the highest rate at which an individual can transport and utilize oxygen during exercise involving large muscle groups at sea level [1]. The \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) is often accepted as the best criterion measure of cardiorespiratory fitness [2–4]. Moreover, assessment of the \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) allows appropriate exercise intensity ranges to be prescribed, which are tailored to an individual’s cardiorespiratory fitness [5]. However, direct assessment of the \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) requires an exercise test to volitional exhaustion. For athletes, this approach needs to be considered in relation to a training and performance schedule, as inadequate recovery may lead to impairment of performance in subsequent days [6, 7]. The risks of maximal exercise testing in patients must also be considered in relation to limitations of pain, fatigue, abnormal gait or impaired balance [8, 9]. Although adverse events (e.g., arrhythmia, myocardial infarction or even death) are rare during properly supervised exercise tests, exercising to exhaustion substantially increases the likelihood of these events in elderly individuals and patients [10]. Maximal exercise testing requires a very high level of motivation [8, 11, 12], and if the tests are regularly repeated during an exercise rehabilitation program (e.g., to adjust the exercise intensity to a new level based on % \( \dot{V}{\text{O}}_{2\;\hbox{max} } \)), they may potentially discourage patients from participating in the program [13, 14]. For all these reasons, numerous studies have explored the efficacy of various submaximal exercise tests to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), often from heart rate [11, 15, 16]. However, some studies have failed to show the validity of indirect methods using the relationship between heart rate and oxygen uptake (\( \dot{V}{\text{O}}_{2} \)) to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) [17, 18]. Consequently, other methods may be more appropriate.

Effort perception can be defined as the intensity of subjective effort, stress, discomfort and fatigue that is felt during physical exercise [19]. To measure this psychophysiological variable, the most frequently used tool is the Ratings of Perceived Exertion scale (RPE) [20]. This scale, developed by Gunnar Borg [21], was constructed from the basic assumption that physiological strain grows linearly with exercise intensity (e.g., \( \dot{V}{\text{O}}_{2} \)), and that effort perception follows the same linear increase [22]. Numerous studies [23–26], including a meta-analysis [27], have previously confirmed this assumption. Indeed, this meta-analysis [27] showed a significant relationship between RPE and \( \dot{V}{\text{O}}_{2} \), with a moderate correlation coefficient (i.e., r = 0.63; p < 0.05). Based on this relationship between effort perception and exercise intensity, Ainsworth et al. [28] proposed use of the individual linear regression between RPE obtained from Borg’s scale and \( \dot{V}{\text{O}}_{2} \) (RPE:\( \dot{V}{\text{O}}_{2} \)) to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from submaximal ‘estimation procedures’. An estimation procedure is a process in which the individual is typically asked to rate how hard an exercise bout feels according to the RPE scale during each stage of the exercise test [29]. In their study, Ainsworth et al. [28] measured \( \dot{V}{\text{O}}_{2} \) and collected overall RPE (i.e., effort perception emanating from sensations of the whole body) at the end of each stage during two submaximal field exercise tests (i.e., a sitting chair step test and a modified step test) in older men and women. The \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) was then predicted from the RPE:\( \dot{V}{\text{O}}_{2} \) relationship extrapolated to RPE17 (which was considered as the maximal RPE in older individuals by the authors). The results showed no significant difference between actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) during the sitting chair step test (19.1 ± 6.0 mL·kg−1·min−1 vs 17.2 ± 6.4 mL·kg−1·min−1, respectively), and the modified step test (19.4 ± 6.0 vs 19.7 ± 4.9 mL·kg−1·min−1, respectively). Consequently, Ainsworth et al. [28] concluded that \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) may be predicted from RPE in older individuals, specifically when a modified step test is used (because the mean difference is lower than 2 %).
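
As a quick check of the "lower than 2 %" statement, the group means quoted above can be converted to a percentage difference. This is only an illustrative sketch: the function name `percent_difference` is hypothetical, and expressing the difference relative to the actual value is an assumption about how the percentage was derived.

```python
def percent_difference(actual, predicted):
    """Mean difference (predicted - actual) as a percentage of the actual value."""
    return (predicted - actual) / actual * 100.0

# Group means quoted from Ainsworth et al. [28], in mL/kg/min
chair = percent_difference(actual=19.1, predicted=17.2)  # sitting chair step test
step = percent_difference(actual=19.4, predicted=19.7)   # modified step test

print(f"sitting chair step test: {chair:.1f} %")
print(f"modified step test: {step:.1f} %")
```

Under this assumption, only the modified step test falls within 2 % of the measured mean.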

More recently, others have proposed use of the RPE:\( \dot{V}{\text{O}}_{2} \) relationship to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from a ‘production procedure’ [30]. Indeed, on the basis of their earlier work [31], which studied the prediction of maximal aerobic power output in healthy individuals and cardiac patients on β-blockers, Eston et al. [30] proposed a production procedure to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) in healthy adults. This procedure involves asking the individual to self-regulate (i.e., produce) and maintain a series of exercise intensities (for 2, 3 or 4 min) corresponding to 3, 4 or 5 pre-set overall RPE levels (i.e., RPE9, RPE11, RPE13, usually RPE15 and sometimes RPE17). These RPE levels may be prescribed in an incremental [30, 32–40] or randomized [41] fashion (Fig. 1). In other words, during each stage of a production procedure, the individual must produce and maintain an exercise intensity corresponding to a pre-set or clamped RPE level according to sensations emanating from the whole body. This procedure contrasts with the ‘estimation’ procedure, which requires the individual to provide an RPE in response to changes in exercise intensity. A probable advantage of the production procedure (compared with the estimation procedure) is that the individuals are required to focus very strongly on internal signals. This may improve the relationship between RPE and \( \dot{V}{\text{O}}_{2} \), thus leading to a more accurate prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from the extrapolation of RPE:\( \dot{V}{\text{O}}_{2} \) [6, 14]. In their study, Eston et al. [30] recruited 10 physically active men, who performed four exercise tests on a cycle ergometer: one graded exercise test until exhaustion to determine actual \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), and three identical submaximal incremental exercise tests based on a production procedure (termed ‘perceptually regulated exercise test’: PRET) to estimate \( \dot{V}{\text{O}}_{2\;\hbox{max} } \). During PRET, the stages lasted for 4 min and corresponded to RPE9, RPE11, RPE13, RPE15 and RPE17. The RPE:\( \dot{V}{\text{O}}_{2} \) relationship was assessed across three RPE ranges (i.e., RPE9–15, RPE9–17, and RPE11–17), and extrapolated to the theoretical \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) at the maximal RPE (i.e., RPE20). The results showed no significant difference between actual (48.8 ± 7.1 mL·kg−1·min−1) and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), regardless of the PRET, when RPE9–17 and RPE11–17 were used (e.g., 47.3 ± 9.6 and 49.7 ± 8.7 mL·kg−1·min−1, respectively, during the initial PRET). However, significant differences between actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) were observed when \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) was predicted from RPE9–15 from both the initial and second PRET (e.g., 43.4 ± 10.6 and 44.4 ± 8.9 mL·kg−1·min−1, respectively). Moreover, the 95 % limits of agreement (LoA) were narrower from RPE9–17, suggesting higher accuracy of predictions (1.5 ± 7.3, 0.2 ± 4.9 and −1.2 ± 5.8 mL·kg−1·min−1 for the first, second and third PRET, respectively). Wider 95 % LoA (5.4 ± 11.3, 4.4 ± 8.7 and 2.3 ± 8.4 mL·kg−1·min−1 for the first, second and third PRET, respectively) and lower intra-class correlation coefficients (ICC = 0.89–0.91) were observed from RPE9–15, suggesting lower accuracy and reliability in predictions when this RPE range was used. Since then, several studies have supported the predictive validity of the RPE:\( \dot{V}{\text{O}}_{2} \) relationship from RPE9–15 to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) [32, 35, 38], but these findings are debated [34, 39]. Indeed, narrower 95 % LoA have sometimes been reported in an initial PRET compared with a second PRET [39]. Consequently, further confirmation is needed to ascertain whether the RPE:\( \dot{V}{\text{O}}_{2} \) relationship from the RPE9–15 range can be used to accurately predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), particularly as it is recommended to stop an exercise test at RPE15 to avoid cardiovascular complications [22].
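
The extrapolation described above can be sketched in a few lines. This is a minimal illustration, not the analysis code of any cited study: the function names (`fit_line`, `predict_vo2max`) and the participant values are hypothetical, and an ordinary least-squares fit is assumed for the individual regression.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_vo2max(rpe, vo2, endpoint_rpe=20):
    """Extrapolate the individual RPE:VO2 regression line to a perceptual endpoint."""
    slope, intercept = fit_line(rpe, vo2)
    return intercept + slope * endpoint_rpe


# Hypothetical participant (illustrative values only, in mL/kg/min).
# The data are perfectly linear so the result is easy to verify by hand.
rpe_levels = [9, 11, 13, 15]
vo2_values = [18.0, 24.0, 30.0, 36.0]

pred_rpe20 = predict_vo2max(rpe_levels, vo2_values, endpoint_rpe=20)
pred_rpe19 = predict_vo2max(rpe_levels, vo2_values, endpoint_rpe=19)
print(pred_rpe20, pred_rpe19)
```

Because the slope is positive, extrapolating to RPE19 always yields a lower estimate than extrapolating to RPE20; the difference equals one slope unit.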

Fig. 1 Prediction of maximal oxygen uptake (\( \dot{V}{\text{O}}_{2\;\hbox{max} } \)) from individual linear regression between the overall ratings of perceived exertion (RPE) obtained from Borg’s 6–20 RPE scale and oxygen uptake (\( \dot{V}{\text{O}}_{2} \)) during a production procedure. (a) Example of \( \dot{V}{\text{O}}_{2} \) during the different RPE levels of a perceptually regulated exercise test (PRET). (b) Example of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) prediction using the individual linear regression between RPE9–15 and \( \dot{V}{\text{O}}_{2} \) measured during the different RPE levels of a PRET. The dotted line represents the extrapolation of the individual linear regression between RPE and \( \dot{V}{\text{O}}_{2} \) to the theoretical \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (i.e., 43.4 mL·kg−1·min−1) from maximal theoretical RPE (i.e., RPE20)

Although Eston et al. [30] observed significant differences between actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) using RPE9–15 during both initial PRETs, the bias and 95 % LoA were lower during the second PRET, and not significantly biased during the third PRET. These results suggest that prediction accuracy is improved with protocol familiarity [30], which has been confirmed by several authors [32, 38]. However, further clarity is needed regarding whether \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) can be accurately predicted from an initial PRET, or whether it is necessary to conduct a second PRET.

In theory, the exercise endpoint (i.e., exhaustion or maximal effort) corresponds to RPE20 on Borg’s scale [22]. However, several authors [42, 43] have observed lower RPE at the exercise endpoint (≈ RPE19), which leads some researchers to suggest a physical and/or perceptual reserve capacity, presumably to maintain homeostasis and protect the individual from physical damage [44–46]. Hence, Faulkner et al. [37] compared the accuracy of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from the RPE:\( \dot{V}{\text{O}}_{2} \) extrapolated to RPE19 and RPE20 during PRET. Although the predictions obtained from RPE9–15 extrapolated to RPE20 were not significantly different (i.e., 41.9 ± 10.8, 43.0 ± 11.8 and 43.6 ± 11.7 mL·kg−1·min−1, respectively, for first, second and third PRET) from actual \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (i.e., 42.7 ± 10.6 mL·kg−1·min−1), those extrapolated to RPE19 were significantly lower than actual \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) during both initial PRETs (39.3 ± 10.0 and 40.5 ± 10.9 mL·kg−1·min−1, respectively), suggesting the use of RPE20 as the theoretical end point when RPE9–15 is used. However, in a later study, Eston et al. [33] recommended that the RPE:\( \dot{V}{\text{O}}_{2} \) obtained from RPE9–15 should be extrapolated to RPE19 rather than RPE20 to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) with greater accuracy. Consequently, there is still no consensus on the perceptual endpoint to use for the extrapolation (i.e., RPE19 or RPE20).

Given the unresolved questions concerning the use of the submaximal RPE:\( \dot{V}{\text{O}}_{2} \) relationship to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), the purpose of the current meta-analysis was twofold: (1) to examine the validity of predicting \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from the individual linear regression between RPE obtained from Borg’s scale and \( \dot{V}{\text{O}}_{2} \) during a PRET, and (2) to determine the level of agreement and accuracy of predicting \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from an initial PRET and a retest using RPE19 and RPE20, in order to identify the most accurate method.

2 Method

Figure 2 presents the study selection flow diagram.

Fig. 2 Study selection flow diagram. PRET perceptually regulated exercise test, RPE ratings of perceived exertion, \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) maximal oxygen uptake

2.1 Identification

The current meta-analysis was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement guidelines [47]. A systematic search of the research literature was conducted for studies which assessed the prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from the RPE. The PubMed–NCBI, EBSCO Host and Scopus databases were searched. The searches in these electronic databases were conducted by one author (JC; July 15, 2014) using keywords agreed by all co-authors (i.e., oxygen AND perceived exertion OR rpe AND predict OR estimation OR estimate). This revealed 512 manuscripts. Once duplicate citations were removed (n = 264), 248 articles were analyzed.

2.2 Screening

Following the initial selection of studies, two experts in the field (JC and MT) performed the eligibility assessment (for each manuscript) independently in a blinded standardized manner by screening the titles and abstracts. Disagreements were discussed between three authors (JC, MT and CT) and resolved by consensus.

To be considered eligible, the manuscript had to be published in a peer-reviewed journal and the prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) must have been obtained from individual linear regression between the RPE and \( \dot{V}{\text{O}}_{2} \). As the current meta-analysis is interested in the prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), a submaximal exercise test must also have been used. This exercise test could involve any mode of exercise (e.g., cycling, running, arm cranking) in a laboratory or in the field, in order to test the validity of individual linear regressions between the RPE and \( \dot{V}{\text{O}}_{2} \) to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) in various conditions. Only studies using Borg’s RPE scale were eligible because this scale is the most frequently used to measure effort perception. In order to compare the accuracy of predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), the extrapolation could be obtained from RPE19 or RPE20. No restriction was placed on the participants’ status (e.g., women vs men, healthy participants vs patients, sedentary vs active individuals, young vs old subjects, etc.), in order to examine the predictive validity across all populations.

After the titles and abstracts of studies were reviewed, 225 articles were further removed from the analysis.

2.3 Eligibility

Twenty-three full-text articles were evaluated. Moreover, based on the information within these full-text articles, two authors (JC and CT) used a standardized form to select the manuscripts eligible for inclusion in the meta-analysis. This form ascertained whether information provided in the title and abstract matched the text of the manuscript and the inclusion criteria for the meta-analysis (from this check, one further article was removed) [48]. Furthermore, the reviewers verified that the exercise tests were performed using a ‘production protocol’, and a further 12 articles using an ‘estimation protocol’ were removed [6, 14, 28, 49–57].

The production procedure could be in the form of an incremental or randomized test, and all stage durations were included (e.g., 2, 3 or 4 min). In addition, the reference lists from the 23 full-text articles were searched manually to identify other possible eligible manuscripts (JC and RE). From this analysis, one manuscript was identified [40], but it was not published in a peer-reviewed journal and was therefore not included. Consequently, the meta-analysis included 10 studies [30, 32–39, 41].
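
The selection counts reported in Sects. 2.1 to 2.3 can be reconciled with a short tally. This is a bookkeeping sketch only; the variable names are hypothetical, and the manuscript found by manual search [40] is not counted because it was not published in a peer-reviewed journal.

```python
# Counts taken from the text of Sects. 2.1-2.3
records_identified = 512
duplicates_removed = 264
screened = records_identified - duplicates_removed

excluded_on_title_abstract = 225
full_text_assessed = screened - excluded_on_title_abstract

excluded_criteria_mismatch = 1      # removed after the standardized-form check
excluded_estimation_protocol = 12   # used an estimation, not production, protocol
included = (full_text_assessed
            - excluded_criteria_mismatch
            - excluded_estimation_protocol)

print(screened, full_text_assessed, included)
```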

2.4 Inclusion

From the 10 included articles, we (JC and MT) extracted the following information: study identification (authors, year of publication, title), number of participants, participants’ status, exercise modality, actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (mean ± standard deviation: SD). For the predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), we focused on the linear regression between RPE and \( \dot{V}{\text{O}}_{2} \) including all values up to and including RPE15 to maximize the number of data points (and thus accuracy), whilst reducing the risk of cardiovascular complications as indicated by Borg [22]. Moreover, only the overall RPE were used, as overall RPE provide more accurate estimates of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) than when individual linear regressions between peripheral RPE and \( \dot{V}{\text{O}}_{2} \) are used to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) [54]. Where possible, means ± SD of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from extrapolation to RPE19 and RPE20 were collected. Furthermore, these data were collected in the two initial exercise tests (when at least two exercise tests were performed). Similarly, to avoid a possible effect of familiarization, only the baseline data could be included. As most studies using PRET involve a series of incremental stages for 3 or 4 min, we collected the predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from these stage durations (when possible). Data in the whole population, rather than sub-groups, were preferred because the main aim of the study was to test the validity of the linear regression between RPE and \( \dot{V}{\text{O}}_{2} \) during a production procedure to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) in all participants. If necessary, we contacted primary authors by email for further information about unpublished and unclear data (particularly pooled SD). One author (RE) then independently checked the extracted data for errors against the manuscripts.

2.5 Risk of Bias and Quality Appraisal in Included Studies

As a consensus quality assessment in observational studies is unlikely [58], we appraised the methodological rigor of the studies included in the meta-analysis with the Quality Assessment Tool for Quantitative Studies [59], modified according to Evans et al. [8]. More specifically, we considered the following four components of the tool, which were relevant to this meta-analysis: (1) selection bias, (2) study design, (3) data collection methods, and (4) withdrawals and dropouts. Two authors (JC and MT) independently appraised all included studies against each of the four components. To minimize bias in the interpretation of this tool, both reviewers (JC and MT) initially assessed a small sample of studies eligible for inclusion in the meta-analysis (but not included). Disparities in risk of bias judgements were reviewed and discussed prior to evaluating any of the included studies. Disagreements during the quality assessments were discussed between three authors (JC, MT and CT) and resolved by consensus. The components were individually rated as ‘weak,’ ‘moderate’ or ‘strong,’ based on the standard criteria [60]. If necessary, we contacted primary authors again by email for further information about unpublished and unclear data. A global rating for each study was then obtained based on the total number of weak ratings that were accumulated (two or more weak ratings = ‘weak’, one weak rating = ‘moderate’, zero weak ratings = ‘strong’), as proposed by Evans et al. [8]. No studies were excluded on the basis of risk of bias.
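
The global rating rule stated above (two or more weak components = ‘weak’, one = ‘moderate’, none = ‘strong’) can be written as a small function. This is a sketch of that counting rule only; the function name, the dictionary keys and the example ratings are hypothetical and do not reproduce the appraisal of any specific study.

```python
def global_rating(component_ratings):
    """Derive the global rating from the number of components rated 'weak'."""
    weak_count = sum(1 for rating in component_ratings.values() if rating == "weak")
    if weak_count >= 2:
        return "weak"
    if weak_count == 1:
        return "moderate"
    return "strong"


# Hypothetical appraisal of one study across the four components considered
example_study = {
    "selection_bias": "weak",
    "study_design": "weak",
    "data_collection_methods": "weak",
    "withdrawals_and_dropouts": "strong",
}
print(global_rating(example_study))
```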

2.6 Statistical Analysis

Statistical analyses were performed (by MT and AF) using StatsDirect Software (StatsDirect Ltd, Cheshire, UK). Descriptive data are reported as mean ± SD.

In order to compare actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), four subgroup outcomes (i.e., \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from RPE19 during the initial PRET, \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from RPE19 during the second PRET, \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from RPE20 during the initial PRET, and \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from RPE20 during the second PRET) were identified. Bias (i.e., mean differences) for subgroup outcomes was extracted from the published papers and, if not available, calculated by subtracting the actual \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from the predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \). Moreover, when not provided, the standard deviation of the change (SDchange) was derived from the LoA if provided in the included papers (i.e., SDchange = LoA ÷ 1.96, where LoA denotes the ± half-width of the 95 % limits). LoA were used according to our aim and based on recommendations described before [61]. Alternatively, SDchange was imputed from a correlation coefficient provided in the manuscripts using the formula recommended in the Cochrane Handbook, as presented below:

$$ {\text{SD}}_{\text{change}} = \sqrt {\mathop {\text{SD}}\nolimits_{{{\text{actual}}\,\dot{V}{\text{O}}_{2\;\hbox{max} } }}^{2} + \mathop {\text{SD}}\nolimits_{{{\text{predicted}}\,\dot{V}{\text{O}}_{2\;\hbox{max} } }}^{2} - \left( {2 \times \mathop {\text{SD}}\nolimits_{{{\text{actual}}\,\dot{V}{\text{O}}_{2\;\hbox{max} } }} \times \mathop {\text{SD}}\nolimits_{{{\text{predicted}}\,\dot{V}{\text{O}}_{2\;\hbox{max} } }} \times r} \right)} $$

where r is the correlation coefficient between actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \).
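
Both routes to SDchange can be sketched as follows. This is an illustrative sketch rather than the authors' StatsDirect procedure: the function names are hypothetical, the correlation-based route uses the Cochrane Handbook form in which the correlation term multiplies the two SDs themselves (not their squares), and the value of r in the example is invented.

```python
import math


def sd_change_from_loa(loa_half_width):
    """SD of the differences from the +/- half-width of the 95 % limits of agreement."""
    return loa_half_width / 1.96


def sd_change_from_r(sd_actual, sd_predicted, r):
    """SD of the differences imputed from the two SDs and their correlation r."""
    variance = sd_actual ** 2 + sd_predicted ** 2 - 2.0 * r * sd_actual * sd_predicted
    # Guard against a tiny negative value caused by rounding when r is close to 1
    return math.sqrt(max(variance, 0.0))


# Example 1: LoA half-width of 7.3 mL/kg/min (value quoted in the Introduction)
print(round(sd_change_from_loa(7.3), 2))

# Example 2: SDs of 7.1 and 9.6 mL/kg/min with a hypothetical r of 0.8
print(round(sd_change_from_r(7.1, 9.6, 0.8), 2))
```

A useful sanity check is that two identical SDs with r = 1 give an SDchange of zero, and r = 0 reduces the expression to the SD of the difference between two independent measures.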

Given the large differences in study sample sizes (from 10 participants in Eston et al. [30] to 75 in Eston et al. [33]), studied populations (i.e., young, old, active, sedentary, healthy, and clinical women and men) and PRET modalities (i.e., cycling, running, and arm cranking), we used random-effects models to combine all subgroup outcomes. These random-effects models permitted more conservative estimates of the combined effect. Each effect size was weighted by the inverse of its variance. The results are reported as weighted means and LoA.
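
For readers unfamiliar with inverse-variance random-effects pooling, a minimal sketch is given below. It assumes the DerSimonian and Laird estimator of between-study variance, which the text does not specify and which may differ from the StatsDirect implementation actually used; the function name and the study values are hypothetical. The variance of each study's mean difference would typically be taken as SDchange squared divided by the sample size.

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effects with inverse-variance weights (DerSimonian-Laird tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical mean differences (mL/kg/min) and their variances (SDchange^2 / n)
study_bias = [1.5, 2.4, 0.8, 3.1]
study_var = [0.9, 0.4, 1.2, 0.6]

pooled, se, tau2 = random_effects_pool(study_bias, study_var)
print(f"pooled bias = {pooled:.2f}, SE = {se:.2f}, tau^2 = {tau2:.2f}")
```

When the studies agree closely, tau² collapses to zero and the result equals the fixed-effect estimate; greater heterogeneity inflates tau², evens out the weights and widens the uncertainty, which is why the random-effects estimate is described as more conservative.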

A significant difference was indicated when p ≤ 0.05.

3 Results

After exclusion of duplicate citations, a total of 248 studies was identified from preliminary searches (Fig. 2). From the titles and abstracts, 225 obviously irrelevant studies were excluded. From the 23 full-text manuscripts reviewed, a total of 10 studies met the eligibility criteria and were included in the final data analysis (Table 1).

Table 1 Extracted information from 10 studies included in the meta-analysis

Sample sizes of the included studies ranged between 10 [30] and 75 [33]. All included studies were recent, the oldest study being published in 2005 [30], with the most recent articles published in 2014 [39, 62].

Figure 3 shows the bias and LoA of the actual and the predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) for RPE19 during initial PRET and retest, as well as for RPE20 during initial test and retest.

Fig. 3 Summary meta-analysis plot of comparison between actual and predicted maximal oxygen uptake (\( \dot{V}{\text{O}}_{2\;\hbox{max} } \)) in four subgroup outcomes according to test vs retest and the maximal rating of perceived exertion (RPE) used to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (i.e., RPE19 vs RPE20)

For determining the difference between the actual versus predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) on the first PRET using RPE19, data could be extracted from five studies. Using a random-effects model, the combined difference between the actual versus predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) was 1.56 (95 % LoA between −3.73 and 6.86).

For determining the difference between the actual versus predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) on the retest from RPE19, data were available from four studies. Using a random-effects model, the combined difference between the actual versus predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) was 1.95 (95 % LoA between −3.64 and 7.55).

More studies had used the RPE20, and we were able to extract data from 10 studies to compare the actual versus predicted difference in \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) on the first PRET. The combined mean difference between actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) was 2.42 using a random-effects model (95 % LoA between −1.19 and 6.04).

For determining the difference between the actual versus predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) on the retest from RPE20, data were available from nine studies. The combined mean difference was 1.89 (95 % LoA between −1.10 and 4.89).

Quality assessment for all studies apart from one [36] was globally considered as ‘weak’ (Fig. 4). Indeed, only the study by Evans et al. [36] was rated as ‘moderate’. According to a recent systematic review [8], to receive a global rating of ‘moderate’, only one of the four components could be considered as ‘weak’. If more than one component was rated as ‘weak’, the global quality of the study was rated as ‘weak’. According to the Effective Public Health Practice Project criteria [60], the selection bias component was often rated ‘weak’ because the participants were recruited from an institution (e.g., a university) in a self-referred manner. With regard to the data collection methods component, all studies were rated as ‘weak’, predominantly due to a failure to report on the validity and reliability of the data collection tools [60]. Ninety percent of studies (n = 9) were rated as ‘weak’ in the study design component because they did not include a randomized controlled trial, controlled clinical trial, cohort study, or even an interrupted time series (Fig. 4). Only one study, which included a cohort analytic design, was considered as ‘moderate’ [36]. All studies were rated as ‘strong’ in relation to the quality of withdrawals and drop-outs component. The agreement between reviewers was equal to 90 %, with the primary reason for disagreements in scoring being oversight.

Fig. 4 Quality assessment of the studies included in the meta-analysis (n = 10)

4 Discussion

The 10 studies included in this meta-analysis were all published in the last 10 years, suggesting the prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from the relationship of RPE:\( \dot{V}{\text{O}}_{2} \) during a PRET is of current scientific interest. Moreover, the data presented in these recent studies were often heterogeneous. This heterogeneity may be explained by the diversity of recruited populations (i.e., young, old, active, sedentary, healthy, and clinical populations), as well as the various PRET modalities (i.e., cycling, walking/running, and arm cranking). For example, the subgroup outcome ‘\( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from RPE19 during the second PRET’ included a lower number of studies (n = 4) and incorporated the study with the second lowest actual \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) [39] and the study with the highest actual \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) [38] of the 10 included studies. Specifically, Smith et al. [39] included older individuals with a low \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (approximately 30 mL·kg−1·min−1) while Morris et al. [38] recruited young, active and healthy participants with significantly higher \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (approximately 50 mL·kg−1·min−1).

The current study shows that \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) may be predicted from the linear relationship between overall RPE≤15 and \( \dot{V}{\text{O}}_{2} \) in a large population during various PRET modalities, regardless of the prediction method (with bias <3 %). This implies that following medical approval for a participant to engage in a training program, it is not necessary to perform maximal graded exercise tests to readjust the exercise intensity (i.e., percentage of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \)), or check the cardiorespiratory fitness improvement after the training program (i.e., increase of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \)). Indeed, an obvious advantage of predicting \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from a submaximal exercise test, rather than measuring \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from a graded exercise test to exhaustion, is that a badly scheduled maximal test within a training program may lead to inadequate recovery and decrease the individual’s performance in subsequent days [6, 7]. In elderly individuals and patients, avoidance of maximal physical exercise provides greater protection against adverse events, which occur more frequently during high and maximal exercise intensities (e.g., arrhythmia, myocardial infarction or even death). In addition to the safety risk, such high and maximal exercise intensities may also induce negative affect [63, 64], which is believed to be a critical factor for future exercise behaviour [65, 66]. In these circumstances, a predictive method to determine \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) is preferable to avoid the negative consequences of an adverse event and to encourage the participant’s adherence to an exercise program.

Recently, Mauger et al. [67, 68] proposed a maximal PRET to measure \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (while the previous studies proposed a submaximal PRET), termed a ‘self-paced velocity test’ (SPV), on the basis that it provides a higher value of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) compared with standard graded exercise tests. The SPV includes 5 × 2-min continuous stages in which the participant regulates intensity according to RPE clamped at RPE11, RPE13, RPE15, RPE17, and RPE20. Regardless of the veracity of Mauger et al.’s observations about the magnitude of the \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) value from the SPV [69–75], the penultimate stage at RPE17 elicits an intensity of 90–95 % \( \dot{V}{\text{O}}_{2\;\hbox{max} } \), followed by a 2-min supra-maximal sprint at RPE20. The SPV therefore has the same disadvantages as a graded exercise test to volitional exhaustion, as indicated above, with the exception that the individual has greater control of the exercise test. In a study involving a similar maximal PRET, Evans et al. [62] reported a higher affective state at all equivalent RPE stages compared with an ‘experimenter-controlled’ incremental ramp test in active individuals. They concluded that a maximal PRET may be applied in situations where the direct measurement of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) is desirable and the affective responses of the individual are considered to be important. Nevertheless, it is important to note that the theoretical maximal RPE (i.e., RPE20) is rarely reported as a mean value in studies in healthy individuals [6, 42, 56, 76] or in older participants [28, 39].

Previously, authors [30, 32, 34, 35, 37, 41] have compared \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from the linear relationship between RPE and \( \dot{V}{\text{O}}_{2} \) using two RPE ranges (i.e., RPE≤15 vs RPE≤17). The results of these studies show that the larger RPE range (i.e., RPE≤17) permits greater accuracy of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) prediction. This result is not surprising as more values in the RPE:\( \dot{V}{\text{O}}_{2} \) range are included and evidence indicates that the intensity elicited at RPE17 corresponds to ≥90 % \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (as indicated previously). However, in the current meta-analysis, we include only the predictions obtained from RPE≤15 as this RPE range is more appropriate for sedentary and clinical populations. Indeed, it offers a compromise between the negative affect and potentially greater risk of cardiovascular complications associated with high exercise intensities during PRET, and the gain in predictive accuracy using the large RPE range (i.e., RPE≤17). Furthermore, in comparison with RPE≤17, RPE≤15 reduces the duration and overall cost of using submaximal protocols.

The \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from the extrapolation of the line of best fit of RPE:\( \dot{V}{\text{O}}_{2} \) to RPE20 from a second PRET revealed narrower 95 % LoA in comparison with other subgroup outcomes (95 % LoA between −1.10 and 4.89; Fig. 3), suggesting that this prediction method may be used to estimate \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) with higher accuracy. Although this has not been confirmed in recent studies [33, 39], it is not surprising because several authors have clearly reported narrower 95 % LoA between actual and predicted \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) during a second PRET [30, 38, 52]. According to these authors, the narrower 95 % LoA may be explained by familiarity with the production procedure [77, 78]. Indeed, it is possible that the learning effect enables participants to regulate the sequential bouts of exercise intensities more accurately according to effort perception. The results of the current study also suggest this.
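For readers less familiar with limits of agreement, the following minimal sketch shows how bias and 95 % LoA are conventionally obtained for a single sample (mean difference ± 1.96 × SD of the differences). The sign convention (predicted minus measured), the function name `limits_of_agreement`, and the data are assumptions made for illustration; the pooled LoA reported in this meta-analysis are derived across studies and are not reproduced by this calculation.

```python
from statistics import mean, stdev


def limits_of_agreement(measured, predicted):
    """Return (bias, lower, upper) for the 95 % limits of agreement.

    Differences are taken as predicted minus measured (assumed convention).
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical VO2max values (ml/kg/min), illustration only.
measured_vo2max = [40.0, 45.0, 50.0, 55.0]
predicted_vo2max = [42.0, 44.0, 53.0, 56.0]

bias, lower, upper = limits_of_agreement(measured_vo2max, predicted_vo2max)
print(f"bias = {bias:.2f}, 95 % LoA = {lower:.2f} to {upper:.2f}")
```

A narrower interval between the lower and upper limits indicates closer agreement between predicted and measured values, which is the sense in which the second PRET is described as more accurate.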

Previously, some studies have shown that the accuracy of predicting \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from RPE:\( \dot{V}{\text{O}}_{2} \) extrapolated to RPE19 during PRET is better in comparison with RPE20 [33, 39]. Although RPE20 provides the theoretical exercise endpoint (i.e., exhaustion or maximal effort), a lower RPE (often approximately RPE19) is more frequently observed [6, 42, 43, 76]. This submaximal RPE leads some researchers to suggest a physical and/or perceptual reserve capacity, presumably to maintain homeostasis and protect the individual from physical damage [44–46]. On the other hand, during exercise, it has previously been suggested that the brain increases RPE proportionally to the percentage of time remaining to completion, and that the time to exhaustion corresponds to the tolerated maximal RPE (e.g., RPE19 rather than RPE20) [42, 79, 80]. Consequently, it is theoretically possible that the RPE:\( \dot{V}{\text{O}}_{2} \) extrapolated to RPE19 during PRET provides a better prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) compared with RPE20. However, in agreement with previous findings [38], the current meta-analysis suggests that, as evidenced by the smaller 95 % LoA from RPE20 during the second PRET (Fig. 3), this perceptual reserve capacity is probably small (i.e., less than 1 rating point), at least in the healthy population. Consequently, it is generally preferable to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from RPE:\( \dot{V}{\text{O}}_{2} \) extrapolated to RPE20. However, in some specific populations (e.g., older individuals or patients) the extrapolation to RPE19 may be better. For example, Ainsworth et al. [28] reported substantially lower mean peak RPE (approximately RPE17) at the end of the laboratory-based reference maximal exercise test.

The present meta-analysis suggests that, generally, the relationship of RPE:\( \dot{V}{\text{O}}_{2} \) during an initial or second PRET extrapolated to RPE19 or RPE20 permits the prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) across a wide range of populations (i.e., healthy or paraplegic, sedentary or active, women or men, young or old adults). However, although bias and LoA are low regardless of the prediction method (bias <3 %), in some specific populations such as very well trained and elite athletes, the range of error may be considered too large to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) with accuracy, as previously indicated [29]. Moreover, it remains to be confirmed whether \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) prediction from PRET is sufficiently accurate in some specific populations (e.g., young children and older individuals with cognitive disease). Furthermore, it would be interesting to compare the \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) predicted from the linear relationship between overall RPE≤15 and \( \dot{V}{\text{O}}_{2} \) during PRET with \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) prediction from the graded exercise test in order to identify clearly the best method to predict \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) (production vs estimation procedure).

With the exception of the study by Evans et al. [36], considered to be of ‘moderate’ quality, all other studies were rated as globally ‘weak’ (Fig. 4). This was largely due to poor reporting of selection bias (e.g., the participants were recruited from a university in a self-referred manner) and data collection method (e.g., failure to report the validity and/or reliability of the data collection tools). These results suggest that future studies on the prediction of \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) from RPE:\( \dot{V}{\text{O}}_{2} \) during PRET should provide further information, specifically on validity and reliability of the measurement tools.

5 Conclusion

This meta-analysis provides evidence that \( \dot{V}{\text{O}}_{2\;\hbox{max} } \) may be predicted from the linear relationship between overall RPE≤15 and \( \dot{V}{\text{O}}_{2} \) during PRET in different populations (i.e., young, old, active, sedentary, healthy, and some clinical populations), and in various PRET modalities (i.e., cycling, running, and arm cranking). To obtain greater accuracy of predictions, extrapolation of the RPE:\( \dot{V}{\text{O}}_{2} \) to RPE20 (rather than the tolerated maximal RPE: RPE19) is recommended in healthy populations, and especially in very well trained and elite athletes.