Key Points

Agility tests are generally considered a reliable and valid method of assessing the perceptual and physical components of agility.

Decision-making and perceptual factors are often heralded as key factors that distinguish between standards of playing ability. However, the mediating factors remain relatively unknown. The contribution of strength is unclear.

Larger improvements in performance are likely to be made with an intervention that includes both a physical and a cognitive stimulus.

1 Introduction

Team sports are characterized as being intermittent in nature, whereby players are required to frequently transition between brief bouts of high-intensity running and longer periods of low-intensity activity [1–3]. In addition, players may perform movements such as tackling, blocking, jumping, and directional changes integrated alongside technical skills. Despite success being influenced by a myriad of factors, it is clear that athletes should possess physical, technical, and tactical proficiency for their sport [4]. Physicality has gained much interest in the literature, particularly as the demands of team sports seem greater than in previous years [5, 6]. Agility is heralded as an important quality required by team sports athletes [7–10]. Anecdotally, the ability to make calculated decisions and maneuver into position seems to be characteristic of some of the world’s best team sport athletes. In 2002, Young et al. [11] delineated several physical and cognitive components of agility. Although disparity may exist, agility is broadly defined as a rapid whole-body movement with change of velocity or direction in response to a stimulus [12]. Implicit in this definition is that agility comprises a perceptual decision-making process and its outcome, a change of direction (COD) or velocity [12]. In view of this definition, agility has been sub-categorized into COD ability and reactive agility, although this may not always be transparent in the literature. COD ability can be described as a movement where no immediate reaction to a stimulus is required and is considered pre-planned in nature [12]. The phrase ‘reactive agility’ has traditionally been used in the literature to encapsulate a movement in response to a stimulus. However, Young et al. [13] recently postulated that the word ‘reactive’, according to the current definition of agility, is redundant. Consequently, we use the word ‘agility’ solely to define a perceptual decision-making process in response to a stimulus.
Despite its importance being identified nearly 4 decades ago [14], our understanding of agility remains somewhat limited, particularly compared with other physical characteristics such as endurance, strength/power, and speed. However, there has been a rapid increase in the number of studies published with relevance to agility, particularly testing and training. Given the increasing recognition of the importance of agility, it would be valuable to establish whether current agility tests possess appropriate test reliability and validity. Furthermore, providing details about factors that may impact agility performance and how these can be improved with different intervention strategies will guide practitioners to appropriate training design and prescription. Therefore, the aim of this review was to (1) detail the reliability and validity of current agility tests, (2) identify the possible factors affecting agility performance, and (3) provide an overview of current intervention strategies used to improve agility performance.

2 Methods

2.1 Literature Search

A systematic review of all published literature was undertaken in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [15]. One researcher (DJP) independently searched the PubMed, Google Scholar, SPORTDiscus, Science Direct, and Web of Science electronic databases from September 2014 until February 2015. The search period of publication dates ranged from 2005 to February 2015. The following keywords were used to capture reliability: ‘reliability’, ‘repeatability’, ‘reproducibility’, ‘measurement error’, ‘consistency’, ‘smallest worthwhile change’, and ‘minimal detectable change’. The following keywords were used to capture validity: ‘validity’, ‘construct’, ‘convergent’, ‘discrimination’, ‘match performance’, ‘physical fitness’, ‘fitness test’, ‘gold standard’, ‘level’, and ‘standard’. The following keywords were used in different combinations: ‘agility’, ‘reactive’, ‘unplanned’, ‘unanticipated’, ‘test’, ‘training’, ‘fitness’, ‘physical’, ‘cognitive’, ‘perceptual factors’, ‘cutting’, ‘manoeuvre’, ‘response’, ‘team’, ‘sports’, ‘soccer’, ‘football’, ‘rugby’, ‘basketball’, ‘Australian Rules football’, ‘netball’, ‘expert’, and ‘novice’. A ‘reactive’ task is synonymous with unplanned and unanticipated, while a ‘change of direction’ task is synonymous with planned and anticipated. Although no restrictions were made on the study design, studies were eligible for inclusion if they met at least one of the following criteria: (1) comparison of test results on two separate occasions under similar conditions (reliability), (2) comparison between different levels of playing ability (validity), (3) examination of factors that may affect agility performance, and/or (4) examination of the effect of an intervention on agility performance. An agility test was classified as a whole-body change in velocity and/or direction in response to a light, video, or human stimulus. DJP coded the studies according to the selection criteria.
Reference lists of retrieved full-text articles and recent reviews were examined to identify additional articles not found by our search. Only full-text sources were included so that methodology detail could be assessed; therefore, abstracts and conference papers from annual meetings were not included in the analysis.

2.2 Literature Selection

Studies were selected in two consecutive screening phases. Phase one consisted of screening for (1) duplicates, (2) title, and (3) abstract. The second phase involved screening the full paper using the inclusion criteria. Studies were included if they fulfilled the following selection criteria: (1) written in English, (2) published in peer-reviewed journals, (3) used an agility test whereby participants performed a change of direction and/or velocity in response to a cognitive stimulus, and (4) included participants who were actively involved in team sports. Where applicable and to support a point being made, reference was made to COD or perceptual/decision-making factors independently.

2.3 Data Extraction and Analyses

Extracted data from each source document included study identification information, number of participants, demographic information (including the sex, age, stimulus, and standard of play), sporting discipline, reliability values, measure of performance, magnitude of training intervention, effect size, comparison between groups, and the information required to assess the quality of each study.

2.4 Assessment of Methodological Quality

Following the article search and examination, full-text articles were retrieved and a methodological quality assessment performed. The scale used to assess training interventions was adopted from a modified quality-assessment screening scoring system [16]. This is a ten-item scale (range 0–20) designed for rating the methodological quality of exercise training studies. The items are as follows:

  1. Inclusion criteria were clearly stated;
  2. Subjects were randomly allocated to groups;
  3. Intervention was clearly defined;
  4. Groups were tested for similarity at baseline;
  5. A control group was used;
  6. Outcome variables were clearly defined;
  7. Assessments were practically useful;
  8. Duration of intervention was practically useful;
  9. Between-group statistical analysis was appropriate;
  10. Point measures of variability.

The scores for each criterion were as follows: 0 = clearly no; 1 = maybe; and 2 = clearly yes. The rationale for using the modified assessment scoring system was that the scales commonly applied in previous articles, namely (1) the Delphi scale, (2) the PEDro scale, and (3) the Cochrane scale, may not fully represent the methodological quality of experimental research for training intervention studies.
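The scoring scheme can be illustrated with a short sketch. The function name `quality_score` and the example ratings are hypothetical and chosen only for illustration; they are not data from any of the reviewed studies.

```python
# Ten-item scale described above; each item is rated 0 (clearly no),
# 1 (maybe), or 2 (clearly yes), so the total lies between 0 and 20.
ITEMS = [
    "inclusion criteria clearly stated",
    "subjects randomly allocated to groups",
    "intervention clearly defined",
    "groups tested for similarity at baseline",
    "control group used",
    "outcome variables clearly defined",
    "assessments practically useful",
    "duration of intervention practically useful",
    "between-group statistical analysis appropriate",
    "point measures of variability",
]


def quality_score(ratings):
    """Sum the ten item ratings after checking each is 0, 1, or 2."""
    if len(ratings) != len(ITEMS):
        raise ValueError("expected one rating per item")
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("each rating must be 0, 1, or 2")
    return sum(ratings)


# Hypothetical ratings for a single study (illustrative only).
example_ratings = [2, 1, 2, 1, 0, 2, 2, 1, 2, 1]
print(quality_score(example_ratings))
```

Validating the ratings before summing guards against a study being scored on the wrong number of items, which would silently shift its total.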

3 Results

3.1 Search Results

The initial search procedure yielded 1827 records through the electronic databases (Fig. 1). After removing duplicates, 861 publications were retained for the article selection process. Title and abstract selection excluded 238 and 567 records, respectively. The remaining 56 records were further examined using the specified inclusion/exclusion criteria, and 14 records were rejected, leaving 42 studies to directly examine the reliability (Table 1) and validity (Table 2) of agility tests as well as factors affecting agility performance (Table 3), and intervention studies (Table 4).
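The record counts at each screening stage can be checked with simple arithmetic. The sketch below uses only the figures reported in this section; the variable names are illustrative, and the number of duplicates removed is implied by the reported totals rather than stated directly.

```python
# Counts as reported in the search results.
initial_records = 1827
after_duplicates = 861
excluded_on_title = 238
excluded_on_abstract = 567
excluded_on_full_text = 14

# Duplicates removed is implied by the first two figures.
duplicates_removed = initial_records - after_duplicates

# Records remaining after title and abstract screening.
full_text_screened = after_duplicates - excluded_on_title - excluded_on_abstract

# Studies remaining after full-text screening.
included = full_text_screened - excluded_on_full_text

print(duplicates_removed, full_text_screened, included)
```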

Fig. 1 Flowchart of the selection process for inclusion of articles in the systematic review

Table 1 Study characteristics regarding the reliability of agility tests
Table 2 Study characteristics regarding the validity of agility tests
Table 3 The relationship between agility and other measures
Table 4 Study characteristics regarding the effectiveness of interventions on agility performance

3.2 Methodological Quality Assessment

Nine studies examined the effects of an intervention on agility performance, yielding a mean score of 14/20 (range 13–17). Most studies provided detailed and repeatable descriptions of methods, clearly defined outcome variables, and used appropriate statistical analyses. Some studies did not state inclusion/exclusion criteria and/or include a control group, nor did they present test–retest reliability for the studied sample.

3.3 Study Characteristics

3.3.1 Reliability

A total of 21 studies detailed the reliability of an agility test (Table 1). In total, 644 participants (median 30, maximum 66, minimum 12) were studied. Participant age ranged from 16 to 37 years (median 21.4 years), and the classification of playing ability varied from amateur to elite national league level. The studies included solely males (n = 16), solely females (n = 3), and both males and females (n = 2). Team sports included basketball (n = 7), Australian Rules football (ARF) (n = 4), rugby league (n = 4), rugby union (n = 1), netball (n = 1), softball (n = 1), soccer and futsal (n = 1), and mixed sports (n = 2). The distribution of stimulus was light (n = 6), video (n = 5), and human stimuli (n = 10). Three of these studies included more than one stimulus. Furthermore, in some instances, studies included more than one parameter, for example, detailing the reliability of the test as well as the differences between playing ability.

3.3.2 Validity

A total of 16 studies examined the differences between playing level, as an indicator of validity (Table 2). In total, 525 participants (median 30, maximum 86, minimum 12) were studied. Participant age ranged from 16 to 28 years (median 22.0 years), and the classification of standard varied from amateur to elite national league level. The studies included solely males (n = 11), solely females (n = 4), and both males and females (n = 1). Team sports included basketball (n = 3), ARF (n = 4), rugby league (n = 3), rugby union (n = 1), netball (n = 2), softball (n = 1), soccer and futsal (n = 1), and mixed sports (n = 1). The distribution of stimulus was light (n = 4), video (n = 5), and human stimuli (n = 6). One study included more than one stimulus.

3.3.3 Factors Influencing Agility

Six studies examined the relationship between agility and other performance indices (Table 3). In total, 124 participants (median 19, maximum 30, minimum 8) were studied. Participant age ranged from 11 to 24 years (median 21.0 years), and the classification of standard varied from university students to national and international level. The studies included solely males (n = 5) and solely females (n = 1). Team sports included basketball (n = 1), ARF (n = 2), rugby union (n = 1), soccer (n = 1), and mixed team sports (n = 1). The distribution of stimulus was light (n = 2), video (n = 3), and human stimuli (n = 1).

3.3.4 Influence of Training on Agility

Nine studies examined the efficacy of an intervention on agility performance (Table 4). In total, 150 participants (median 15, maximum 36, minimum 8) were assessed. Participant age ranged from 14 to 23 years (median 18.5 years), and the classification of playing ability varied from amateur to elite national league level. The studies included solely males (n = 7), solely females (n = 1), and both males and females (n = 1). Participants were involved in soccer (n = 3), ARF (n = 2), rugby league (n = 1), basketball (n = 1), netball (n = 1), and mixed sports (n = 1). The distribution of stimulus was light (n = 2), video (n = 3), and human stimuli (n = 4).

3.4 Study Findings

Intraclass correlation coefficient (ICC) values were 0.80–0.91, 0.10–0.81, and 0.81–0.99 for test time using light, video, and human stimuli, respectively (Table 1). ICC values for decision-making time, decision accuracy, pattern recall and recognition, and confidence rating were 0.95–0.99, 0.74–0.93, 0.31–0.85, and 0.50, respectively (Table 1). Human and two-dimensional (2D) stimuli demonstrated the highest level of discriminant validity. On average, higher skilled individuals were 7.5 % (maximum 22.9 %, minimum 2.9 %) faster than their lesser skilled counterparts for the total time to complete an agility test (Table 2). From the studies conducted, reaction time and accuracy, foot placement patterns, and certain functional movements (i.e., in-line lunge) were shown to be related to agility performance. The contribution of strength remains unclear (Table 3). The average training intervention period lasted for 5.3 weeks (range 3–7). Improvement in time to complete the agility test ranged from 1.0 % (vibration training) to 7.5 % (small-sided games) (Table 4).

4 Discussion

4.1 Testing

4.1.1 Light Stimulus

To test agility, the assessment task must include an introduced stimulus [17]. Since the work by Chelladurai et al. [18], advances in technology have led to commercial timing gate systems (e.g., SMARTSPEED™, Fusion Sport, Sumner Park, QLD, Australia) being made more accessible in sporting and research environments [19].

One particular benefit of using a light stimulus is that the signal can be programmed to appear at the same time on each occasion. Such consistency should, in principle, provide greater repeatability. However, the number of studies reporting the reliability of a light stimulus is similar to that for a video stimulus and lower than that for a human stimulus. Those studies that have reported it show high reliability across different sports, for different levels of playing ability, and for both males and females (Table 1). In 2011, Green et al. [20] examined the reliability of a field test protocol of agility (light stimulus), as well as COD ability and linear speed, in academy (high-performance group) and club (low-performance group) rugby union players. Test–retest data revealed an ICC value of 0.88 for the agility test. However, this was for the low-performance (club) group and not the high-performance (academy) group. Establishing whether the high-performance group can demonstrate even better reliability scores would have been of interest.

The majority of studies examining the reliability of an agility test have been conducted in field-based team sports. Given that agility is context specific, Scanlan et al. [21] sought to examine the reliability of an agility test using a light stimulus in male court-based (basketball) players. The test–retest trials demonstrated that the (light stimulus) agility test possessed high reliability (ICC 0.81–0.91). However, participants were tasked with completing multiple agility test trials in a randomized fashion using both generic and sport-specific stimuli. A same-day test–retest correlation may not account for both errors of measurement and temporal instability, and the second assessment may not actually be independent of the first. This should be a consideration in future reliability studies.
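Test–retest ICC values can be computed in several ways, and the form used is not stated here for the reviewed studies. The sketch below assumes one common choice, ICC(3,1) (two-way mixed effects, consistency, single measurement), purely as an illustration. The function name `icc_3_1` and the example times are hypothetical.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `scores` is a list of rows, one per participant, each holding that
    participant's time on every trial (same number of trials per row).
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of trials
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into participants, trials, error.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical test and retest agility times in seconds (illustrative only).
times = [[1.92, 1.95], [2.05, 2.01], [1.98, 2.04], [2.10, 2.08], [2.21, 2.17]]
print(round(icc_3_1(times), 2))
```

Because this form ignores systematic shifts between trials, a uniform learning effect from test to retest would not lower the value; an agreement-type ICC would be needed to capture that.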

The majority of studies have used a ‘Y-shaped’ design to assess agility performance. However, it is unlikely that this offers an appropriate approach for distinctly different sports [22]. For that reason, Sekulic et al. [22] used a ‘stop-n-go’ (SNG) test to assess agility in college-aged participants from a range of sports. The difference between the SNG test and the commonly used ‘Y-shaped’ course is that the latter consists of non-stop running. The ICC was high for both males (ICC 0.81) and females (ICC 0.86). That the SNG agility ICC scores were comparable to those of the COD (ICC 0.87) and 10-m sprint (0.88) demonstrates the reliability of alternative agility tests. The results also suggest that the mean time of participants’ best performance was faster in the third than in the first trial. The authors suggested this was because participants accelerated uncontrollably during the first trial, resulting in their inertia not allowing for an efficient COD. This was despite participants being familiar with the testing procedures. Such findings may support the inclusion of an extended familiarization period, although others have suggested this may not be entirely necessary [19]. Nevertheless, the work of Sekulic et al. [22] exemplifies that agility testing need not follow the common ‘Y-shaped’ design and that greater efforts should be made to provide tests more appropriate for individual sports.

Whilst a light stimulus is deemed reliable, concerns surround its ability to discriminate between higher- and lower-level playing ability. For instance, Green et al. [20] found academy-level rugby union players (high-performance group) to be 8.5 % faster than their lower-performance (club group) counterparts when responding to a light stimulus. Likewise, a group of semi-professional basketball players responding to a light stimulus were, on average, 5.9 % faster during an agility test than recreational players [23].

However, discriminating between higher- and lower-level participants using a light stimulus agility test is not a consistent finding. In one study [24], 20 teenage female field hockey players from a regional performance center (high performance) and school/club standard (low level) performed in three conditions (light and human stimulus agility test and COD). No difference in performance was found for the light stimulus or COD test (p > 0.05), but a difference was found for the human stimulus. Such discrepancies in the literature raise concerns regarding the ability of a light stimulus to consistently discriminate between higher- and lower-level groups. In one of the few studies including male and female participants, Sekulic et al. [22] compared agility performance between agility-trained and non-trained participants. Males trained in agility sports (e.g., soccer, basketball) achieved significantly better results in the SNG-agility test (p = 0.03; effect size [ES] −0.75) using a light stimulus. In contrast, the female agility-trained and non-trained groups did not differ in either the SNG-agility test (p = 0.39; ES −0.39) or SNG-COD (p = 0.61; ES −0.49). However, when females performed a shortened test version (from five to three repetitions), significant differences between the groups were found only for the SNG-agility test. The authors hypothesized that perception and capacities contribute less to the final result of the SNG-agility test than the more commonly used agility test (e.g., ‘Y’ design).

Essentially, what is required to detect and react to a stationary light (temporal processing) is quite dissimilar to processing complex motion in the dynamic visual scenes of team sports games. A light is simply either on or off and is thus only assessing an individual’s ability to process information. This may deprive higher-level athletes of the anticipatory kinematic cues that contribute to their expert advantage [25].

With the emergence of commercially available equipment, it appears a light stimulus will likely remain a common fixture in research as well as a popular tool in the applied sporting environment. Although a light stimulus may not consistently discriminate between levels of playing ability, it is unlikely that professional sports teams will prioritize it for this purpose. That such equipment is purpose-designed, easily accessible, logistically efficient, and likely associated with a smaller degree of noise is a particular advantage (Table 5) and likely means its inferior validity may be overlooked.

Table 5 The characteristics, advantages, and disadvantages of the agility tests

4.1.2 Video (Two-Dimensional) Stimulus

In an attempt to improve the game realism and ecological validity of tests, several studies have used 2D video projections of sport-specific situations to assess agility performance. Responding to a ‘specific’ movement performed on video supposedly overcomes some of the limitations associated with a generic light stimulus (Table 5). Generally, a video-based agility test protocol requires participants to sprint through a set of timing gates that will activate a video clip projected onto a large screen. The participant responds to the clip by running through a second set of timing gates.

However, whether a video stimulus provides a superior method of assessment over other test formats (i.e., light and human) is somewhat questionable, particularly given that research has questioned its reliability. For example, junior ARF players exhibited a low level of reliability (ICC 0.33) in response to a video clip of a player and an even lower value (ICC 0.10) for a directional non-sport-specific arrow stimulus [9]. It was suggested, based on a typical error of 0.07 s (video) and 0.09 s (arrow), that the tests were likely to detect moderate to large changes in performance, but that refinements were needed to identify small differences. The authors postulated that a lack of familiarization might partially explain these results. It is also plausible that the relatively young age of the participants, as well as the fact the images of the tester were from different positions (previous agility tests are restricted to a front-on view without a ball) may also be factors.
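The text does not state how the typical error values were calculated. A minimal sketch, assuming the widely used definition (the standard deviation of the test–retest difference scores divided by √2), is given below. The function name `typical_error` and the example times are hypothetical.

```python
import math
import statistics


def typical_error(trial_1, trial_2):
    """Typical error: SD of the paired differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial_1, trial_2)]
    return statistics.stdev(diffs) / math.sqrt(2)


# Hypothetical agility times in seconds (illustrative values only).
trial_1 = [1.90, 2.05, 1.98, 2.10, 2.01]
trial_2 = [1.95, 2.00, 2.06, 2.04, 2.07]
print(round(typical_error(trial_1, trial_2), 3))
```

A change in an individual's test time that is smaller than the typical error is hard to separate from measurement noise, which is why a test with a relatively large typical error can detect only moderate to large changes.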

Besides the reliability of the total test time, more studies are reporting the reliability of factors such as decision-making time, perception response time, and confidence rating. Whilst providing such detail will allow for a comprehensive analysis of the test performance, it seems they may also be more susceptible to reduced reliability. While ICC values of 0.82 for the test time in a group of young rugby league players have been reported, values of 0.31 and 0.50 for the perception response time and confidence rating were also shown [26]. Similarly, the reliability of decision accuracy (ICC 0.74) during a video-reaction test has been shown to be lower than the decision-making time [27]. Such findings may have implications for training prescription whereby perception response and decision-making may be the focus over physical attributes. The inherent variability and thus poor consistency of an individual performing such a task may indeed be an important finding in itself.

Henry et al. [28] sought to validate a video agility protocol by comparing performance with a light agility test. Higher-level ARF players possessed faster agility and movement times for both video and light agility tests than the non-footballers. Interestingly, decision time was faster in the light than in the video agility test. It may be postulated that despite superior anticipatory ability, and thus decision-making time, participants may still require a confirmatory process before executing a movement in response to a game-specific stimulus. Seemingly, a video stimulus may be a more valid tool to discriminate between levels of playing ability than a light stimulus. Similarly, elite ARF players were 8.5 % (ES 2.59) faster than their lesser skilled (age-matched school) group when responding to a defender projected onto a large screen, whereas no difference was found when responding to an arrow-projected image [9]. In one of the first known studies using a video stimulus, Farrow et al. [29] measured agility performance in a group of highly, moderately, and lesser skilled young female netball players. The high-performance group was shown to be faster (7 %) than the low-performance group, although only marginally faster (0.8 %) than the moderate-performance group. Performance in a planned COD of the same movement path did not identify any significant differences.

Another study by Henry et al. [30] also attempted to examine the effect of a feint on agility performance. One hypothesis of the study was that the inclusion of a feint would decrease performance of the ‘defensive’ player. A trend for better agility, decision, and movement times in the higher-standard players was shown. In contrast, the higher standard players had slightly longer second decision time in the feint trials and movement time in the non-feint trials. Seemingly, the inclusion of the feint resulted in a modest lengthening of movement time (p = 0.23; ES 0.66) for the higher-performance group but larger deterioration for the lower-performance group (p = 0.002; ES 1.07).

As previously mentioned, the tenet of including a 2D stimulus is that of providing a more ‘specific stimulus’. In its current state, a ‘sport-specific stimulus’ corresponds to a rather generic stimulus and response, performed in an artificial environment and omitting important information. The stimulus and response should be compatible, defined as the degree to which the relationship between a stimulus and an associated response is natural [31]. For instance, soccer goalkeepers have been shown to respond differently to a penalty kick in different conditions [32]. The penalty kicks were either taped on video from the view of a goalkeeper facing a live penalty taker, requiring either a verbal or joystick response, or performed in situ, facing a ‘live’ penalty taker, which required either a verbal response, a simplified movement response, or a full interceptive response. The highest saving accuracy was reported when viewing live penalty takers with a full interceptive response. Such findings indicate that experimental research needs to adhere to a natural perception–action coupling as closely as possible and may make 2D agility testing somewhat inappropriate.

The visual stimulus also seems to affect biomechanical profile and gaze behavior during an agility task. For example, Lee et al. [33] examined whether 2D versus 3D video displays of an opponent, projected using a customized integrated stereoscopic system, afforded different visual search behavior and motor response times when participants sidestepped to intercept an opponent. Participants fixated less and for shorter periods on the trunk of the projected opponent in the 3D condition and more outside of the opponent’s body than with the 2D condition. No difference was found between the 2D and 3D conditions in the absolute total number and duration of fixations or in the time to initiate an interception of the opponent. This opposed the authors’ second hypothesis and implies no difference in perception of affordances between the conditions. Sidestepping in response to defenders’ movements projected onto a large screen resulted in different postures and knee moments than did a video projector-based arrow stimulus [34]. Differences between standards were greater with the inclusion of two defensive opponents and converging the participant’s straight line of gaze. Seemingly, the mechanisms underpinning skilled decision making in sports differ between film-based and in situ conditions [35]. Some researchers have also attempted to establish the effect of screen size on performance, concluding that a larger screen is necessary to provide a more realistic environment of life-size images on the screen [36].

The popularity of a 2D-projected video as a means of assessing agility likely stems from the ‘sport-specific’ stimulus it supposedly offers whilst generally upholding its reliability. Yet, in its current state, it is probable that video stimulus may be restricted to the laboratory setting. The practicalities, logistical issues, time constraints, and necessity for specific equipment make frequent field-testing an improbability.

4.1.3 Human Stimulus

A stimulus whereby the athlete responds to an actual human (i.e., the person that initiates the movement to which the athlete must react) has emerged as a popular alternative for measuring agility. The premise, similar to a video stimulus, is that of further increasing the availability of specific body kinematic cues to which athletes respond [25]. Sheppard et al. [37] were the first to include a human in an agility test. In the study, ARF players were tasked with responding to a human performing four possible scenarios, where scenarios were presented in a random order and differently for each athlete. The high test–retest reliability (ICC 0.87) observed within this study has also been reported for other studies [8, 38–43].

Despite the aforementioned findings, it is worth remembering that an actual human is involved in testing and their accompanying variable movement still has the potential to affect the repeatability, accuracy, and overall test integrity. In one instance, no significant difference was reported between the times recorded for each of the four tester-initiated movement directions (p = 0.11) [42]. However, Young and Willey [43] highlighted the influence a tester may have on performance in a group of semi-professional ARF players. A strong relationship was reported between decision time and total time (r = 0.77; p < 0.01), as well as a small positive correlation between tester time and total time (r = 0.37; p < 0.05). The latter corresponded to a coefficient of variation (CV) of 5 % for the mean tester’s time. In practical terms, this meant a time period of 141 ms (representing 7 % of the total time), being the difference between the longest mean tester time (596 ms) and the shortest (455 ms) trial. The authors concluded that this might make a meaningful difference to the mean total time. Given that the tester was deemed “experienced,” this also shows that a stringent approach is fundamental when using a human stimulus. It would seem worthwhile to spend a greater amount of time habituating the participants with this form of testing compared with a light or video stimulus.
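The tester-time figures can be worked through numerically. The sketch below uses only the values reported in this paragraph; the implied mean total time and the shared-variance figures (r squared) are inferences made for illustration and are not values reported by the cited study.

```python
# Reported longest and shortest mean tester times (milliseconds).
longest_tester_ms = 596
shortest_tester_ms = 455
spread_ms = longest_tester_ms - shortest_tester_ms

# The spread is reported as about 7 % of total time, which implies a mean
# total time of roughly spread / 0.07 (an inference, not a reported value).
implied_total_ms = spread_ms / 0.07

# Shared variance implied by the reported correlations (r squared).
r_tester_total = 0.37
r_decision_total = 0.77
shared_tester = r_tester_total ** 2
shared_decision = r_decision_total ** 2

print(spread_ms, round(implied_total_ms))
print(round(shared_tester, 2), round(shared_decision, 2))
```

On these assumptions, tester time would share far less variance with total time than decision time does, yet a spread of this size may still be large relative to the between-group differences that agility tests are used to detect.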

The test first used by Sheppard et al. [37] has also been adopted in a number of subsequent studies, spanning different sports. The resultant findings have generally been supportive of the original work by Sheppard et al. [37], with high levels of validity (Table 2) being reported. However, whether this test is, due to its rather generic nature, suitable for use across sports is debatable. Moreover, current tests to assess agility may arguably be categorized as responses rather than the complex decisions that are characteristic of high-level team sports [30]. The high response accuracy often demonstrated in participants performing this task may be testament to this. Whether the included external cues adequately challenge the cognitive abilities of high-standard athletes is therefore questionable [31, 37]. Likewise, while current tests may be able to discriminate between playing levels, this may not be the case for different positions [42].

According to some studies, including a feint may better discriminate between levels [30]. The basis of a feint arises from the double-stimulation paradigm, where the reaction to the first of two closely spaced stimuli is normal, but the reaction to the second is delayed by more than that which would have occurred had it been presented alone [44]. Coupling deceptive movements and/or multiple turns seemingly increases the perceptual, cognitive, and physical challenge; the purpose of which is to gain a time advantage by deceiving an opponent. Research has shown higher-level athletes to experience little change in decision accuracy following a feint, whilst a significant decrease was observed from decision time 1 (before the feint) to decision time 2 (after the feint) in lower-standard players [29]. Lesser-skilled players may be unable to distinguish and interpret the available cues, leading to larger decreases in decision accuracy [30]. Differences have also been seen between moderately and highly skilled performers, despite movement time being similar [27].

However, the overarching rationale for including a human stimulus, namely that it offers a more ‘specific stimulus’, seems somewhat vague and, arguably, erroneous. The likelihood that such a test is the optimal approach to be applied across different sports is therefore somewhat questionable. Visual search strategies are likely to vary considerably across sports, between individuals and/or positions, and for a given specific task. An (in)compatibility between the stimulus and response may be a mediator in the rate of information processing and speed of the forthcoming motor response [30, 44]. Indeed, faster reaction times have been observed when including a compatible stimulus during testing [45]. This is likely to allow for a rapid motor activation and faster decision-making ability [46]. It would seem worthwhile to venture from the current ‘Y-shaped’ test design and investigate alternative approaches. The challenge is to develop reliable tests that use sport-specific agility scenarios and capture the complexity of movement and decision-making aspects of field agility. This may require the inclusion of ball or other sport-specific equipment, a variety of views (not just front on), multiple players, different movements, and some deceptive actions. Current agility tests have been restricted to the defensive role, and whether offensive agility is unique is not known [13]. Furthermore, the fact that a high-speed camera is needed to analyze the decision time from the test reduces the convenience of using this approach in the field.

4.1.4 Possible Considerations of Testing

Possible factors to consider when implementing testing are as follows:

  • Generic and specific agility tests should not be used interchangeably during athlete assessments.

  • The term ‘sport-specific stimuli’ is rather loosely used in the context of an agility test. Generic test protocols should ideally be replaced with ecologically valid tests that offer better stimulus–response compatibility.

  • Participants should be appropriately familiarized with the agility test before commencing actual data collection.

  • A life-size image would be more appropriate when using a 2D video stimulus, whilst a high repeatability of the tester is fundamental when an actual human is included.

  • The reliability of a test should be population specific. The reliability values for all parameters (tester time, decision time, movement time, etc.) should also be established during each test period.

  • High response accuracies during agility testing may indicate an inability of the external stimulus to adequately challenge the cognitive abilities of high-standard athletes. This should be considered when interpreting the application of the results.

  • Including a deceptive movement (feint) may better discriminate between standard of play than a single stimulus.

4.2 Factors Affecting Performance

Several factors [11, 12] have been presented as possibly influencing agility performance. Whilst informative, it could be argued that this model may be too simplistic to encapsulate the complexity of agility performance. Cognitive and perceptual factors are considered the discriminating factor in agility performance; however, the majority of research has focused on the physical aspect. Regardless, it does seem our understanding of the mediating factors remains limited, despite the purported importance of agility in team sports.

4.2.1 Cognitive and Perceptual Factors

Cognitive and perceptual factors are heralded as being the factors to distinguish between high- and low-level agility performances [21]. Using a stepwise regression analysis, Scanlan et al. [21] suggested response time to be the sole variable (R² = 0.58, p = 0.004) predicting agility time, while decision-making time (R² = 0.33, p = 0.049) also shared a large association with agility time. In contrast, morphological (stature, body mass, and body fat) (R² = 0.034–0.20), sprint (R² = 0.10–0.17), and COD speed measures (R² = 0.18) had small to moderate correlations with agility time. More recently, Naylor and Greig [53] found response accuracies on a Stroop color test to have a stronger relationship (R² = 0.29) with agility performance than mid-thigh girth, body fat %, and eccentric hamstring strength (R² range = 0.01–0.05).
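
To compare these coefficients of determination with the correlation coefficients reported elsewhere in this review, each can be expressed as a percentage of variance explained and as an approximate correlation magnitude. This is a minimal sketch that assumes each value comes from a single-predictor relationship, so that |r| is the square root of the coefficient of determination; the sign of the relationship is not recoverable this way. The dictionary and function names are hypothetical.

```python
import math

# Coefficients of determination as reported in the text [21, 53]
reported_r2 = {
    "response time": 0.58,
    "decision-making time": 0.33,
    "COD speed": 0.18,
    "Stroop response accuracy": 0.29,
}


def r_magnitude(r2: float) -> float:
    """Absolute correlation implied by a single-predictor R^2 (assumption)."""
    return math.sqrt(r2)


for name, r2 in reported_r2.items():
    print(f"{name}: {r2 * 100:.0f} % of variance, |r| = {r_magnitude(r2):.2f}")
```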

Although cognitive and perceptual factors are considered important, asserting this without knowing what actually modulates performance offers a rather reductionist approach. Whilst some research groups have attempted to better understand the different cognitive function of skilled performance [47, 48], the mediating factors of agility performance remain unknown. Salvatore et al. [48] combined psychophysical and transcranial magnetic stimulation to examine the dynamics of action anticipation and its underlying neural correlates in professional basketball players. Both visuo-motor and visual experts showed a selective increase of motor-evoked potentials during observation of basket shots. From the findings, only higher skilled athletes showed a time-specific motor activation during observation of erroneous basket throws. Unfortunately, such findings may not easily be extrapolated to a more dynamic and applied sport setting.

The importance of superior decision-making and cognitive skills should not be restricted solely to performance enhancement. Poor decision-making ability may also contribute to injuries [49]. Anecdotally, it may be inferred that those players with superior decision-making skills are better able to avoid collisions and, thus, are less likely to be injured. Yet, the available research may not fully represent this. When adjusted for age and playing position, professional rugby league players with poor agility performances (i.e., longer decision times), compared to those with shorter decision times, were shown to have a lower risk of injury [49]. Seemingly, players with poor perceptual skill may actually be protected against contact injuries in professional rugby league. However, the authors hypothesized that players with better playing skill likely occupied positions requiring higher skills and ball involvement [49]. This may expose better players to more physical collisions and a resultant higher risk of contact injury. It would seem fruitful for more research to be conducted into the relationship between agility and injury incidence.

4.2.2 Technique

Technique has been cited as a component of COD ability [12], yet the amount of empirical evidence is comparatively sparse [50]. The majority of research examining technique during unplanned cutting tasks has been conducted with the aim of comparing with planned actions and from an injury viewpoint. Although distinct biomechanical differences are evident [51] and, despite the fact injury and performance should not necessarily be viewed independently, such findings should not be directly extrapolated to performance enhancement.

One study [50] has examined the differences in agility (side-stepping maneuvers) running technique between unplanned and pre-planned performance conditions in national and international rugby union players. A second objective was to identify any change in technique during unplanned conditions (evasive sidestepping maneuvers) with respect to the speed of agility performance. Specifically, the position of foot strike and toe off for the step prior to the agility sidestep (pre-change of direction phase) and then the sidestep (COD phase) were examined. The authors concluded that the presence of a decision-making element limited lateral movement speed when sidestepping and, as such, the foot-placement patterns differed from pre-planned conditions. They also found that fast performers displayed greater lateral movement speed at foot strike (0.52 ± 0.34 m s−1) than moderate (0.20 ± 0.37 m s−1, p = 0.034) and slow participants (−0.08 ± 0.31 m s−1, p < 0.001). Less lateral movement speed during unplanned conditions was associated with greater lateral foot displacement (44.5 ± 6.1 % leg length) at the COD step than in pre-planned conditions (41.3 ± 5.8 %). Additionally, fast performances exhibited greater increases to lateral movement speed during the sidestep (1.83 ± 0.37 m s−1) compared with slower performances (1.50 ± 0.41 m s−1), for unplanned conditions. Albeit insightful, it would appear that this offers little in the way of ‘optimizing’ technique, and substantive research is required in a variety of sports and populations to further understand the effect of technique on agility performances.

4.2.3 Physical Factors

The implicit goal of an agility task is to redirect total body momentum to a new direction/target as quickly as possible [17]. Despite the purported importance of decision-making and perceptual factors, physical actions constitute the greatest proportion of total time to complete an agility test. It was eloquently put forth by Araújo et al. [52] that, without decisions being realized through action, cognition would forever remain locked in a black box.

4.2.3.1 Strength and Power Qualities

A recent study by Naylor and Greig [53] examined the contribution of body fat percentage, thigh girth, eccentric hamstring strength, and reaction time and accuracy (Stroop test) on a battery of prescriptive and agility tests. Specifically, the tests were an agility test and a linear agility deceleration test as well as sprint and COD tests. Eccentric hamstring strength was the primary predictor in three of the four tests, the exception being the agility test. A moderate correlation was reported between strength and the agility deceleration task (R² = 0.33, p = 0.10), while a low correlation (R² = 0.03, p = 0.46) was shown between the agility and agility deceleration tests. The relationship between the combined qualities and the agility test was R² = 0.41. Arguably, eccentric training is often prescribed as a blanket approach, whereby its importance is assumed for several discrete components (deceleration, COD). It would seem, based on the work of Naylor and Greig [53], that practitioners should be transparent and purposeful when including this exercise modality, as it is unlikely to benefit all equally.

In female basketball players, eccentric and isometric strength provided the highest overall contribution (25 and 24 %, respectively) to agility performance, while maximal dynamic strength, concentric strength, and power measurements offered 20, 18, and 12 %, respectively [54]. It is noteworthy that no significant correlation was observed between any strength or power measure and agility performance (r = −0.08 to −0.36, p = 0.43–0.59) [45]. Young et al. [55] also examined the relationship between agility performance and maximum strength (3-repetition maximum [RM] strength), reactive strength (drop jump), and power characteristics (countermovement jump) in community-level ARF players. Multiple regression analysis indicated that the combined physical qualities explained ~56 % of the variance associated with COD speed. In contrast, the relationships between physical qualities and agility were trivial to small (r = −0.10 to 0.123, p > 0.05) and collectively explained only ~14 % of the variance. Similarly, Henry et al. [56] reported a weak correlation (r = −0.25 to −0.33) between unilateral jump (vertical, horizontal, and lateral) and agility movement time in a group of ARF players. A systematic review of planned COD reported that the magnitude of correlation with strength and power was, for the most part, small to moderate [16]. From the evidence available, it would seem this relationship is further diminished for agility performance. Seemingly, the addition of a cognitive stimulus may hamper an individual’s ability to utilize and apply force.

Performing an agility task is still vastly complex and requires the synchronization of many body parts and, thus, most probably multiple strength components. For that reason, a clear relationship between isolated measures of strength may be an over-simplification that disregards an appropriate analysis of the effect muscular strength may have on agility performance [54]. Seemingly, each strength component has a different magnitude of relationship to agility performance, and the contribution of each strength characteristic likely differs between individuals [54]. Moreover, adding a perceptual–cognitive demand appears to reduce the significance of lower body strength interaction to agility performance. An interesting paradigm is whether time deficits brought about by decision-making errors can be mitigated during the motor response with superior physical attributes, e.g., speed and power.

4.2.3.2 Functional Movement

Researchers have also examined whether a relationship exists between functional movement screen (FMS) scores, maturation, and agility performance in young (under 11 to under 16) soccer players [7]. Consisting of seven tests, the FMS purportedly evaluates an individual’s movement quality and has become an increasingly popular tool in sports. Participants were assessed using the same protocol (light stimulus) as in Oliver and Meyers [19]. The authors reported that in-line lunge performance was the primary predictor of agility performance (R² = 0.38). Aside from this study, it is apparent that practitioners are placing greater importance on players’ ability to move proficiently. Accordingly, it would be appropriate to fully establish the contributing influence, if any, that functional movement patterns may have on agility performance.

4.3 Training

4.3.1 Perceptual Training

Perceptual training, using video clips, is one method considered effective for improving perceptual and decision-making qualities [57]. Whilst some research has shown the benefit of video-based perceptual training on solely cognitive tasks, only one has been performed in agility [10]. Agility performance was assessed in a group of semi-professional rugby league players to determine whether the perceptual and decision-making components of agility could be trained using a video-based intervention. Training sessions involved ten perception–action guided discovery agility drills per session, comprising two parts. In the first part, participants were presented with a video clip projected onto a large projector screen with the clip blackened out (occluded) at racquetball contact. Participants completed the same drill a second time watching the same attacking opponent; however, the participants were able to see the outcome of the shot. Overall, the 3-week training resulted in a significant improvement in mean total agility time. Perception and response time for the agility test, defined as the time taken for a participant to perceive the on-screen opponent’s attacking action combined with the time taken for that participant to initiate a response, was much faster for the training group (pre: 0.34 vs. post: 0.04 s) than for the control (pre: 0.33 vs. post: 0.27 s). No significant change was shown for confidence rating (i.e., the participant’s confidence in making the correct decision), within or between groups [10]. However, the absence of a placebo group may be considered a limitation of the study. Moreover, an improvement in response time, albeit substantial, is irrelevant if the player performs an incorrect response. Accordingly, improved anticipation, decision making, and positioning are only possible if players are attuned to the most relevant sources of information [58]. 
Also, the participants were semi-professional, with no studies examining whether such training can also improve high-level performers. It would also be interesting to establish whether these gains are indeed transferred to actual sports performance.
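
The size of the reported change in perception and response time can be expressed as a relative reduction from the pre-test value. This is an illustrative calculation only, using the group means quoted above; the function name is hypothetical, and the percentages are derived here rather than reported in [10].

```python
def relative_reduction_pct(pre_s: float, post_s: float) -> float:
    """Percentage reduction in time from pre-test to post-test."""
    return (pre_s - post_s) / pre_s * 100


# Group means for perception and response time as quoted from [10]
training_pct = relative_reduction_pct(pre_s=0.34, post_s=0.04)
control_pct = relative_reduction_pct(pre_s=0.33, post_s=0.27)

print(f"Training group: {training_pct:.1f} % faster")
print(f"Control group:  {control_pct:.1f} % faster")
```

Because the calculation uses group means only, it says nothing about response accuracy, which, as noted above, determines whether a faster response is of any practical value.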

Despite the reported benefits of video-based training, the underlying question is whether the training intervention and subsequent gains in agility performance are retainable or indeed transferable to superior decision making during competition. Whilst evidence does exist that a transfer from the laboratory to the field is possible [27, 59, 60], such findings are relatively sparse in the area of agility and warrant further research. It has been suggested that some practitioners believe that smart decision making is ‘god’s gift’ rather than something that has been or can be trained [59]. Seemingly, cognitive interventions, which develop the knowledge base associated with perceptual skill, have more practical utility than clinically based visual skills training programs [59].

4.3.2 Small-Sided Games

Over the last decade, small-sided games (SSGs) have received a large amount of interest in the applied and research domain [61–64]. Advocates refer to it as an effective method of simultaneously training the physical, technical, and tactical qualities of a player [60–63]. Chaouachi et al. [65] recently examined the effects of SSGs versus COD on agility and COD performance in junior soccer players. Players’ agility was assessed with (agility–ball) and without a ball (agility). The SSG training comprised 1 versus 1, 2 versus 2, and 3 versus 3 drills, the COD group performed pre-planned COD drills, whilst a control group performed regular skill-development drills. The SSG training improved agility (6 %), linear sprinting (1.5 %), and COD (5.1 %), although the gains in sprint and COD were greater following the COD training (4 and ~7 %, respectively). A similar study also compared SSG versus COD training on agility and COD performance in a group of under 18 ARF players [60]. In this study, SSG training improved agility, whilst the COD training was ineffective for developing either agility or COD performance. The authors attributed the gains to an enhanced speed of decision making rather than movement speed, which reflects a common belief amongst practitioners about how SSGs improve agility. It appears that the small confinements (and thus reduced time to think) during SSGs may improve decision-making speed, although this notion does not seem to have been directly investigated.

The appropriateness of SSGs for developing other perceptual and decision-making qualities, e.g., pattern recognition, is unknown. Indeed, it is improbable that players will perceive, decide, or act during SSG exactly as they would during an actual 11-a-side soccer match. For instance, studies in soccer have reported large practical differences (effect size ranged from 1.5 to 21.2) between small- and large-sided games for the number of blocks, headers, interceptions, passes, dribbles, shots, and tackles executed [63]. Such diversity in the actions performed is likely to correspond with variable pattern recognition and decision-making demands. That players are rarely confined to positional constraints during SSG may be one factor that explains these differences. A practical example is the improbability that a central defender will perform a 180° turn in response to a long pass from the opposing team during an SSG, as is the case in an 11-a-side match. Midfielders, on the other hand, may benefit more from the SSG format as a means of improving their skills in a congested playing area that is characteristic of short decision-making periods. Essentially, players acquire skill in coupling actions and decisions to changing informational constraints of competitive performance environments [66]. Hence, perception–action couplings supporting decision making are considered context specific and relevant to the properties of particular performance environments [67]. These include the distance to a teammate/opponent, goal, or target area [68, 69], and location of the ball relative to a player. Gabbett et al. [70] offered an insight into the application of agility test results for training prescription. 
Specifically, women soccer players were classified as requiring either (1) decision-making and COD training to further consolidate good physical and perceptual abilities, (2) decision-making training to develop below average perceptual abilities, (3) speed and COD training to develop below average physical attributes, or (4) a combination of decision-making and COD training to develop below average physical and perceptual abilities. In summary, it is probable that the characteristics of SSG will manifest in an unequal distribution of appropriate agility training amongst different playing positions [71].
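
The four-way classification described above can be sketched as a simple decision rule. This is a hypothetical illustration rather than the procedure used by Gabbett et al. [70]: it assumes that movement time stands in for physical ability, that decision time stands in for perceptual ability, and that the squad average is the cut-off for ‘below average’. All names and example values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgilityResult:
    player: str
    movement_time_s: float  # assumed proxy for physical ability
    decision_time_s: float  # assumed proxy for perceptual ability


def prescribe(result: AgilityResult, avg_movement_s: float, avg_decision_s: float) -> str:
    """Map an agility test result to one of the four training categories."""
    # Shorter times are better; at or below the squad average counts as good
    good_physical = result.movement_time_s <= avg_movement_s
    good_perceptual = result.decision_time_s <= avg_decision_s
    if good_physical and good_perceptual:
        return "1: decision-making and COD training (consolidate)"
    if good_physical:
        return "2: decision-making training"
    if good_perceptual:
        return "3: speed and COD training"
    return "4: combined decision-making and COD training"


# Hypothetical squad averages and a hypothetical player
player = AgilityResult("Player A", movement_time_s=1.90, decision_time_s=0.45)
print(prescribe(player, avg_movement_s=2.00, avg_decision_s=0.35))
```

In this hypothetical case the player moves faster than the squad average but decides more slowly, so the rule returns category 2.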

4.3.3 Warm Up

Warm ups (WUs) are common practice and considered an important aspect of an athlete’s preparation for any forthcoming testing, training, or match activity. Accordingly, there has been considerable interest in this broad area [72–75]. Yet, despite being frequently deployed in practice, the scientific literature regarding its efficacy remains inconsistent. Research has traditionally been primed towards identifying the effects of WU on characteristics such as strength, power, and speed. The amount of research examining the effects of WU on agility, requiring both a cognitive and a physical element, is comparatively much lower [75, 76]. This is despite coaches and practitioners alike often advocating a WU as being important to ‘attune’ the physical and cognitive qualities prior to activity.

Gabbett et al. [76] examined the influence of closed- versus open-skill WU on agility as well as speed, COD, and countermovement jump performance. Junior basketball players (n = 14) were randomly allocated to either the open- or closed-skill WU. The open-skill WU comprised dribbling and moving in response to an opponent, 1 versus 1 (defender vs. attacker) games whereby participants were encouraged to read body cues, and 4 versus 4 SSGs. The closed-skill WU included skipping, accelerations, deceleration, and COD efforts. No significant differences were observed between the open- and closed-skill WU on agility performance. In adult basketball players, it has been suggested that closed-skill agility properties are similarly developed in starting and non-starting players [39]. In contrast, facets of open-skill agility performance such as anticipation, visual scanning, pattern recognition, and situational knowledge might be central distinguishing qualities for team selection in basketball [39]. Scanlan et al. [39] also showed starters possessed faster decision-making (25 %) and agility times (8 %) but slower (2 %) COD times. Compared with the junior players in Gabbett et al. [76], they were also 9–22 % faster. Elsewhere, Zois et al. [75] examined the effects of different WUs on agility, COD, countermovement jump, and speed performance in ten amateur male soccer players. The WUs comprised (1) 3 × 2-min SSG (3 vs. 3), (2) a 5-RM seated leg press lasting 15 s, and (3) a 23-min, commonly used team-sport WU (including high knees, butt kicks, etc.). When compared with baseline, agility was ~5 % (ES 1.1 ± 0.7) faster following the 5RM WU and ~4 % faster (ES 0.8 ± 0.7) after the SSG WU, whilst the effect was ‘unclear’ following the team-sport WU (0.9 %, ES 0.2 ± 0.7).

4.3.4 Training Recommendations

Training recommendations relevant to development of agility include the following:

  • Perceptual and decision-making exercises, with appropriate stimulus and response, are highly important, whilst decision-making speed should not supersede accuracy.

  • SSGs are superior to COD training for developing agility performance. It may be that speed of decision making is enhanced due to the small confinements rather than movement speed.

  • SSGs and strength training may be appropriate as a WU to improve agility performance; however, the mechanisms for strength training need to be elucidated.

4.4 Complementary Methods

4.4.1 Vibration Exercise

Vibration training has received a reasonable amount of attention as a modality to enhance physical performance [77–80]. Although beneficial effects have been seen for some performance measures, its effectiveness is far from clear. Evidence has shown an increase, no change, or a decrease in various performance measures [77–80]. One study has investigated the effects of vibration exercise on agility in a design that also included 1.5, 3, and 5-m sprint performance [80]. Eight female premier club netball players performed side-alternating vibration training and control (no vibration) exercise in a randomized crossover design performed 1 week apart. In this instance, no significant changes were reported for agility performance. Although prospective studies may begin manipulating intensity and duration, it would seem more useful to first elucidate the possible mechanisms for any likely change in performance that may occur from this form of training.

4.4.2 Caffeine

Generally, caffeine has gained acceptance as a performance-enhancing ergogenic aid. Some, but not all, studies showed improved physical and cognitive performance [81–85]. Its effects on agility performance, comprising both perceptual decision-making and physical factors, remain equivocal. In a randomized double-blind counterbalanced study design, a group of ten moderately trained team sport athletes ingested either a 6 mg kg−1 dose of anhydrous caffeine (gelatin capsules) or a placebo dose containing only 0.55 g of artificial sweetener before an 80-min simulated match intermittent running protocol [81]. An agility test was performed during each period with measures of total, movement, and decision-time and decision-making accuracy recorded. Although there was no significant interaction effect (time × condition), performance was consistently faster after caffeine ingestion (significant main effect for condition; p = 0.005). Specifically, mean percentage improvements of total (2.3 %) and agility (~4 %), decision (~9 %) and movement times (~3 %) were observed. Interestingly, improvements were also observed in decision-making accuracy after caffeine ingestion, in the early phase of the simulated test and in both a fatigued and a fresh state. Using a double-blind repeated-measures design, Jordan et al. [85] found caffeine supplementation (6 mg kg−1) produced faster agility times in male youth (aged 14 years) soccer players. In contrast, Pontifex et al. [86] reported no effect on agility performance following the ingestion of caffeine (6 mg kg−1), despite an improvement in repeated sprint ability. It is clear that further research is warranted to identify the effects of caffeine on agility performance.

4.4.3 Neutral Amino Acid

The effect of neutral amino acid on agility performance has also been studied [44]. A group (n = 15) of male sub-elite ARF players performed an agility and motor skills test as well as psychological tests to assess mood states and cognitive function before and after supplementation. Participants completed a double-blind crossover trial, receiving either the tryptophan-‘depleting’ (without tryptophan) or protein control (with tryptophan) mixtures of large neutral amino acid. Depleting serotonin levels improved agility performance by 5.2 % after the fatiguing exercise compared with the baseline trial, while the protein control elicited a 2.9 % improvement. While such research demonstrates the possible effect of neutral amino acid supplementation on agility performance, it would seem more useful to fully elucidate the efficacy of different training interventions.

4.5 Future Research

From this systematic review, it seems that our knowledge regarding different aspects of agility testing, training, and mediating factors is basic. Anecdotal propositions and beliefs currently underpin much of our perceived understanding of agility. In terms of testing, a ‘Y-shaped’ test configuration whereby participants perform a common 45° cut in response to a stimulus has dominated the literature. Alternative methodological designs are necessary and would likely gain more credibility if based on observational studies in an effort to attain ecological validity. It would seem worthwhile to venture from the current ‘Y-shaped’ test design and investigate alternative approaches. The challenge is to develop reliable tests that use sport-specific agility scenarios and capture the complexity of movement and decision-making aspects of field agility. This may require the inclusion of ball or other sport-specific equipment, a variety of views (not just front on), multiple players, different movements (attacking or defending), and some deceptive actions. Whilst establishing a sport-specific test may appear elusive, or indeed futile, attempts should at least be made to appreciate the different sports. Establishing the long-term reliability (stability) of agility tests also seems an intuitive endeavor, particularly when including a human stimulus. A greater number of mechanistic studies should form a large proportion of future research, with the focus on understanding the cognitive and decision-making qualities of higher-standard players. A wider array of training interventions, as well as extending past study designs, should also be addressed. This would ideally form part of a holistic approach, rather than focusing on one or few parameters (e.g., solely strength). Identifying whether training interventions can induce superior technique or proficiency is likely an area of interest. 
Ideally, these would be examined longitudinally whilst also identifying any accompanying detrimental effect. Moreover, establishing the level of transfer to actual match performance as well as the retention are key areas of study. Different SSG formats and pitch configurations are likely to be a rapidly emerging area of study. Although complementary approach methods (e.g., caffeine, vibration exercises) may offer possible advantages, exhausting all aspects of training should be prioritized. Finally, replication and novel studies on high-level athletes are needed to verify whether current knowledge applies across all performance levels.

5 Conclusion

Agility is regarded as a key aspect of performance in team sports and is considered capable of discriminating between higher-skilled individuals and their lesser-skilled counterparts. An increasing number of studies have been conducted on agility over recent years, with test reliability and validity being the focus. Generally, reliability has been shown to be high for light, video, and human stimuli. However, this may be reduced when used for younger athletes. A human stimulus may be the most appropriate to identify differences between standards of play. Practitioners should refrain from using the tests interchangeably, as differences likely exist between the tests. Our knowledge regarding the mediating factors remains in its infancy and significant developments are necessary in this area. Perceptual and decision-making factors are often heralded as the discriminant factors between higher- and lesser-skilled players. However, the factors explaining these differences in cognitive function remain unknown. Anecdotally, technique is often proclaimed to be important, yet the evidence is not representative of this. Physical factors seem to have had the greater focus in terms of research. The importance of strength may be diminished when a cognitive demand is included. Few intervention studies have been conducted; however, from those available it seems SSGs can offer a good stimulus. Video-based perceptual training may improve decision-making ability, but the associated logistical and time demands may hinder its usefulness and application in the sport setting. It is unknown whether improvements in an agility test can transfer to a real-life match environment.