Key Points

Lifelong physical activities are typically performed individually or in small groups, involve minimal structure and minimal physical contact, are characterized by varying levels of intensity and competitiveness, and may be easily carried into adulthood and old age.

Additional research is needed to establish the validity and reliability of lifelong physical activity movement skill tests for activities not included in this review, such as yoga, Pilates, tai chi, aerobics, and running.

Future research would benefit from determining the predictive validity of competency in lifelong physical activities to ascertain the strength and direction of association between competency levels and future physical activity.

1 Introduction

Developing adequate movement skill competency across a broad range of activities is important for individuals of all ages [1–3], and competency in a range of fundamental movement skills (FMS) in childhood has been found to be a predictor of physical activity in adolescence [4]. Movement skills are often learned and developed throughout childhood [5–7], initially in the form of FMS, of which there are three types: locomotor (e.g., running, jumping), object control (e.g., catching, kicking), and stability (e.g., balancing, twisting) [8]. If children fail to develop competency in FMS [9, 10], they may find it difficult to learn and master more refined movement skills, such as sport-specific skills (e.g., pitching a ball, serving in tennis) [7].

Previous movement skill competence theory [7, 11] posits that individuals ascend a hypothetical mountain of motor development, whereby more advanced movement acquisition is dependent upon the foundation established in the previous level. The proposed levels of movement skill acquisition are (1) reflexive, (2) preadapted, (3) fundamental motor patterns, (4) specialized sports skills [11], and (5) skillful [7]. These models are based on the premise that individuals cannot be physically active throughout the lifespan without achieving proficiency in FMS. However, some lifelong physical activities do not require a foundation in the FMS that are commonly assessed, meaning that children who are not competent in FMS may instead take up lifelong physical activities as a way to be physically active. As such, it has been suggested that young people need to be exposed to, and develop competency in, a range of movement skills associated with ‘lifelong physical activities’ that can be easily carried into adulthood [12–17].

Schools may present a possible setting for learning and testing competency in lifelong physical activities, as they may have access to personnel and resources, such as qualified teachers, equipment, space, and the ability through physical education to provide exposure to these activities [14, 16, 18, 19]. A noted decline in physical activity occurs during adolescence [20]; thus, this may also be a critical period in which individuals should learn and develop competency in a range of lifelong physical activities. Indeed, lifelong activities learned at this time may have health benefits both at the time they are learned and later on in adult years [14].

Although a variety of definitions and alternative terms for lifelong physical activities (e.g., lifetime [14], lifestyle physical activities [13, 16, 21]) have emerged in the literature [13, 22–24], they differ in the characteristics used to define a lifelong physical activity, which consequently makes identifying and promoting these activities difficult. Of note, the term ‘lifelong physical activity’ can also be used to describe how an individual can be physically active across the lifespan. In this review, the term ‘lifelong physical activity’ is used only to describe a subset of physical activities, defined as those sports and leisure-time activities typically performed individually or in small groups (typically four or fewer people) that involve minimal structure, avoid physical contact, are characterized by varying levels of intensity and competition, and, importantly, may be easily carried into adulthood and old age. Examples of lifelong physical activities that fit this definition include aerobics, badminton, cycling, dance, golf, Pilates, racquetball, resistance training, running, swimming, tai chi, tennis, and yoga.

Many team sports, such as basketball, hockey, and soccer, can be played throughout adulthood, but do not fit the definition of a lifelong activity due to the number of participants and the higher levels of organization required [14]. In addition, due to the physical contact involved, many team sports have higher incidences of injury (number of injuries/1000 occurrences), such as soccer (64.4/1000), rugby (95.7/1000), and hockey (62.6/1000) [25–27]. In comparison, popular lifelong physical activities, such as tennis (23.1/1000), resistance training (11.9/1000), and swimming (6.1/1000) [25–27], have considerably lower injury rates.

One characteristic of lifelong physical activities included in previous definitions, but not in this study, is the use of minimal equipment. While this may be true for most lifelong activities, there are some notable exceptions, including golf and resistance training. Although golf could be played with only two clubs (i.e., an iron and a putter), this is neither ideal nor representative of how golf is usually played. Similarly, resistance training is often performed with free weights (e.g., barbells and dumbbells) and equipment (e.g., cables and pulleys) typically found in a health club, yet it can also be performed using only body weight exercises (e.g., squats, lunges, push-ups). Regardless of equipment, these two activities are undoubtedly lifelong activities, as shown by their high levels of participation amongst individuals of all ages [28]. Although equipment considerations may be important in performing an activity, the inclusion of this characteristic in the definition of lifelong physical activities would exclude relevant lifelong physical activities from this review.

The assessment of movement skills (e.g., FMS, sport-specific, lifelong) is vital for informing individuals of their competency levels, as well as informing teachers and researchers of potential movement skill deficiencies in a population, so programs or interventions can be designed and implemented [29]. Movement skills are commonly assessed through product or process measures [8]. Product measures quantify outcomes [8], which are expressed, for example, in terms of how fast a ball is thrown (i.e., speed), time it takes to swim 100 m, or distance a soccer ball is kicked. Product measures are quick and easy to assess and interpret, but they cannot be used to determine how an outcome was achieved [29]. Conversely, process-oriented measurement is concerned with the qualitative characteristics that describe successful movement patterns [8] and allow movement component deficiencies to be more easily identified and corrected. The ability to correct individuals on specific components of a movement may help prevent injury [30], may help contribute to an individual’s feeling of competence, and can enable the identification of specific skill components to be addressed in future interventions to enhance performance proficiency. For example, when an individual is introduced to a new activity, such as weightlifting, the technique, as opposed to the outcome (i.e., amount of weight lifted), is more important for the safety of the individual [1]. Over time, if safe technique is practiced and utilized, strength gains may be achieved with reduced possibility of injury [1].

Previous reviews have examined the validity and reliability (alternatively called objectivity) of assessments in both FMS [31] and sport-specific skills [32]. No such review exists for the assessment of field-based measures (i.e., not taking place in a laboratory setting, but rather in, for example, a school or community sporting ground) of lifelong physical activities. Given the widespread use [33–35] and previous success of process-oriented skill assessment in FMS (e.g., Test of Gross Motor Development-2 [TGMD-2]) [36], an analysis of measurement properties of current assessments examining qualitative aspects of lifelong physical activities is warranted. Therefore, the purpose of this systematic review is to examine the methodological properties, validity, reliability, and test duration of current field-based measures to assess movement skill competency in lifelong physical activities, as well as to clearly define the characteristics unique to lifelong physical activities.

2 Methods

2.1 Search Strategy

A systematic search of four electronic databases (PubMed, Scopus, ProQuest, and SPORTDiscus) was conducted, focusing on field-based measures of lifelong activities. No time restrictions were applied when searching for articles. Searches conducted in the individual databases included various combinations of the following terms: ‘reliability’ OR ‘validity’ AND ‘fitness’ OR ‘physical activity’ OR ‘sport’ OR ‘motor’ OR ‘movement’ OR ‘skill’ OR ‘battery’ OR ‘instrument’ OR ‘qualitative’ OR ‘technique’ OR ‘components’ OR ‘criteria’ OR ‘measurement’ OR ‘test’ OR ‘assessment’. A secondary search for specific lifelong physical activities (aerobics, badminton, cycling, dance, golf, Pilates, racquetball, resistance training, running, swimming, tai chi, tennis, and yoga) was performed. Additional articles were found by examining the reference lists of included articles. After the initial searches, the titles and abstracts of all retrieved articles were screened. Articles deemed appropriate then underwent a full-text review, in which the inclusion and exclusion criteria were applied.

2.2 Inclusion Criteria

Two authors independently assessed articles for inclusion in the study. If an agreement could not be reached, a third author reviewed and made the final decision on whether the article should be included. The criteria for inclusion in the study were as follows: (1) articles must have been peer reviewed; (2) full abstract, article, and reference list must be present; (3) articles must report at least one lifelong physical activity movement skill; and (4) article must report at least one aspect of validity or reliability relating to the movement skill. If a movement battery was used to test multiple skills, then the skill was only included if the skill and corresponding validity or reliability information could be extracted.

As assessments examining skill proficiency should display adequate measurement properties, it is important to consider the validity (e.g., content, construct, and/or criterion) of the measure. Content validity is concerned with whether a test is a measure of all skills relevant to a particular activity [37, 38]. For example, it could be assumed that the content validity of a tennis assessment is higher in a test that assesses the forehand, backhand, volley, and serve, as opposed to a test that examines just the forehand. Construct validity is a measure of whether a test can measure a quality or attribute that cannot be operationalized. It consists of discriminative (ability to assess performers of different ability by another measure) and convergent (relation of a test with another measure of the same construct or associated measures) validity [38, 39]. Finally, criterion validity refers to the ability of a test to show agreement with a ‘gold standard’ or external measure. Criterion validity can also constitute concurrent (relating score with a ranking in an alternative measure) or predictive (relationship of a score to a future performance) validity [38].

Three main types of reliability were reported for the studies included in this review. Inter-rater reliability was defined as the agreement between two or more raters on an assessment/score [39]. Intra-rater reliability was defined as the level of agreement of a single observer on multiple assessments/scores [39]. Finally, test–retest reliability was defined as the level of consistency over two or more rounds of testing [39].
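To make the notion of inter-rater agreement concrete, the following is a minimal sketch of two statistics that appear later in this review: percentage of agreement and the κ coefficient for two raters. The function names and the example component scores (1 = component present, 0 = absent) are hypothetical and are not taken from any included study; the κ formula shown is the standard two-rater Cohen's κ, which may differ from the variant used in a given article.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores from two raters on ten skill components
scores_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
scores_b = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
agreement = percent_agreement(scores_a, scores_b)
kappa = cohens_kappa(scores_a, scores_b)
```

In this made-up example the raters agree on 8 of 10 components, yet κ is noticeably lower than the raw agreement because part of that agreement would be expected by chance.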

2.3 Exclusion Criteria

Studies were excluded if they met any of the following criteria: (1) the activity did not fit the definition of a lifelong activity; (2) insufficient information on validity and/or reliability was reported; (3) the skill was assessed via use of a product measure; (4) the qualitative criteria for measuring the skill were not clearly defined; or (5) articles were not reported in English.

2.4 Assessment of Study Quality

Two authors independently reviewed all included articles for study quality (see Table 1) based upon five criteria adapted from a risk-of-bias assessment in a previously published review on sport-related skill outcomes [32]. The five criteria by which articles were assessed were (1) sample size, reported as the number of participants used specifically for establishing the validity and/or reliability of the skill test; (2) participant details, which included age, sex, number of participants, and ability level; (3) whether participants were allowed to practice the tested skills before the official assessment (practice session information was simply reported as having occurred or not); (4) testing environment, including whether the equipment remained the same throughout the entire testing process, which was reported as yes or no, or as a partial report if the stability of conditions could be implied from the study design; and (5) reported amount of time between assessments, if applicable. Along with study quality, authors extracted validity and reliability results from each article. As general group associations are determined using correlation coefficients (r) and intraclass correlation coefficients (ICCs), values were classified as follows: <0.4 was rated as poor, ≥0.4 to <0.8 as moderate, and ≥0.8 as excellent [39, 40]. As the κ coefficient is a measure of exact agreement between raters, a slightly modified scale was used: >0.01 and ≤0.2 was rated as poor, >0.2 and ≤0.4 as fair, >0.4 and ≤0.6 as moderate, >0.6 and ≤0.8 as good, and >0.8 and ≤1.0 as excellent [41]. If authors could not agree at any point during the data extraction phase, a third author made the final decisions on study quality and validity/reliability extraction.
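The two rating scales described above can be expressed as simple threshold functions. The sketch below is illustrative only: the function names are hypothetical, the value is classified as given (how negative correlations should be handled is not stated in the scale, so that is left to the caller), and κ values outside the stated range are returned as "unclassified", which is an assumption rather than part of the published scale.

```python
def classify_correlation(value):
    """Classify an r or ICC value: <0.4 poor, >=0.4 to <0.8 moderate, >=0.8 excellent."""
    if value < 0.4:
        return "poor"
    if value < 0.8:
        return "moderate"
    return "excellent"


def classify_kappa(kappa):
    """Classify a kappa value using the five-band scale described in the text."""
    # Values at or below 0.01, or above 1.0, fall outside the stated bands (assumption)
    if not 0.01 < kappa <= 1.0:
        return "unclassified"
    bands = [
        (0.2, "poor"),
        (0.4, "fair"),
        (0.6, "moderate"),
        (0.8, "good"),
        (1.0, "excellent"),
    ]
    # Each band is closed at its upper bound, so the first bound not exceeded wins
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


# An ICC of 0.79 falls just short of the 'excellent' band
icc_label = classify_correlation(0.79)
kappa_label = classify_kappa(0.61)
```

Note that the boundaries differ between the two scales: 0.8 is "excellent" for r and ICC, whereas a κ of exactly 0.8 is still "good".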

Table 1 Risk of bias

3 Results

Preliminary search results identified 7508 articles; however, after examining titles and abstracts, 154 full-text articles were retrieved and reviewed for eligibility for inclusion in this review. Reasons for exclusion of search results can be viewed in Fig. 1. After inclusion/exclusion criteria were applied to the full-text articles, 17 met all criteria for inclusion into this review. These 17 articles consisted of eight different lifelong physical activities, including resistance training (three), badminton (two), tennis (three), cycling (two), racquetball (one), swimming (two), golf (one), and dance (three) articles. More specific information related to the skills tested, equipment needed, and the sample used in each study can be viewed in Table 2.

Fig. 1 Results of systematic literature search

Table 2 Skills tested, equipment used, and participants involved in skill tests

3.1 Risk of Bias

Overall, relative to the study type and design, the sample sizes ranged from small (n = 6) to very large (n = 131). One study only established content validity and did not report a sample size [42]. Only 12 % of studies had a sample size greater than 100, while 47 % of studies involved small sample sizes of 30 participants or fewer. When reporting participants’ details (i.e., sex, age, level of experience, and number of participants), only seven studies adequately reported these details, while the remaining ten were missing at least one criterion. As previously reported by Robertson et al. [32], the ability level of study participants/cohorts is commonly not reported in studies, and this holds true for the current review as this detail went unrecorded more than any other participant detail (n = 8). Six of the 17 included articles allowed participants to practice the studied skills before the official test was undertaken. However, it should be noted that one study [43] had an optional practice session; thus, we were not able to determine whether all participants had practiced. Nine studies reported keeping testing conditions the same between assessments (i.e., environment and equipment), while the remaining eight studies either made no mention of keeping testing conditions the same or the stability of conditions could not be deduced due to study design.

3.2 Validity

Content validity was the most commonly reported validity for studies in this review. A total of 41 % of studies cited content validity, and all of these studies used some type of expert panel to establish the relevant skills/domains to be assessed. Two of the studies [43, 44] additionally used a literature review to further justify the inclusion of specific skills to allow for adequate content validity of their test.

Construct validity was reported in 24 % of studies. Of the studies reporting construct validity, three different statistical analyses were used. Lubans et al. [45] used a regression model, involving the total score of the resistance skill training battery and sex, and found that 39 % of variance could be explained by a muscular fitness score. Ducheyne et al. [46] established construct validity through factor analysis, which resulted in three factors being extracted from the skills test, including during-cycling skills, walking with the bicycle, and dismounting the bicycle. Discriminative validity was established for a test of dance and golf proficiency [47, 48]. An analysis of variance was used to test for group differences between ability level (i.e., non-dancers, beginners, intermediates, advanced, and professionals) and overall dance test scores. Alternatively, the golf assessment tested for differences in golf skill competency according to age (e.g., 6, 7/8, 9/10 year olds).

Only two studies tested for criterion-related validity. Toriola et al. [49] classified participants as low-skill (displaying less than 50 % of badminton service components) or high-skill (displaying more than 50 % of badminton service components) badminton players. Participants were then scored on a service test (i.e., quantitative test based on where the shuttlecock landed on the serve) while simultaneously being assessed by the judges on the quality of their movement. The results from these two assessments were then correlated, which yielded a low positive association for both low-skill (r = 0.04) and high-skill (r = 0.06) performers. These results indicate that the judges’ process-oriented scoring of participants (i.e., quality of movement) could not sufficiently determine participants’ scores on the overhead serve test (i.e., quantitative score). Similarly, process-oriented ratings on a racquetball skill battery [43] were used to assess the quality of participants’ movements for eight different racquetball skills. This rating was then correlated with individuals’ final standing in a racquetball tournament, in which a rank of one indicates the best player (i.e., high racquetball ability), whereas a rank of ten indicates the tenth best player in the tournament (i.e., lower racquetball ability). This study revealed a stronger relationship (r = −0.48) than the badminton service test; the negative sign reflects that higher process-oriented ratings were associated with better (numerically lower) tournament standings. Therefore, while criterion validity may provide important information in terms of predicting future performance or how a skill test compares to ‘gold standards’, the results of studies included in this review suggest that more research should focus on improving and/or establishing criterion validity for process-oriented tests.

The validity results of included articles are displayed in Table 3. Six tests in this review failed to report any type of validity [50–55].

Table 3 Measurement properties

3.3 Reliability

All but one study [42] included in this review reported at least one type of reliability. Most common was inter-rater reliability (n = 12). This was reported either as the percentage of agreement [53, 54, 56], r coefficient [44, 47, 49, 55, 56], ICC [46, 49–51], or a κ coefficient [52]. Intra-rater reliability was reported in 41 % of studies in a similar fashion as inter-rater reliability, with three studies reporting r coefficients [44, 47, 55], three reporting ICCs [43, 48, 50], and one study using percentage of agreement [53]. While most studies showed a high level of inter- and intra-rater agreement, one study [53] had questionable levels of inter-rater agreement (i.e., percentage of agreement below 80 %) for two of the six components assessing the overhead tennis serve.

Test–retest reliability was only reported in four studies [45, 48, 56, 57]. Of those studies reporting test–retest reliability, two studies reported this as an r coefficient [56, 57] and one as an ICC [48]. The fourth study reporting test–retest reliability was unique in that this was demonstrated through rank order repeatability (i.e., ability of participants to remain in the same rank order across multiple trials), change in mean (i.e., change in score between trials of an individual, as opposed to group differences), and typical error [45, 58]. These statistics were unique to the resistance training battery identified in this review, and the authors of the paper were comparing differences between individuals, unlike other tests that compare group differences. Additionally, coefficient of variation [59] was used in another article assessing the resistance training battery to further show the reliability of the instrument. Two studies reported three different types of reliability statistics [45, 56], while all other studies reported either one or two reliability statistics. Overall, however, levels of reliability were moderate to excellent, with no ICC below 0.60, no r below 0.67, and no percent agreement below 69 %.

3.4 Test Duration

To the authors’ knowledge, no published guidelines for determining adequate test duration exist. However, test duration has been used as one component of feasibility in a previous sport skill review [32]. Thus, duration to assess a single participant (independent of set-up time) was extracted for this review. Eight of the 17 articles reported time to assess a single participant in a skill test/battery [43, 45, 47, 50, 52, 54, 56, 59]. Three tests took 5 min or less to assess a single participant [47, 50, 54]. Additionally, the resistance training skills battery reported 8- to 10-min test durations [45, 59]. The remaining three articles reported a test duration of 20 min or more [43, 52, 56]. The rest of the articles (n = 9) included in this review either made no mention of the time needed to administer the given test, or the time needed was unclear, thus test duration could not be determined.

3.5 Samples and Skills Tested

Information pertaining to skills tested and participant samples used can be found in Table 2. Overall, samples of included studies were young, ranging from preschool age to college students, with the exception of two dance studies [47, 50] that included participants aged up to 30 years. Additionally, three of the 17 studies, all of which were dance tests [44, 47, 50], used some elite or professional dancers.

4 Discussion

This review was conducted to assess the methodological properties, validity, reliability, and test duration of process-oriented lifelong physical activity measurement tools, as well as to clearly define the characteristics unique to lifelong physical activities. Although 17 studies were included in this review, only assessments for eight different lifelong physical activities were identified (i.e., resistance training, badminton, tennis, cycling, racquetball, swimming, golf, and dance). All but one study reported some form of reliability, but fewer studies reported the validity of measurement tools. These results may indicate that, while some work has been done on creating valid tests of lifelong physical activities, current tests can still be improved. This review also highlighted the need for assessments of other popular lifelong physical activities, such as yoga, Pilates, tai chi, aerobics, and running.

4.1 Risk of Bias

It should be noted that the majority of the studies failed to describe the participants’ characteristics in sufficient detail, which limits the generalizability of findings. For example, few studies described their sampling frame and participants’ ability levels. While nine studies specifically stated the participants’ skill levels (e.g., beginner, expert), one study [50] used all professional, national, or international level participants, and five studies used all beginner level participants [43, 47, 51, 54, 57]. When participants with high ability levels (e.g., professional) are used, the applicability of the content tested to the general or even amateur population may be questionable. For example, competencies in rhythmical accuracy, spatial skills, and accuracy of movements may be too detailed for anyone other than the most elite dancers. Thus, while tests of dance competency exist [44, 47, 50], their suitability for assessing lifelong physical activity competency may be inadequate. In the future, recruiting a more heterogeneous sample with older people (above the age of 20 years) and varied ability levels may be beneficial, as results may then be more applicable to the population as a whole. Ideally, the validity and reliability of these lifelong physical activity assessments should hold true for people of all ages. If developed tests are not valid or reliable in older populations, then identifying specific movement skill deficiencies in these populations may be compromised.

4.2 Reliability

As a whole, reliability was better reported than validity. Inter-rater reliability was the most commonly reported type of reliability. Three studies reporting inter-rater reliability had moderate reliability [49, 55, 59], two studies ranged from moderate to excellent levels of reliability [46, 53], and the rest of the studies reporting inter-rater reliability had excellent levels. Intra-rater reliability was also well reported, and levels of intra-rater reliability were classified as excellent for all these studies, except one study that was near excellent levels with an ICC value of 0.79 for a test of golf proficiency [48]. Rank order repeatability showed moderate to excellent levels of reliability for the resistance skill training battery, and acceptable levels of change in mean and typical error were also displayed for this test [45].

Test–retest reliability was only reported in four studies and should be a focus of future studies to see whether results are reliable over time, as opposed to a one-off measurement. If a test is to be considered reliable, the test needs to have adequate stability (i.e., results are similar over time) [39] and sensitivity (i.e., ability to detect small, meaningful differences in scores, such as in the resistance skills training battery) [45, 59, 60]. By addressing these issues, future tests can be administered with greater confidence regardless of time between assessments. While rank order repeatability is an important form of reliability, researchers are encouraged to assess other forms of test–retest reliability. More specifically, change in mean and typical error can be used to determine variability within an individual’s score, which is particularly important when determining the effect of an intervention on movement skill competency.
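As an illustration of the within-individual statistics mentioned above, the sketch below computes change in mean and typical error from two testing rounds. It assumes the commonly used definition of typical error as the standard deviation of the individual difference scores divided by √2; the included studies may have computed it differently, and the function names and scores are hypothetical.

```python
import math
import statistics


def difference_scores(trial_1, trial_2):
    """Per-participant change in score from the first to the second trial."""
    if len(trial_1) != len(trial_2) or len(trial_1) < 2:
        raise ValueError("Need paired scores for at least two participants")
    return [second - first for first, second in zip(trial_1, trial_2)]


def change_in_mean(trial_1, trial_2):
    """Mean of the individual difference scores (systematic change between trials)."""
    return statistics.mean(difference_scores(trial_1, trial_2))


def typical_error(trial_1, trial_2):
    """Within-participant variation: SD of difference scores divided by sqrt(2) (assumed definition)."""
    return statistics.stdev(difference_scores(trial_1, trial_2)) / math.sqrt(2)


# Hypothetical skill battery scores for five participants tested twice
round_1 = [40, 42, 38, 45, 41]
round_2 = [42, 43, 38, 46, 44]
mean_change = change_in_mean(round_1, round_2)
error = typical_error(round_1, round_2)
```

Under these assumptions, a change in an individual's score after an intervention would need to exceed the typical error before it could reasonably be interpreted as more than measurement noise.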

4.3 Validity

Only ten of the studies included in this review reported validity. Overall, content validity was the most frequently cited type of validity, while criterion validity was largely unreported. Very few process-oriented measures of lifelong physical activities are available; thus, comparing results of one assessment with results of a second assessment for the same activity rarely occurs. Particular attention in future research should be given to ensuring additional forms of validity (e.g., predictive, construct), as opposed to just content validity, are established for any test of movement skill competency. Research is also required to create multiple assessments for a given sport or activity, thereby allowing for more construct and criterion validity of lifelong physical activities to be established. By creating more appropriate tests, researchers and practitioners alike will possess a range of assessments to test an individual’s competency, which can help to eliminate deficiencies in movement skills or better teach individuals how to correctly perform a skill. It is important to remember that test validity is highly contextual and is not carried across situations, thus it cannot be assumed that a test validated with children will provide similar results for adolescents or adults.

One reason that previous skill tests using process-oriented measures, such as the TGMD-2, have been used with success may be the numerous types of validity that have been established for them in a number of different settings. For example, the content validity of the TGMD-2 was established through the agreement of three experts who judged the appropriateness of the skills included in the battery. Second, criterion validity was shown through the strong correlation of the TGMD-2 with a similar measure of movement ability. Finally, construct validity was established through its ability to test for age differentiation, group differentiation, item validity, subtest correlations (i.e., locomotor and object control subtest), and factor analysis [36].

Researchers are encouraged to assess the predictive validity of lifelong physical activity movement skill tests by comparing results with physical activity behavior. This is important because, if competency in lifelong physical activities is able to predict high levels of physical activity, then the inclusion of such activities in the school curriculum, particularly in secondary school, may be justified. Due to the decline in physical activity that commonly occurs in adolescence [61], it is imperative that young people develop competency in a range of fundamental, specialized, and lifelong physical activity movement skills. Indeed, recent reviews and national guidelines have highlighted the importance of developing movement skill competency to ensure that young people are prepared for a lifetime of physical activity [19, 62–64]. While the relationship between FMS and physical activity during childhood and adolescence has been well documented [4], less evidence is available to support the importance of FMS beyond the adolescent years. It is also well reported that not all individuals will attain proficiency in FMS. As such, these individuals may need an additional set of movement skills in lifelong physical activities that they can learn and that may provide another or further opportunity to be physically active. Thus, lifelong physical activities may play a critical role in obtaining higher levels of physical activity into adulthood.

4.4 Test Duration

Just under half of the studies in this review noted a test duration between 1 and 45 min. Longer tests may be acceptable for smaller groups of people, while larger groups may be better served by a quick, efficient test for assessing skill competency. Unfortunately, there are no well-accepted criteria for determining whether a test is too short or too long; thus, researchers need to use their best judgment when creating tests [65]. Given that tests of lifelong physical activities may be targeted at schools, where lack of time in physical education is a known barrier [66], the need for shorter tests may be justified. Test duration may be influenced by other variables such as equipment needed, number of trials tested, and administration duties. While these are all important to consider when determining appropriate test duration, the validity and reliability of a given test should not be compromised. Previous reviews have noted movement skill tests that take anywhere from 15 to 90 min to complete for a single participant [31, 32]. Around 20 min seems to be the most common amount of time used to assess various FMS, sport, and lifelong movement skills [31, 32]. For example, the TGMD-2 [36] takes about 20 min to administer, and this movement skill assessment is widely used [35, 67, 68]. More research on test duration for skill assessment may be beneficial to see approximately what amount of time balances feasibility with obtaining sufficient information on an individual’s ability.

4.5 Limitations

One limitation of this review is that only eight different lifelong physical activities were identified. More tests of lifelong physical activity competency may exist (e.g., for yoga, Pilates); however, either the validity or reliability of these tests has not been established, or the tests may appear outside the peer-reviewed literature. Another limitation is the lack of diverse samples tested. Few tests assessed non-elite and older aged individuals; thus, applicability to the general population may be questioned. In addition, test–retest reliability was lacking, as this was only reported in four studies. Thus, reliance on one-time measures of competency seems to be an issue in the assessment of lifelong physical activities.

5 Conclusion

Lifelong physical activity movement skills may be advantageous for individuals to learn due to their individual or small group nature and as an opportunity to broaden their physical activity confidence and competence. Additionally, their need for little structure, decreased contact, and varying levels of intensity and competitiveness, along with the ability to perform these activities into old age, may allow individuals to be active at any age. A total of 17 studies were considered and reviewed for their methodological properties, validity, reliability, and test duration. Methodological characteristics, such as participants’ details and stability of conditions, need to be better reported in future studies. While moderate to excellent levels of intra-rater and inter-rater reliability were noted in the majority of tests, few tests of lifelong physical activities reported test–retest reliability. Validity was only reported in ten of the studies; content validity was the most common. Future research should look to establish additional forms of validity and reliability for current tests of lifelong physical activities. Tests of lifelong physical activity included in this review and created in the future should look to establish predictive validity in order to support the notion that competency in lifelong activities does allow for a lifetime of activity.