Introduction

Many recent studies have focused on understanding the relationship between motor competence and health-related behaviors and attributes, such as physical activity and weight. Motor competence (MC) defines a person’s level of proficiency to execute motor skills as well as the underlying mechanisms including motor coordination and control [1, 2]. MC plays a crucial role in the motor domain and there is strong evidence that MC is positively correlated with weight status, physical activity, several facets of health-related physical fitness, and perceived motor competence [3,4,5,7]. Both cross-sectional and longitudinal data suggest that MC may be an important antecedent/consequent mechanism for promoting many aspects of health-related behaviors.

MC is considered a necessary and valuable component of the motor domain in children. MC assessment is used in particular to: (a) categorize and identify individuals who are developmentally delayed, (b) plan treatment for children who are eligible for and in need of movement intervention, (c) evaluate changes in MC over time, (d) provide feedback to the performer or other concerned parties [7], and (e) predict other behavioral indicators, such as physical activity, with which MC shows low-to-moderate correlations [6, 8]. Therefore, a comprehensive and accurate MC assessment is critical.

Several assessment tools have been designed to assess MC in childhood [9]. They measure either qualitative or quantitative aspects of MC and accordingly provide different types of information [10, 11]. Quantitative assessment measures the product of a motor skill performance, such as performance time, accuracy, or distance, while in qualitative assessment, the process or form of performance is of particular interest. Some of the most popular and commonly used MC batteries are the Test of Gross Motor Development (TGMD-2 & -3) [12, 13], the Bruininks-Oseretsky Test of Motor Proficiency, second edition (BOT-2 Short Form) [14], and the Körperkoordination Test Für Kinder (KTK) [15, 16]. Although these tools are used for measuring the same construct (i.e. MC), they differ in many aspects.

The TGMD-3 is a process-oriented MC tool that assesses locomotor and object control skills [13]. It has a diagnostic purpose: it is concerned only with motor skill performance and is used to identify children whose scores fall significantly below the norms for their age. In contrast, the KTK is product-oriented and assesses gross body control and coordination, which is an important underlying mechanism of MC [15]. The BOT-2 is also a product-oriented tool that assesses both fine and gross motor skills [14]. It was originally designed to diagnose children who have mild-to-moderate motor coordination deficiency, such as developmental coordination disorder. Consequently, the TGMD-3, the BOT-2 SF, and the KTK may give different pictures of MC as they assess different aspects of movement performance and different skills [17, 18]. Since the definition of MC is rather loose, researchers have used various assessment methods to measure the same construct. Therefore, it is essential to compare performances based on different motor assessments to provide empirical evidence as to how various aspects of MC are related or unrelated to each other.

Although there is a growing body of research comparing MC assessment tools [17,18,19,20,21], as far as we are aware, only two studies have examined MC assessment tools across childhood [17, 18]. Fransen et al. (2014) questioned the agreement between the KTK and the BOT-2 SF in children aged 6–11. Moderately strong positive correlations were found between BOT-2 total and gross motor composite scores and KTK Motor Quotient, while BOT-2 SF fine motor composite and KTK Motor Quotient scores had weak positive correlations. Re et al. (2018) compared MC levels on the KTK and the TGMD-2 in children between 5 and 10 years of age. Low-to-moderate correlations were found between assessments across age groups with older children demonstrating higher raw scores.

However, no study has assessed the relatedness of the BOT-2 SF and the TGMD-3 and compared all three MC assessment tools (TGMD-3, KTK, and BOT-2 SF) across childhood in a single study. Since MC is age related [22], researchers and practitioners can benefit from having a better knowledge of MC performances across childhood. Therefore, this study aims to examine if the MC results, which are obtained by the TGMD-3, the KTK, and the BOT-2 SF, are related and to compare performance levels of 7- to 10-year-old school girls as our convenience sample. Middle-to-late childhood (7–10 years of age) is a sensitive period as children are expected to acquire adequate competence in most motor skills [23]; good knowledge of MC performance in this age period opens doors for appropriate treatment if necessary.

Based on the previous similar studies, we hypothesize low-to-moderate correlations between total and sub-test scores of the three assessment tools, and we expect that older children should outperform their younger peers in their total performances on the TGMD-3, the KTK, and the BOT-2 SF.

Methods

Setting and participants

This study had a cross-sectional design and included a convenience sample of 7- to 10-year-old schoolgirls from 10 public primary schools located in the south-western part of Tehran province, Iran. Schools in Iran are segregated by gender and female teachers are not normally allowed into boys’ schools and vice versa. Due to this restriction, our research focused only on MC among school girls.

Written consent from parents and assent from 164 girls aged 7–10 were obtained. The sample included children from a low-to-moderate socioeconomic background and with no reported history of learning difficulties or behavioral, physical, neurological or orthopedic problems. Girls aged 9–10 years also completed a brief monthly questionnaire about the attainment of menarche during the study; girls who had begun menstruating were excluded from the study.

Participants were grouped according to their grades (grade 1: N = 42; grade 2: N = 41; grade 3: N = 39; grade 4: N = 42) to provide a better understanding of whether different performances on instruments were linked across age groups. The ethics committee of the corresponding author’s university approved this study.

Motor competence assessments and procedure

All children completed the TGMD-3 [13], the KTK [15] and the BOT-2 SF [14] in accordance with the respective administrative manual. Each child completed the three assessments on three non-consecutive school days to prevent fatigue (either physical or psychological) and loss of attention that could influence performance.

The TGMD-3 assesses 13 motor skills in 3- to 10-year-old children and is subdivided into two subscales: locomotor skills (running, galloping, hopping, skipping, horizontal jumping and sliding) and object control skills (two-hand striking a stationary ball, one-hand forehand strike of self-bounced ball, stationary dribbling, two-hand catch, kicking, overhand throwing and underhand throw) [13]. Children’s performances on the TGMD-3 were video recorded. After a researcher demonstrated the skill, children completed one practice and two formal trials. This test took around 15–20 min to administer per child [13].

Each skill was evaluated according to the checklist criteria established for the TGMD-3 [13]. For each trial, a criterion was scored one if it was met and zero if it was not performed. The locomotor and ball skills subscales each yielded a raw score. The highest total score for the 2 subtests was 100 points. The internal consistency for the locomotor subtest, the ball skills subtest and the total TGMD-3 was 0.85, 0.85 and 0.91, respectively, in the Iranian sample [24].
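The scoring rule above can be illustrated with a minimal sketch. It assumes only what the text states (each criterion is marked 0 or 1 on each of two formal trials, and marks are summed); the function names, the skills shown and the number of criteria per skill are hypothetical and are not taken from the TGMD-3 manual.

```python
def score_skill(trial_1, trial_2):
    """Raw score for one skill: sum of 0/1 criterion marks over two formal trials."""
    if len(trial_1) != len(trial_2):
        raise ValueError("both trials must rate the same criteria")
    marks = list(trial_1) + list(trial_2)
    if any(mark not in (0, 1) for mark in marks):
        raise ValueError("each criterion is scored 0 or 1")
    return sum(marks)


def subscale_raw_score(skills):
    """Sum the skill scores of one subscale; `skills` maps name -> (trial_1, trial_2)."""
    return sum(score_skill(t1, t2) for t1, t2 in skills.values())


# Hypothetical ratings for two locomotor skills with four criteria each.
locomotor = {
    "run": ([1, 1, 0, 1], [1, 1, 1, 1]),
    "hop": ([1, 0, 1, 1], [1, 1, 1, 0]),
}
locomotor_raw = subscale_raw_score(locomotor)
print(locomotor_raw)
```

The total raw score would then be the sum of the locomotor and ball skills subscale scores.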

The KTK consists of four subtests: (1) walking backward on three balance beams of equal height and length but with different widths (5 cm height; 3 m length; 6, 4.5 and 3 cm width). The maximum possible test score was 72 points based on 3 trials per each beam and a maximum of 8 points in each trial; (2) Shifting platforms—the children had two identical wooden platforms (25 cm × 25 cm, height: 5.7 cm) and after stepping on one, they had to transfer another one sideways for the next transition. The successful transitions over two 20-s trials were counted and added up; (3) hopping for height on one leg at a time over an increasing pile of foams. Three, two or one point(s) were/was awarded for successful performances on the first, second or third trials, respectively. The maximum test score was 39 points (ground level + 12 pillows) for each leg, and 78 for both legs; (4) Jumping sideways as fast as possible over a thin wooden lath (60 cm × 4 cm × 2 cm) on a jumping base (100 cm × 60 cm). The total number of jumps over two 15-s trials was counted [15]. In line with the original manual of the KTK, a movement quotient (MQ) was obtained as a total score for the fundamental movement skill assessment. For this purpose, the raw scores of each test item were converted into norm values for each test item, based on the available dataset. Subsequently, a movement quotient (MQ) was established by combining the norm values. The KTK’s transformation methods described in the original manual were used for both the conversion of the raw scores into norm values per item and the conversion of the item norm values into a combined MQ. As such, 100 and 15 points, respectively, reflect the mean and the standard deviation of the norm population [15, 25]. In this sample, the internal consistency reliability for this tool was 0.82.

The BOT-2 SF includes 14 items of the BOT-2 Long Form and is divided into 2 categories: fine and gross motor tasks. Fine motor tasks include fine motor precision (drawing crooked lines and folding paper), fine motor integration (copying a square and copying a star), manual dexterity (transferring coins), and upper limb coordination (dropping and catching a ball with both hands & dribbling a ball with alternate hands). Gross motor tasks include balance (walking forward on a line and standing on one leg-balance beam, eyes open), running speed and agility (one-leg stationary hop), bilateral coordination (jumping in place-same-side synchronized, and tapping feet and fingers-same side synchronized) and strength (push-ups and sit-ups). Each child was given a raw score for each of the 14 test items. This tool is administered to individuals 4 through 21 years of age and the time required to assess an individual is 15–20 min for the short form [14]. In this sample, internal consistency reliability was 0.74 for the fine motor tasks, 0.78 for the gross motor tasks and 0.80 for the total sum score.

Assessments were performed in an indoor facility. The TGMD-3, the KTK, and the BOT-2 SF were administered by three trained assessors in accordance with the manual guidelines. Assessors had a physical education background, received detailed instruction, and participated in half-day assessment training for each assessment tool. The order of administration of the TGMD-3, the KTK, and the BOT-2 SF was pseudo-randomized: girls in each age group were divided randomly into three subgroups. One-third of each age group was assessed first on the TGMD-3, the next one-third on the KTK, and the remainder on the BOT-2 SF. In the next administrations, the order was reversed. This method was followed in each age group. Two researchers with prior training and experience in the TGMD-3, the KTK, and the BOT-2 SF coded all of the data. Inter- and intra-rater reliabilities were obtained. For the inter-rater reliability, the results obtained by the two evaluators were compared using an intra-class correlation coefficient (ICC). The ICC of each assessment tool was calculated separately with 95% confidence intervals (CIs). For intra-rater reliability, the results were likewise compared using the ICC for each assessment tool. The overall ICC showed excellent inter-rater and intra-rater reliability for the TGMD-3 (ICC = 0.86 and 0.78, respectively), the BOT-2 SF (ICC = 0.87 and 0.79, respectively), and the KTK (ICC = 0.89 and 0.88, respectively).

Statistical analysis

Descriptive statistics (means and standard deviations) were calculated by grade for the raw scores of each assessment.

Partial correlation was used to evaluate the null hypothesis that there is no significant association among performances on the TGMD-3 and its subtests, the KTK, and the BOT-2 SF and its subtests after controlling for the effect of chronological age.
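As an illustration of what controlling for chronological age means here, the following sketch computes a first-order partial correlation from three zero-order Pearson correlations. It is not the SPSS procedure used in the study; the function names and the small data set are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Zero-order Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores for eight children (not data from this study).
age = [7, 7, 8, 8, 9, 9, 10, 10]
ktk = [40, 44, 47, 52, 55, 53, 62, 60]
bot = [30, 35, 34, 40, 41, 45, 44, 50]

zero_order = pearson_r(ktk, bot)
age_adjusted = partial_r(ktk, bot, age)
print(round(zero_order, 2), round(age_adjusted, 2))
```

When both test scores increase with age, part of their zero-order correlation reflects that shared age trend, which is why the age-adjusted value is the more conservative indicator of overlap between instruments.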

A one-way analysis of variance (ANOVA) was conducted to test the hypotheses regarding the mean differences between grade levels (grades 1–4) in terms of MC tests’ total scores in 7–10-year-old girls. Prior to conducting the ANOVA, all ANOVA assumptions were evaluated and met.

The independence of cases was checked and verified. Normal distribution of errors was verified by testing for normality of residuals. Homoscedasticity of variance was checked and the assumption of equal variances was not violated.
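For readers who want to see the mechanics of the one-way ANOVA, the sketch below computes the F statistic and its degrees of freedom from grouped scores. It is a plain-Python illustration rather than the SPSS analysis reported here; the function name and the toy scores are hypothetical. With four grades and 164 participants, the same calculation gives degrees of freedom of 3 and 160.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists, one per group."""
    all_values = [value for group in groups for value in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum(
        (value - sum(group) / len(group)) ** 2 for group in groups for value in group
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy scores for three groups (hypothetical, chosen for easy hand-checking).
f_value, df_between, df_within = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_value, df_between, df_within)
```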

Where the ANOVA indicated a significant difference, post hoc analyses (Tukey HSD) were performed to examine pairwise mean differences between the four grades on the total score of each MC tool.

Partial eta-squared (partial η2) was calculated as the effect size. It indicates the proportion of variance in the dependent variable attributable to a particular independent variable. Cohen interpreted partial η2 between 0.01 and < 0.09, between 0.09 and < 0.25, and equal to or above 0.25 as small, medium and large, respectively [26].
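A short sketch of how partial η2 relates to the F statistic may be useful. It relies on the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error) and assumes cut-offs of 0.01, 0.09 and 0.25 for small, medium and large effects; the function names and the example F value are hypothetical and are not results from this study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def classify_effect(eta_squared):
    """Label an effect size using assumed cut-offs of 0.01, 0.09 and 0.25."""
    if eta_squared >= 0.25:
        return "large"
    if eta_squared >= 0.09:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"


# Hypothetical example: F = 10.0 with 3 and 160 degrees of freedom.
example = partial_eta_squared(10.0, 3, 160)
print(round(example, 3), classify_effect(example))
```

In a one-way design with a single factor, partial η2 equals η2, that is, the between-group sum of squares divided by the total sum of squares.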

The data were analyzed using IBM SPSS 21.0 statistical software and the significance level was set at p < 0.05.

Results

Table 1 shows the descriptive results (mean and standard deviation) on TGMD-3, KTK, and BOT-2 SF for the total sample and the sample divided into grade groups.

Table 1 Means and standard deviations for scores of each assessment, presented by grade

One-way ANOVA showed statistically significant differences across the grade levels on the TGMD-3, F(3,160) = 5.88, p = 0.001, partial η2 = 0.09 (moderate effect size), the KTK, F(3,160) = 17.69, p < 0.001, partial η2 = 0.24 (moderate effect size), and the BOT-2 SF, F(3,160) = 24.48, p < 0.001, partial η2 = 0.31 (large effect size). Tukey HSD tests showed that 4th graders outperformed 1st, 2nd and 3rd graders on the TGMD-3. On the KTK, 1st graders had lower scores than 2nd, 3rd and 4th graders, and 4th graders had higher scores than 2nd and 3rd graders. On the BOT-2 SF, as on the KTK, 1st graders performed worse than 2nd, 3rd and 4th graders; in addition, 3rd and 4th graders performed better than 2nd graders.

Tables 2 and 3 show, respectively, the Pearson correlations by grade and the partial and zero-order correlations between standard scores on each subscale and total performance of the TGMD-3, the KTK, and the BOT-2 SF. All partial correlations, which control for chronological age, are lower than the corresponding zero-order correlations. Partial correlation values are low to moderate (0.04–0.46). All correlations between the fine motor composite of the BOT-2 SF and the other instruments’ test results are quite low. The highest correlations were found between the KTK and both the BOT-2 SF total score and the BOT-2 SF gross motor composite (0.45 and 0.46). Correlations among the TGMD-3, the KTK, and the BOT-2 SF are low, especially when the locomotor and object control skills subtests are considered separately. All Pearson correlations tend to decrease as grade increases, except the correlations between the KTK and the BOT-2 SF total score and between the KTK and the BOT-2 SF gross motor composite, which tend to maintain similar values across grades.

Table 2 Pearson correlation coefficients (r) among performances on TGMD-3 and its subtests, KTK, and BOT-2 Short Form and its subtests by grade
Table 3 Partial correlation coefficients controlling for chronological age and zero-order correlations among performances on TGMD-3 and its subtests, KTK, and BOT-2 Short Form and its subtests

Discussion and Conclusion

This study aimed to examine if the results obtained from the TGMD-3, the KTK and the BOT-2 SF measure common facets of MC and to compare performance levels across childhood.

The results of the study indicate that the KTK and the BOT-2 SF gross motor composite share some common MC facets, while the BOT-2 SF fine motor composite and the TGMD-3 may assess different aspects of MC. In fact, the zero-order and partial correlations between the KTK and the BOT-2 SF gross motor composite were moderate (r = 0.59 and 0.46, respectively), which indicates that both tests could assess common facets or traits of MC. In contrast, the TGMD-3 and especially the BOT-2 SF fine motor composite show low associations with each other and with the KTK and the BOT-2 SF gross motor composite, which suggests that the TGMD-3 and the BOT-2 SF fine motor composite assess different MC traits. The literature shows similar results (r = 0.44–0.64) on the relationship between the KTK and the BOT-2 SF [17]. These associations suggest that these two test batteries tend to measure some common aspects of the MC construct. The correlation coefficient depends largely on the nature of the tasks. In this respect, the current study provides evidence that there is a stronger association between the KTK total score and the BOT-2 SF gross motor composite than there is between the TGMD-3 and the KTK total scores, or between these two and the fine motor composite of the BOT-2 SF. These findings are in accordance with previous studies where the gross motor scales of the two test batteries were hypothesized to have better associations than the gross motor scale of one battery and the fine motor scale of the other [9, 17]. The result also indicates that the TGMD-3 and the fine motor composite from the BOT-2 SF are the only instruments that do not assess the same aspect of MC. Marteniuk (1974) proposed that common factors should share at least 50% of the variance, that is, they should demonstrate a correlation ≥ 0.70 [27].
However, Burton and Rodgerson (2001) argued that the criterion of 0.70 might not be appropriate [28]; indeed, Cohen (1992) considered a correlation of 0.50 to be high [29]. We noted that all correlations between the test batteries differ among grades and tend to decrease as grade increases. This result is in consonance with the factor analytic studies that found differentiation of abilities with age during childhood [30, 31]. In the present study, the correlations that decreased the most with grade were those between the TGMD-3 and the KTK, the BOT-2 SF gross motor composite and the BOT-2 SF fine motor composite, and those between the BOT-2 SF fine motor composite and the TGMD-3 and the KTK. This suggests that the TGMD-3 and the BOT-2 SF fine motor composite evaluate different facets of MC. In fact, the TGMD-3 seeks to assess motor skills while the KTK and the BOT-2 SF intend to assess abilities. The KTK and the BOT-2 SF gross motor composite both seek to evaluate gross motor coordination, while the BOT-2 SF fine motor composite intends to evaluate fine motor coordination.

In general, girls in our sample showed improvement in standard scores on the three assessments across age groups. On the TGMD-3, there were no differences among 1st to 3rd graders, which is in line with Gallahue et al., who proposed that the performance of some fundamental motor skills stabilizes beginning around the age of 6–7 [23]. However, fourth-grade girls scored higher on the TGMD-3 than first-, second- and third-grade girls. The effect size for this difference was at the lower bound of the moderate range (partial η2 = 0.09). One possible explanation is that the TGMD-3 measures fundamental movement skills associated with sport participation (Ulrich, 2016). At the age of 10 (4th grade), children try to apply their motor skills in sports activities [23]; they therefore gain more experience and reach more advanced levels of coordination and control, which the TGMD-3 demands, and consequently obtain higher raw scores on the TGMD-3 than children in lower grades. Another reason could be that 4th graders receive performance (or technique) instruction in their physical education classes, while 1st to 3rd graders are deprived of such physical education programs in their school curriculum because of a shortage of physical education teachers in Iran. Evidence indicates that performance improves as children participate in physical education programs and structured movement programs and receive instruction on motor skills [20]. On the KTK, girls showed improvement in their performances across childhood. The KTK is claimed to measure gross motor coordination, a general construct underlying the development of fundamental movement skills [16, 32]. Gross motor coordination is the result of neurological maturation and practice/instruction, which could be one reason for the higher graders’ better scores on the KTK. Similar to the KTK performances across childhood, girls showed improvement in their BOT-2 SF performance with age. Like Logan et al., who suggested that the Movement Assessment Battery for Children-2 (MABC-2) assesses motor abilities, which are a specific aspect of MC, we believe that the BOT-2 SF measures motor abilities as well [21]. Most of the tasks included in this battery are similar to the MABC-2 tasks and are based on abilities such as balance (e.g., walking forward on a line, standing on one leg on a balance beam, jumping on one leg), strength (e.g., knee push-ups, sit-ups), and speed and accuracy (e.g., transferring pennies). Abilities differ from skills in the sense that skills are learned; several of these abilities are required in both ball skills (skills involving projecting, receiving and intercepting objects: throwing and catching) and locomotor skills (skills involving moving the body from one place to another: hopping, jumping, and galloping). Thus, tasks of the BOT-2 SF measure motor abilities that are the foundation of complex movements such as gross motor skills. An individual’s abilities are formed through biological and physiological factors [33], which are also affected by the environment. For example, physical activity and participation in sports programs will develop children’s motor abilities, which may explain the age-related differences in the present sample of Iranian girls.

Although the present study is not exempt from limitations, a strength of this study is that it compares the three most commonly used MC assessments in a single body of work. Comparable studies [18, 21] included only two of these tools. The main limitation is that this study included only girls as participants, which narrows the scope of the findings. In addition, although the overall sample is quite large, the groups are small when stratified by grade, even though the “N” is large enough for the statistics used. Another limitation is the cross-sectional design of this study; children’s MC could be better characterized using a longitudinal design.

Several past studies have conducted MC tests across all ages and genders, but some MC tools have not been compared and few studies have compared multiple MC testing batteries with one another in a group of primary school children. This study correlated the TGMD-3, the KTK, and the BOT-2 SF assessments for MC and found weak-to-moderate correlations, suggesting that KTK and BOT-2 SF gross motor composite may measure the same facets of MC, and the TGMD-3 and the BOT-2 SF fine motor composite measure different ones. We also found significant performance differences across age groups, with older female groups generally outperforming their younger counterparts, a finding supported by each testing battery. These results imply that the battery tests of MC should not be used interchangeably.