Key Points

Practitioners may attach inertial measurement units to the foot, tibia and lumbar spine to accurately and reliably derive stride variables during running.

The computational method for gait event detection may be more critical to validity and reliability than the attachment location itself.

These findings may open opportunities for practitioners to use inertial measurement units to analyse the gait patterns of athletes in a variety of running-based sports.

1 Introduction

It is common practice to quantify the activities performed by athletes, or external load, to plan and monitor training and competition load [1]. Tracking technologies, such as video-based systems, global positioning systems (GPS) and local positioning systems (LPS), measure athlete displacement and calculate velocity and acceleration [2,3,4,5]. However, due to its low sampling frequency (e.g. 10 Hz), GPS is limited in its ability to accurately capture changes in velocity or high-speed movements over short distances and when movements are nonlinear, such as changes of direction [6,7,8]. Although LPS (1000 Hz) sample at a higher rate than GPS, neither technology can account for non-locomotor activity, such as impacts or collisions [9,10,11,12]. To overcome some of the limitations of GPS and LPS, inertial measurement units (IMUs), comprising accelerometers, gyroscopes and magnetometers, can provide additional information on athlete activity profiles [13, 14].

Triaxial accelerometers measure acceleration in the anteroposterior, mediolateral and vertical axes and typically capture data between 100 and 1000 Hz [13, 15, 16]. Gyroscopes and magnetometers measure device orientation and direction, respectively [17]. Accelerometers have been used for quantifying daily physical activity and estimating energy expenditure [18,19,20,21,22] and their use is now common in athletes [11, 12, 23,24,25]. Accelerometer-derived metrics, such as PlayerLoad™, provide an indication of the global external load from the summation of instantaneous rate of change of acceleration in the anteroposterior, mediolateral and vertical axes [15, 26]. However, PlayerLoad™ is a relatively gross measure that does not offer insight into discrete movements, such as stride variables. Instead, patterns in the signals of IMUs can be explored to identify foot contacts to calculate different stride variables, which may help in understanding the way in which athletes produce a given load [27, 28].
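As a rough illustration of how a summed rate-of-change metric of this kind can be computed, the sketch below accumulates the magnitude of sample-to-sample changes in triaxial acceleration. The function name `accumulated_load` and the default divisor of 100 are assumptions made for illustration; the proprietary PlayerLoad™ calculation may differ in its details.

```python
import math


def accumulated_load(ax, ay, az, scale=100.0):
    """Sum the magnitude of sample-to-sample acceleration changes over three axes.

    ax, ay, az: equal-length sequences of acceleration samples (e.g. in g).
    scale: arbitrary divisor (an assumption here, not taken from the review).
    """
    if not (len(ax) == len(ay) == len(az)):
        raise ValueError("all three axes need the same number of samples")
    total = 0.0
    for i in range(1, len(ax)):
        # Change in acceleration between consecutive samples on each axis.
        dx = ax[i] - ax[i - 1]
        dy = ay[i] - ay[i - 1]
        dz = az[i] - az[i - 1]
        total += math.sqrt(dx * dx + dy * dy + dz * dz)
    return total / scale
```

Because the result is a single running total, it says nothing about when individual foot contacts occurred, which is why event-based stride variables require a different treatment of the signal.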

The detection of gait events, such as initial contact (IC) and toe-off (TO), is possible using accelerometer and gyroscope data [29,30,31]. Identifying these key events allows for the calculation of spatiotemporal parameters, including contact time, flight time, step and stride times, step frequency and step and stride lengths [30, 32,33,34,35]. The acceleration signal from IMUs may also be used to estimate ground reaction forces (GRFs) and vertical stiffness to describe the impact forces experienced by athletes and their ability to absorb force during running [36,37,38,39,40]. Deriving stride variables is important for evaluating an athlete’s gait pattern and may help to inform injury mitigation and performance enhancement strategies [41]. However, device placement may influence the derived outcome measures and should be considered when using IMUs to capture stride data [42, 43].
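To make the link between gait events and temporal variables concrete, the following minimal sketch derives contact, flight, step and stride times and step frequency from lists of IC and TO timestamps. The definitions used (contact time as IC to TO of the same foot, flight time as TO to the next IC, step time as the interval between consecutive ICs, stride time as the interval between every second IC) are common conventions assumed here and may not match every included study; the function name `stride_variables` is hypothetical.

```python
def stride_variables(ic_ms, to_ms):
    """Derive temporal stride variables from alternating-foot gait events.

    ic_ms: initial contact times (ms) for consecutive steps, ascending.
    to_ms: toe-off times (ms); to_ms[i] is the toe-off that follows ic_ms[i].
    """
    if len(ic_ms) != len(to_ms):
        raise ValueError("each initial contact needs a matching toe-off")
    n = len(ic_ms)
    # Contact time: initial contact to toe-off of the same step.
    contact = [to_ms[i] - ic_ms[i] for i in range(n)]
    # Flight time: toe-off to the next (contralateral) initial contact.
    flight = [ic_ms[i + 1] - to_ms[i] for i in range(n - 1)]
    # Step time: consecutive initial contacts (opposite feet).
    step = [ic_ms[i + 1] - ic_ms[i] for i in range(n - 1)]
    # Stride time: initial contact to the next initial contact of the same foot.
    stride = [ic_ms[i + 2] - ic_ms[i] for i in range(n - 2)]
    # Step frequency in steps per minute (60 000 ms per minute).
    step_frequency = [60000.0 / s for s in step]
    return {
        "contact_time_ms": contact,
        "flight_time_ms": flight,
        "step_time_ms": step,
        "stride_time_ms": stride,
        "step_frequency_spm": step_frequency,
    }
```

Any timing error in IC or TO detection propagates directly into these derived values, which is why the validity of event detection is examined before the spatiotemporal parameters.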

Placement of IMUs for analysis of running gait can vary between the foot [30, 34, 44], distal and mid tibia [13, 31, 40], lumbosacral region [30, 32, 33] or thoracic spine [37, 39, 43]. Given accelerometers measure acceleration of the segment to which they are attached, there are some potential issues associated with placement on the upper body to measure accelerations occurring at the lower limb and derive valid and reliable stride data [42]. Attachment location is an important consideration due to signal attenuation, whereby acceleration magnitudes dissipate from the foot to the torso during ground contact in running [45,46,47]. Although securing IMUs to the foot may provide the most accurate derivations of stride variables [16, 34, 35], this site may not be practical in some sports (such as those that involve kicking), while other work has noted the potential for injury in contact sports using IMUs attached to the tibia [44]. Given IMUs have been utilised at various sites for the analysis of running gait in the literature, it is important to understand if IMU placement affects the derived outcome measures. This may help inform practitioners which attachment location is most appropriate for deriving valid and reliable stride data based on the constraints of the sport they work in. Therefore, the aim of this systematic review and meta-analysis is to report on the validity and reliability of inertial sensors to calculate spatiotemporal variables, GRF and vertical stiffness during running with respect to sensor placement.

2 Methods

2.1 Systematic Review Protocol

The protocol for this systematic review was registered on PROSPERO and can be accessed at https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020160325. All procedures were performed in accordance with the PRISMA guidelines [48].

2.2 Eligibility Criteria

Articles were eligible for inclusion in this systematic review if they (1) were published in English; (2) used an accelerometer, gyroscope or a combination of both technologies; (3) had participants jog, run or sprint during data collection; (4) reported at least one of the following outcome variables: IC, TO, contact time, flight time, step time, stride time, swing time, step frequency, step length, stride length, peak vertical or resultant GRF or vertical stiffness; and (5) assessed validity or reliability.

2.3 Search Strategy

Keywords in the title and abstract of records, combined with the relevant subject heading terms, such as Medical Subject Headings (MeSH), were systematically searched in SPORTDiscus with Full Text, CINAHL Complete, MEDLINE (EBSCOhost), EMBASE (Ovid) and Scopus from the earliest record up until 6 August 2020. The following keyword search string was used in each electronic database (which is also detailed in Supplementary Information Appendix S1, Table S1):

(jog* OR run* OR sprint*) AND (acceleromet* OR "global positioning system" OR GPS OR gyroscope* OR IMU OR inertial* OR microtechnolog* OR "wearable sensor") AND (acceleration* OR event* OR fatigue* OR force* OR GRF OR kinematic* OR kinetic* OR parameter* OR reliab* OR stance OR step* OR stiff* OR stride* OR strike* OR temporal OR valid*).

2.4 Study Selection

Search results were exported to reference management software EndNote X9.3.3 (Clarivate Analytics, Philadelphia, USA) where duplicates were removed. Two authors (BJH and PJT) then independently screened the title and abstract of each record in the Rayyan web-based systematic review tool (available at http://www.rayyan.qcri.org). The full text of potentially eligible articles was retrieved and one author (BJH) performed a final eligibility assessment, which was later checked by a second author (PJT). Discrepancies in article selection were resolved by a third author (SJC). The reference lists of all retrieved articles were also examined to determine any other articles that may be relevant to the review.

2.5 Data Extraction

Data relating to participant characteristics (age, body mass, height and activity level), sensor specifications (brand, model, range and sampling frequency), sensor location (foot, distal/mid tibia, hip, sacrum, lumbar and thoracic spine), criterion used for validity (brand, model and sampling frequency), running activity performed (number, duration or distance of runs, velocity), outcome variables analysed (temporal, spatial, GRF and vertical stiffness) and measures of validity and reliability were extracted from each included study. Definitions for the variables analysed in this review are presented in Table 1. Running velocity, temporal and spatial variables and GRF are reported in metres per second (m·s−1), milliseconds (ms), centimetres (cm) and Newtons (N), respectively. Where included studies did not report results in the aforementioned units, values were converted to enable better comparison between studies.
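The unit harmonisation described above amounts to simple scaling. A minimal sketch is given below; the helper names and the set of source units handled are assumptions for illustration, since the review does not list which units the included studies originally reported.

```python
# Multiplicative factors to the review's reporting units (assumed source units).
VELOCITY_TO_M_PER_S = {"m/s": 1.0, "km/h": 1000.0 / 3600.0}
TIME_TO_MS = {"ms": 1.0, "s": 1000.0}
LENGTH_TO_CM = {"cm": 1.0, "m": 100.0, "mm": 0.1}


def convert(value, unit, factors):
    """Scale a value into the target unit using a lookup of known factors."""
    if unit not in factors:
        raise ValueError(f"unsupported unit: {unit}")
    return value * factors[unit]


def to_metres_per_second(value, unit):
    return convert(value, unit, VELOCITY_TO_M_PER_S)


def to_milliseconds(value, unit):
    return convert(value, unit, TIME_TO_MS)


def to_centimetres(value, unit):
    return convert(value, unit, LENGTH_TO_CM)
```

For example, a treadmill speed reported as 12 km/h corresponds to roughly 3.3 m·s−1, and a contact time reported as 0.25 s corresponds to 250 ms.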

Table 1 Definitions of stride variables

2.6 Assessment of Methodological Quality

The methodological quality of each included study was assessed using a modified assessment scale of Downs and Black [49]. Of the 27 criteria, the most relevant to the study designs included in this review were applied, which is consistent with other reviews [50, 51]. Each study was therefore assessed for quality of reporting (1–4, 6, 7 and 10), external validity (11 and 12) and internal validity bias (16, 18 and 20) based on 12 criteria. The criteria were evaluated as yes, no or unclear, with the score out of 12 determined from the number of items that were answered yes.
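The scoring rule can be expressed in a few lines. In the sketch below, the item numbers follow the 12 criteria listed above, while the function name and the dictionary layout of the ratings are assumptions for illustration.

```python
# Downs and Black items applied in the review, grouped as described in the text.
ITEMS = (1, 2, 3, 4, 6, 7, 10, 11, 12, 16, 18, 20)
ALLOWED = {"yes", "no", "unclear"}


def quality_score(ratings):
    """Return the number of applied items rated 'yes' (maximum 12).

    ratings: dict mapping each item number in ITEMS to 'yes', 'no' or 'unclear'.
    """
    missing = [item for item in ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing ratings for items: {missing}")
    for item in ITEMS:
        if ratings[item] not in ALLOWED:
            raise ValueError(f"invalid rating for item {item}: {ratings[item]}")
    # Only 'yes' answers count towards the score; 'no' and 'unclear' score zero.
    return sum(1 for item in ITEMS if ratings[item] == "yes")
```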

2.7 Data Analysis

The values of validity and reliability for each stride variable are presented in the tables below and included throughout Sect. 3.4.

2.7.1 Meta-analysis

Meta-analyses were performed when there were at least two studies that (1) reported means and standard deviation (SD) for stride variables calculated from IMUs and reference systems; (2) used the same IMU attachment site; and (3) assessed validity at a comparable running velocity (≤ 1 m·s−1 difference). Authors who did not include absolute mean ± SD values for the computed stride variables were contacted to gain the additional data. Raw outcome data were not obtained for 22 studies, which were therefore ineligible for inclusion in any meta-analysis [31, 32, 34, 37, 38, 40, 43, 52,53,54,55,56,57,58,59,60,61,62,63,64,65,66]. Where there were multiple effects reported for different running velocities from a single study, data were aggregated so only a single effect was included in the meta-analysis [67]. However, when validity was assessed using IMUs from two different manufacturers [35] or criterion measures [29, 68] in a single study, effects were treated independently and both were included in the meta-analysis. Data pertaining to criterion validity were pooled from studies that used different reference measurement systems. Specifically, effects were pooled from studies that used motion capture [29], force plates [30, 69, 70], high-speed camera [16, 35, 68] and photocell systems [68, 71, 72]. This approach was used due to the limited number of studies with comparable methodologies and previous work demonstrating that optical timing, motion capture and force plate systems are all considered criterion methods for gait analysis [73,74,75].

Where there were sufficient data to group effects based on eligibility criteria, meta-analyses were performed using random-effects models with the Meta statistical package in R software (version 3.6.3, R Foundation for Statistical Computing) to produce a pooled estimate of the mean difference (MD) in absolute units [76]. When studies could be pooled based on different IMU attachment sites for the same variable, subgroup analysis was performed to test whether placement differs in terms of their effects, with the significance level set at p < 0.05 [77].
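For readers unfamiliar with how a pooled MD is obtained, the sketch below shows one common random-effects approach (the DerSimonian-Laird estimator with inverse-variance weights) in plain Python. It is an illustration under stated assumptions, not a reproduction of the analysis: the review used the Meta package in R, the between-study variance estimator is not stated in the text, and the function names here are hypothetical.

```python
import math


def mean_difference(m_imu, sd_imu, n_imu, m_ref, sd_ref, n_ref):
    """Mean difference (IMU minus criterion) and its variance for one study."""
    md = m_imu - m_ref
    var = sd_imu ** 2 / n_imu + sd_ref ** 2 / n_ref
    return md, var


def random_effects_pool(effects, variances):
    """Pool study effects with DerSimonian-Laird random-effects weights.

    Returns (pooled effect, (lower, upper) 95% CI, tau squared).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                    # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2
```

A pooled MD whose 95% CI spans zero indicates that the IMU-derived values are not significantly different from the criterion at the 0.05 level.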

The level of statistical heterogeneity was quantified by calculating the I2 statistic [78]. Statistical heterogeneity was considered low (I2 < 25%), moderate (I2 = 25–49%) or high (I2 > 50%) [78]. When I2 was high (I2 > 50%), leave-one-out analysis was performed to determine the studies that contributed most to heterogeneity and had a high influence on the overall effect [79]. Moderator analysis was also conducted to determine how much the criterion measure contributed to the observed variability of effect sizes between studies [80]. Where the criterion does not have a significant moderating effect, heterogeneity may be attributable to an unidentified source [80]. A meta-regression model was applied to the moderator analysis using the metareg function in R software [81]. Statistical significance was set at p < 0.05.
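The heterogeneity workflow described above can be sketched as follows. The I2 statistic is computed from Cochran's Q, classified with the thresholds given in the text, and recomputed with each study omitted in turn. Treating a value of exactly 50% as high is an assumption, because the stated bands leave that boundary undefined; the function names are hypothetical.

```python
def i_squared(effects, variances):
    """I2 (%) = max(0, (Q - df) / Q) * 100, with Q from inverse-variance weights."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def classify_heterogeneity(i2):
    """Label I2 using the bands in the text (50% treated as high by assumption)."""
    if i2 < 25.0:
        return "low"
    if i2 < 50.0:
        return "moderate"
    return "high"


def leave_one_out(effects, variances):
    """Return I2 recomputed with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(i_squared(rest_e, rest_v))
    return results
```

A study whose omission drops I2 sharply is a candidate source of heterogeneity, as was later observed for one tibia study in the contact time analysis.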

Effect sizes and their respective confidence intervals (CI), along with the overall MD for pooled effects, were visualised as forest plots [82]. In forest plots, studies are represented by a point estimate, bounded by a 95% CI for the effect [82]. The summary effect (MD) is symbolised by the polygon at the bottom of the plot [82]. The width of the polygon indicates the 95% CI. Studies that exhibit larger squares contribute more to the summary effect (MD) compared to studies with smaller squares [82].

3 Results

3.1 Study Identification and Selection

A total of 4,654 records were identified through the database searches. An additional three articles were included through reference list searches. Following deduplication, title and abstract screening and a thorough full-text screen of each record, 39 studies met the eligibility criteria and were included in the review [16, 29,30,31,32,33,34,35,36,37,38,39,40, 43, 52,53,54,55,56,57,58,59,60,61,62,63,64,65,66, 68,69,70,71,72, 83,84,85,86,87]. An outline of this process using the PRISMA flow diagram is presented in Fig. 1.

Fig. 1 Flow chart of study selection process

3.2 Study Characteristics

A summary of the characteristics of each study is presented in Table 2. A total of 657 participants were included across 39 studies (mean ± SD 16.8 ± 10.2), where the populations sampled included healthy active adults (n = 15 studies), recreational/amateur (n = 12) and high-level runners (n = 5), team-sport athletes (n = 6), elite track and field athletes (n = 1) and triathletes (n = 1). Sensor placement varied between foot [16, 30, 34, 35, 52, 56, 59, 60, 65, 69, 71, 85], distal and mid tibia [29, 31, 40, 58, 60, 62, 69, 70, 84], hip [66], sacrum [32, 57], lumbar spine [30, 33, 38, 64, 68, 69, 72, 83], torso [53] and thoracic spine [36,37,38,39, 43, 54, 61, 86]. Two studies used multiple sensors and a combination of placements to derive stride variables [55, 87]. Validity was assessed using force plate systems (n = 17) [31, 33, 36, 38,39,40, 43, 54, 55, 57,58,59, 61, 62, 66, 69, 70], optical motion capture (n = 7) [29, 32, 52, 55, 64, 65, 85], instrumented treadmill (n = 7) [30, 34, 37, 53, 56, 60, 87], high-speed camera (n = 4) [16, 33, 35, 68], photocell systems (n = 3) [68, 71, 72], foot-mounted accelerometer (n = 1) [83], in-shoe piezo-electric force sensitive resistors (FSRs) (n = 1) [63] and different stride time calculation methods (n = 1) [84] as criterions. Reliability was assessed in nine studies [16, 38, 40, 43, 59, 68, 71, 83, 86]. Contact time was the most commonly reported variable (n = 16) [16, 29, 30, 32,33,34,35, 37, 52, 53, 62, 68,69,70,71, 83], while six studies derived spatial data (step length and stride length) from accelerometers and gyroscopes [35, 52, 65, 71, 72, 85]. Eleven studies estimated peak vertical and resultant GRF [36, 38,39,40, 43, 55,56,57, 61, 66, 87], whereas three studies used accelerometers to derive vertical stiffness [37, 38, 86].

Table 2 Study characteristics for the 39 studies included in the review

3.3 Methodological Quality

Based on the number of criteria that were answered yes, the methodological quality of included studies ranged from 7 to 10 out of 12, with a mean score of 9 out of 12 (see Supplementary Information Appendix S1, Table S2). Out of the 39 studies, 24 did not include p-values alongside validity or reliability outcomes [29,30,31,32,33,34, 37,38,39,40, 53, 55, 56, 58, 59, 61, 63,64,65,66, 72, 83, 84, 86], two studies did not clearly report subject characteristics [33, 63], while another study did not provide a description of the running protocol used for assessing validity [60]. Five studies scored a yes for detailing the source population from which subjects were recruited [31, 60, 83, 86, 87], whereas this was unclear in the remaining studies.

3.4 Stride Variables

The results for each stride variable examined in this review are described in the following sections.

3.4.1 Initial Contact

Validity outcomes for the detection of IC using IMUs secured to the foot, distal and mid tibia and lumbar spine are presented in Table 3. Mean relative differences (− 16.0 to 3.3 ms) and estimation errors (− 6.0 to 4.3 ms) were generally low for foot placement [30, 52, 69], while another study reported IC could be detected with a precision (median ± inter-quartile range [IQR]) of 2.0 ± 1.0 ms from a foot-mounted IMU [34]. Contrasting results were evident for placement on the tibia. Using only the angular velocity signal from a gyroscope, errors were as high as 64.2 ms compared to motion capture in one study [29], while another study detected IC from gyroscope data with an absolute mean error of 13.0 ± 6.0 ms relative to a force plate [70]. The mean relative difference (− 38.0 ± 10.7 ms) was greater than that observed for the foot using tibial acceleration data [69], while other studies showed improved validity for determining IC from tibia-mounted IMUs compared to force plate measures (MD = − 0.5 ± 0.3 ms, mean bias = − 2.3 ± 4.7 ms, mean error = 1.68 ms) (see Table 3) [31, 58, 62]. In another study using tibia-mounted accelerometers, IC was detected with an accuracy of F1 = 0.92–0.96 compared to those events determined from in-shoe piezo-electric FSRs [63]. The F1 score is a measure of a test’s accuracy, where an F1 score of 1 reflects perfect precision and recall [88]. Detection of IC was slightly earlier (4.7 ms) at 3.3 m·s−1 from a lumbar spine-mounted IMU compared to the foot, but 2.4 ms later at 4.1 m·s−1 [69]. The largest difference from force plate-identified IC was 53.0 ms for the lumbar spine [30].
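The F1 score combines precision and recall as their harmonic mean. The sketch below shows one way it could be computed for gait event detection; matching detected events to reference events within a fixed tolerance window is an assumption made for illustration and is not necessarily the procedure used in the cited study.

```python
def match_events(detected_ms, reference_ms, tolerance_ms):
    """Greedily match detected events to reference events within a tolerance.

    Returns (true positives, false positives, false negatives).
    """
    unmatched = sorted(reference_ms)
    tp = 0
    for d in sorted(detected_ms):
        for r in unmatched:
            if abs(d - r) <= tolerance_ms:
                unmatched.remove(r)   # each reference event is matched once
                tp += 1
                break
    fp = len(detected_ms) - tp        # detections with no reference event
    fn = len(unmatched)               # reference events that were missed
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 1.0 means both are perfect."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

Unlike a mean timing error, F1 penalises both spurious detections and missed events, so it complements rather than replaces the millisecond-level differences reported above.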

Table 3 Validity summary statistics for initial contact

3.4.2 Toe-off

Table 4 documents the validity statistics from studies that determined the accuracy of IMUs to detect TO. Between 2.2 and 4.1 m·s−1, the mean relative difference and estimation errors for the detection of TO from foot-mounted IMUs ranged from − 53.8 to 32.0 ms and − 4.3 to 16.3 ms, respectively [30, 52, 69]. Errors up to − 32.4 ms were shown using a gyroscope attached to the tibia [29], while another study using angular velocity data from the tibia showed TO was determined after force plate detection (absolute mean error > 23.0 ms) [70]. Smaller mean absolute and relative differences were observed for determining TO from tibial acceleration data (< 8.8 ms and < 1.0 ms, respectively) [69], while TO was detected with an accuracy of F1 = 0.77–0.86 from accelerometers secured to the distal tibia when in-shoe piezo-electric FSRs were the criterion [63]. A time lag of 7.6–24.0 ms was present for the detection of TO from an IMU secured to the lumbar spine compared to values obtained from a force plate [30, 69].

Table 4 Validity summary statistics for toe-off

3.4.3 Contact Time

Validity outcomes reported from studies using placement on the foot, tibia, lumbar spine, torso and thoracic spine to derive contact time are presented in Table 5. The concurrent validity of an IMU fixed to the foot showed a deviation to high-speed camera measures between − 3.3 and − 0.1%, a mean bias between − 5.6 and 0.4 ms and intraclass correlation coefficient (ICC) values as high as 0.97 for contact time across velocities of 4.3 ± 0.7 m·s−1, 6.2 ± 0.7 m·s−1 and 8.0 ± 1.6 m·s−1 [16]. When a photocell system was the criterion, ICC values were as low as 0.1 at 5.6 m·s−1 using a foot placement [71]. Pearson correlation analysis showed a large agreement (r = 0.96) between a tibial accelerometer estimate of contact time and force plate [62], whereas contrasting results were evident for contact time calculated from gyroscope data (see Table 5) [29, 70]. True error and ICC outcomes were > 63.4 ms and < 0.32, respectively, compared to motion capture [29], whereas differences to force plate were smaller (> − 12.0 ms) in another study using angular velocity data to determine contact time [70]. Compared to motion capture and force plate, small biases (0.8–1.1 ms) and estimation errors (5.0 ms) were shown for contact time when an IMU was placed on the sacrum and lumbar spine, respectively [32, 33]. However, significant differences (p < 0.05) were reported in another study using the lumbar spine when photocell (> − 35.0%) and high-speed camera (> − 31.0%) measures of contact time were used as the reference [68]. In a study comparing contact times derived from different accelerometer attachment sites, the lumbar spine showed a smaller difference from force plate-determined contact time (< 8.7%) than the values obtained from the tibia (< 17.3%) and foot (< 26.6%), with each site significantly correlated (r > 0.74, p < 0.05) with force plate (see Table 5) [69]. Similar results were reported in a more recent study showing the mean lumbar spine-force plate difference (− 29.0 ms) was less than the foot-force plate difference (47.0 ms). In that study, accelerometers placed on the lumbar spine underestimated mean contact time compared to the force plate, whereas foot acceleration overestimated by 18.0 ms [30]. Contact time derived from an accelerometer secured to the thoracic spine showed a mean bias of − 10.4% and a nearly perfect correlation (r = 0.98) with an instrumented treadmill [37]. However, data from only one participant were analysed [37].

Table 5 Validity summary statistics for contact time

Three studies assessed the reliability of IMUs on the foot and lumbar spine to calculate contact time (see Table 6). The coefficient of variation (CV) was < 2.3% across velocities ranging between 2.2 and 5.6 m·s−1, while the standard error of measurement (SEM) was highest at 2.2 m·s−1 (5.0 ms) [71]. Good absolute and relative between-trial reliability was established using an accelerometer mounted on the lumbar spine (CV < 9.9%, ICC > 0.88) [68], while lower ICC values ranging from − 0.24 to 0.67 were reported for inter-day reliability in another study using a lumbar spine-mounted accelerometer [83]. Greater SEM values were observed for lumbar spine-determined contact time (> 10.1 ms) compared to foot placement (< 5.0 ms) [71, 83].
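For orientation, the two reliability statistics reported here can be computed as in the sketch below. The formulas shown (CV as the SD relative to the mean, and SEM as SD × √(1 − ICC)) are widely used conventions assumed for illustration; the included studies may have calculated them differently, and the function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample SD / mean * 100 for repeated measurements."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), in the same units as the SD."""
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    return sd * math.sqrt(1.0 - icc)
```

Under this convention, a lower ICC inflates the SEM for a given SD, which is consistent with the larger SEM values seen where inter-day ICCs were poor.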

Table 6 Reliability summary statistics for each analysed stride variable

Data collected between 3.3 and 4.3 m·s−1 were pooled to determine the effect of IMU placement on the accuracy of contact time compared to criterion measures (see Fig. 2). There was a significant difference in the overall effect of different IMU attachment sites (p = 0.02). Contact time derived from the foot (MD [95% CI] − 11.47 ms [− 45.68, 22.74], p = 0.43), tibia (MD [95% CI] 22.34 ms [− 18.59, 63.27], p = 0.18) and lumbar spine (MD [95% CI] − 48.74 ms [− 120.33, 22.85], p = 0.12) was not significantly different to the criterion. All subgroups were associated with high heterogeneity (I2 > 54.1%). Leave-one-out analysis (see Supplementary Information Appendix S2) for foot and lumbar spine sites revealed that there was no single study influential enough to substantially change the overall heterogeneity (I2 > 83.4%) or pooled MD. In contrast, heterogeneity could be explained for the tibia site by omitting one study [70] (I2 = 0%), with the same study also having an influential effect on the overall result for tibia-determined contact time (MD [95% CI] 34.68 ms [11.16, 58.19], p = 0.02). Moderator analysis showed the type of criterion measure was not significantly associated with the observed variance in effect sizes (p = 0.15).

Fig. 2 Forest plot displaying the effect of contact time (ms) calculated from IMUs worn on the foot, tibia and lumbar spine between 3.3 and 4.3 m·s−1. Data are presented as means and SD of IMU- and criterion-derived contact time. Data from the RunScribe™ sensor are shown in Garcia-Pinillos et al. (2019a), while data from the Stryd™ device are shown in Garcia-Pinillos et al. (2019b). Two different motion capture algorithms were used as criterions for McGrath et al. (2012a, b). Gindre et al. (2016a, b) is represented by high-speed camera and photocell system criterions, respectively. CI confidence interval, df degrees of freedom, IMU inertial measurement unit, IV instrumental variable, over overestimation, SD standard deviation, under underestimation

3.4.4 Flight Time

Results from studies reporting the criterion validity of IMU-derived flight time are documented in Table 7. For placement at the foot, ICC values were as high as 0.81 at 5.6 m·s−1 and 0.86 at 3.3 m·s−1 compared to photocell and high-speed camera measures of flight time [35, 71]. Low estimation errors (< 8.2 ms) and median ± IQR bias (15.0 ± 12.0 ms) and precision (5.0 ± 3.0 ms) were reported for foot-determined flight time versus motion capture and instrumented treadmill values, respectively [34, 52]. There was a significant difference (p < 0.05) from high-speed camera and photocell system criterions when a lumbar spine placement was used to calculate flight time across a range of velocities (3.3–5.8 m·s−1; 41.0–103%) (see Table 7) [68], while the bias was − 25.8% for thoracic spine-determined flight time in another study using an instrumented treadmill as the reference [37]. The observed difference for lumbar and thoracic spine sites was greater than that of a foot placement (< 15.1%) [35].

Table 7 Validity summary statistics for flight time

For reliability (see Table 6), the CV was as high as 11.6% at 2.2 m·s−1 for flight time derived from an IMU on the foot [71], while CV values were < 5.2% between trials using a lumbar spine-mounted accelerometer [68].

Outcome data between 3.3 and 4.2 m·s−1 were pooled from two studies [35, 71] to perform a meta-analysis assessing the effect of foot-determined flight time (I2 = 59%; see Fig. 3). Meta-analysis demonstrated that foot-determined flight time is not significantly different to reference measures (MD [95% CI] 11.93 ms [− 8.88, 32.74], p = 0.13). Leave-one-out and moderator analyses were not performed due to only two studies in the meta-analysis.

Fig. 3 Forest plot displaying the effect of flight time (ms) calculated from IMUs worn on the foot between 3.3 and 4.2 m·s−1. Data are presented as means and SD of IMU- and criterion-derived flight time. Data from the RunScribe™ sensor are shown in Garcia-Pinillos et al. (2019a), while data from the Stryd™ device are shown in Garcia-Pinillos et al. (2019b). CI confidence interval, df degrees of freedom, IMU inertial measurement unit, IV instrumental variable, over overestimation, SD standard deviation, under underestimation

3.4.5 Step Time

Validity outcomes from two studies that calculated step time are presented in Table 8. Compared to values obtained from an instrumented treadmill, step time determined from a foot-worn IMU was shown to have a perfect agreement and a median ± IQR precision of 3.0 ± 2.0 ms across velocities ranging from 2.8 to 5.6 m·s−1 [34]. The mean bias for step time calculated from a sacrum-worn accelerometer ranged from − 1.3 to − 0.4 ms across velocities ranging between 2.8 and 5.2 m·s−1, showing a marginal underestimation of step time compared to measures derived from a motion capture system [32]. Sacrum-determined step time was most strongly correlated with motion capture at 2.8–3.3 m·s−1 (r = 0.93) [32].

Table 8 Validity summary statistics for step time and stride time

3.4.6 Stride Time

Validity outcomes for IMU-determined stride time are outlined in Table 8. Stride time was calculated from IMUs worn on the foot [52], tibia [29, 84], sacrum [32], lumbar spine [33] and thoracic spine [54]. There was no significant difference (p = 0.92) between foot-worn IMU and motion capture calculations of stride time, where the mean error ranged from − 4.0 ± 24.0 ms at 2.2 m·s−1 to 0.3 ± 22.1 ms at 3.1 m·s−1 [52]. Comparison between different stride time calculation methods using tibial accelerometry showed ICC values were > 0.95 [84], while in another study using tibia-mounted IMUs, ICC values ranged between 0.55 and 0.83 using two motion capture methods (see Table 8) [29]. Stride time derived from the sacrum and lumbar spine showed low errors (standard error < 0.8 ms, mean estimation error < 5.0 ms) compared to motion capture, force plate and high-speed camera measures, respectively [32, 33]. However, when an accelerometer was attached to the thoracic spine, there was a significant bias of − 26.0 ms (p < 0.01) compared to force plate stride time [54], which is greater than the bias reported for the sacrum (− 1.0 to 1.2 ms) [32].

One study (see Table 6) established the reliability of accelerometer-derived stride time across different sampling frequencies [59]. The CV of stride time was < 3.5% for accelerometer signals between 100 and 1000 Hz [59].

3.4.7 Swing Time

Only two studies, each using different attachment sites, reported the validity of IMUs to derive swing time (see Table 9). Swing time calculated from a foot-worn IMU was shown to have a median ± IQR bias of 15.0 ± 12.0 ms and a median ± IQR precision of 5.0 ± 2.0 ms compared to values obtained from an instrumented treadmill [34]. Swing time, derived from the angular velocity signal about the y-axis from a tibia-mounted gyroscope, showed poor to moderate agreement (ICC < 0.38) when two established motion capture methods were used as criterion measures [29, 89, 90].

Table 9 Validity summary statistics for swing time

3.4.8 Step Frequency

Six studies quantified step frequency from foot-, tibia- and lumbar spine-worn IMUs, with reliability and validity values from each study presented in Tables 6 and 10, respectively. Foot-determined step frequency was nearly perfectly correlated (ICC > 0.95) with photocell and high-speed camera measures across a range of velocities (2.2 to 5.6 m·s−1) [35, 71]. Biases were small (< 4.5 step·min−1) and correlations exhibited close to perfect agreement (r > 0.96, p < 0.001) with an instrumented treadmill in one study that used IMUs from five different manufacturers on the foot, heel and distal tibia (see Table 10) [60]. However, the authors did not report running velocity during the trials [60]. The difference between step frequency derived from foot- and lumbar spine-worn IMUs and high-speed camera and photocell systems ranged between − 0.9 and 0.8% [35, 68], while another study that directly compared values obtained from a lumbar spine-worn accelerometer to a foot-mounted accelerometer during the same run protocol deemed validity as "good" (ICC = 0.78–0.90) between 2.8 and 3.9 m·s−1 [83]. Maximal sprinting (6.8 ± 1.0 m·s−1) resulted in a bias ranging between − 25.9 and − 6.5 step·min−1 for step frequency derived from an IMU on the lumbar spine [72].

Table 10 Validity summary statistics for step frequency

Reliability (see Table 6) was established for foot-determined step frequency, where the CV and SEM ranged from 1.1 to 2.0% and from 1.7 to 2.8 step·min−1, respectively, across velocities (2.2–5.6 m·s−1) [71]. The ICC values representing the reliability of lumbar spine-determined step frequency were > 0.78 [68, 83].

Data collected between 3.3 and 4.2 m·s−1 were grouped to produce a pooled validity estimate for foot- and lumbar spine-determined step frequency (see Fig. 4). There was no significant difference between foot and lumbar spine estimates of step frequency (p = 0.20). Derivations of step frequency from the foot (MD [95% CI] 0.45 step·min−1 [− 1.75, 2.66], p = 0.47) and lumbar spine (MD [95% CI] − 3.45 step·min−1 [− 16.28, 9.39], p = 0.37) were shown to not be significantly different to the criterion. As there were only two studies in each subgroup, leave-one-out and moderator analyses were not performed.

Fig. 4 Forest plot displaying the effect of step frequency (step·min−1) calculated from IMUs worn on the foot and lumbar spine between 3.3 and 4.2 m·s−1. Data are presented as means and SD of IMU- and criterion-derived step frequency. Data from the RunScribe™ sensor are shown in Garcia-Pinillos et al. (2019a), while data from the Stryd™ device are shown in Garcia-Pinillos et al. (2019b). Gindre et al. (2016a, b) is represented by high-speed camera and photocell system criterions, respectively, where the authors reported the same values for each. CI confidence interval, df degrees of freedom, IMU inertial measurement unit, IV instrumental variable, over overestimation, SD standard deviation, under underestimation

3.4.9 Step Length

The validity of foot-mounted IMUs to quantify step length during running at different velocities (2.2 to 5.6 m·s−1) was investigated in three studies (see Table 11). Pearson’s correlation and ICCs showed step length, calculated from Stryd™ and RunScribe™ devices, was nearly perfectly correlated (r > 0.93, p < 0.001) with photocell and high-speed camera measures across all velocities [35, 71]. One study used placement on the lumbar spine and showed that biases increased and ICC values decreased from jogging (bias = 8.1–12.2 cm; ICC = 0.90–0.94) to sprinting (bias = 11.5–28.4 cm; ICC = 0.79–0.85) compared to a photocell system [72].

Table 11 Validity summary statistics for step length and stride length

One study assessed the reliability of step length derived from a foot-mounted IMU (see Table 6), which showed the CV ranged from 1.1 to 2.1% across all velocities (2.2 to 5.6 m·s−1), while the SEM was highest at 5 m·s−1 (241.2 cm) [71].

Data collected between 3.3 and 4.2 m·s−1 were grouped to produce a pooled validity estimate for foot-determined step length. Results from the meta-analysis are presented in Fig. 5 and show that IMUs worn on the foot produce step length values that are not significantly different from reference measures (MD [95% CI] 0.21 cm [− 1.76, 2.18], p = 0.69). No moderator analysis was performed because I² = 0%.

Fig. 5

Forest plot displaying the effect of step length (cm) calculated from IMUs worn on the foot between 3.3 and 4.2 m·s−1. Data are presented as means and SD of IMU- and criterion-derived step length. Data from the RunScribe™ sensor are shown in Garcia-Pinillos et al. (2019a), while data from the Stryd™ device are shown in Garcia-Pinillos et al. (2019b). CI confidence interval, df degrees of freedom, IMU inertial measurement unit, IV instrumental variable, over overestimation, SD standard deviation, under underestimation

3.4.10 Stride Length

Three studies determined the validity of foot-mounted IMUs to calculate stride length, where summary statistics from each study are documented in Table 11. Compared to motion capture, the mean error of IMU-derived stride length ranged from − 0.5 to 46.0 cm [52, 65, 85]. The agreement between stride length determined from an IMU and motion capture system was improved during overground runs over 10 m (3.6 ± 0.3 m·s−1; root mean square error [RMSE] = 8.3 cm) compared to running on a treadmill for 3 min at different velocities (2.2–3.1 m·s−1; RMSE = 59.2–70.2 cm, r = 0.96, p < 0.001) [52, 85]. In a study comparing four different algorithms for computing stride length from IMU signals to a motion capture system, results showed that an algorithm based on foot trajectory performed better (mean error = 2.0 ± 14.1 cm, mean percentage error = 2.8%) than those based on stride time (mean error = 17.7 ± 57.3 cm, mean percentage error = 17.1%), foot acceleration (mean error = − 0.5 ± 25.6 cm, mean percentage error = 7.9%) and deep learning (mean error = 2.5 ± 20.1 cm, mean percentage error = 5.9%) across a range of velocities up to 5.0 m·s−1 (see Table 11) [65].

The CV for within-subject variation of stride length across different sampling frequencies ranged from 4.9% at 1000 Hz to 7.8% at 100 Hz (see Table 6) [59].
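
The agreement statistics quoted in this section (mean error ± SD, RMSE and mean percentage error) can be illustrated with a short sketch. The function name `agreement_metrics` and the example values are hypothetical, and the percentage error is computed here as a mean absolute percentage error relative to the reference, which is an assumption about how the reviewed studies defined it.

```python
import math


def agreement_metrics(imu, reference):
    """Agreement between paired IMU and reference stride lengths (same units)."""
    if len(imu) != len(reference) or not imu:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [a - b for a, b in zip(imu, reference)]
    n = len(errors)

    # Mean error (bias) and its sample standard deviation.
    mean_error = sum(errors) / n
    if n > 1:
        sd_error = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    else:
        sd_error = 0.0

    # RMSE penalises large individual errors even when the bias is near zero.
    rmse = math.sqrt(sum(e**2 for e in errors) / n)

    # Mean absolute percentage error relative to the reference (assumed definition).
    mape = 100.0 * sum(abs(e) / b for e, b in zip(errors, reference)) / n

    return {"mean_error": mean_error, "sd_error": sd_error, "rmse": rmse, "mape": mape}


# Hypothetical paired stride lengths in cm, NOT taken from the review.
imu_cm = [238.0, 251.5, 262.0, 270.5]
mocap_cm = [240.0, 250.0, 260.0, 275.0]
print(agreement_metrics(imu_cm, mocap_cm))
```

A mean error close to zero can coexist with a large SD or RMSE when over- and underestimations cancel, which is why an algorithm with a small mean error may still show a comparatively large mean percentage error.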

3.4.11 Ground Reaction Force

The outcomes for the 11 studies that investigated the validity of IMUs to estimate GRF are presented in Table 12. Two studies applied a neural network model to accelerometer data from the foot and thoracic spine to predict vertical and resultant GRF, respectively [56, 61]. The RMSE for vertical GRF determined from foot acceleration data was < 10.5 N compared to values obtained from an instrumented treadmill, while the mean signal cross-correlation was 0.99 when the entire vertical GRF waveform was evaluated [56]. A neural network method predicted resultant GRF from accelerometers worn on the thoracic spine with a mean coefficient of determination (r²) value of 0.9 [61]. Attaching an accelerometer to the tibia [40] and hip [66] resulted in mean differences to the force plate of 400.0 N and 106.4 N (~ 8.3%), respectively, for vertical GRF, whereas biases were smaller for the vertical (− 34.1 N) and resultant (− 29.7 N) components of peak force when an IMU was attached to the sacrum (see Table 12) [57]. One study that used a spring-mass model to calculate peak vertical force showed strong correlations between force plate-lumbar spine (r = 0.81) and force plate-thoracic spine (r = 0.79), while the CV was 9.2 and 9.6%, respectively [38]. When acceleration values were converted to Newtons by multiplying by body mass, larger measurement errors and weaker correlations were reported for both vertical (CV = 16.2%, r = 0.44, p < 0.01) and resultant GRF (CV = 16.4%) using a thoracic spine accelerometer [36, 43]. During slow (2 m·s−1) to moderate (5 m·s−1) speed running in another study, a single thoracic spine-mounted accelerometer was shown to be inadequate (RMSE > 509.2 N) for use with a mass-spring-damper model to predict resultant GRF waveforms [39]. When multiple IMUs were used to estimate vertical GRF, the RMSE was 220.8 ± 45.7 N, while the root mean square deviation was 241.4 ± 59.6 N [55, 87].

Table 12 Validity summary statistics for ground reaction force

The reliability of accelerometers to estimate vertical GRF was examined in four studies (see Table 6). For placement on the tibia, the SEM was 99.8 N (7.0%), whereas the minimal detectable change (MDC) was 276.7 N (19.3%) [40]. As with placement on the tibia (ICC = 0.88), lumbar spine (CV = 4.2%) and thoracic spine (CV = 3.3%) sites also showed reliable outcomes for vertical GRF derived from a spring-mass model during a continuous 2 min shuttle run [38]. However, when the same model was applied in another study using thoracic spine accelerometers, the authors classed the between-day typical error (TE; 0.8 N) and ICC (0.47) values as moderate [86]. Poor reliability was exhibited in a further study utilising accelerometers placed on the thoracic spine, whereby CV values were > 17.8% across velocities ranging between 3.3 and 6.7 m·s−1 [43].
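
The relationship between the SEM and MDC reported for the tibia can be checked with a brief sketch. It assumes the conventional 95% formulation, MDC = 1.96 × √2 × SEM, and the common SEM = SD × √(1 − ICC) relationship; the review does not state which formulas the original study [40] applied, so both function names and formulas are assumptions.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and ICC."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%)."""
    return z * math.sqrt(2.0) * sem


sem_newtons = 99.8  # tibia SEM for vertical GRF reported in the text [40]
mdc_newtons = minimal_detectable_change(sem_newtons)
print(f"MDC95 = {mdc_newtons:.1f} N")
```

Under these assumptions, the calculation gives approximately 276.6 N, which is close to the reported MDC of 276.7 N; the small gap is consistent with rounding of the published SEM.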

Two studies reported mean ± SD values for thoracic spine-derived peak resultant GRF [36, 39]. However, one study had an SD that was nearly as large as the mean [39], which suggests the data were not normally distributed and therefore did not meet the assumptions for a random-effects meta-analysis [91]. Consequently, these studies were not pooled.

3.4.12 Vertical Stiffness

Three studies examined the reliability and validity of accelerometers placed at the lumbar and thoracic spine to calculate vertical stiffness (see Tables 6 and 13, respectively). A nearly perfect correlation (r = 0.98) between thoracic spine-determined vertical stiffness and that obtained from an instrumented treadmill was reported from a single participant in one study [37]. When a larger sample of participants was analysed in another study, correlations with the force plate were not as strong for lumbar spine (r = 0.65) and thoracic spine (r = 0.66) estimates of vertical stiffness [38].

Table 13 Validity summary statistics for vertical stiffness

Inter-day reliability results were comparable between accelerometer placements, with a CV between 9.5 and 12.1% and ICC values of 0.70–0.75 for both the lumbar and thoracic spine (see Table 6) [38, 86].

4 Discussion

This systematic review and meta-analysis summarises the validity and reliability of IMUs to derive spatiotemporal features of running gait and estimate peak GRF and vertical stiffness based on different attachment sites. Twelve variables were analysed across 39 studies, where the placement of IMUs varied between the foot, distal and mid tibia, hip, sacrum, lumbar spine, torso and thoracic spine. The results from reviewed studies and meta-analysis suggest it is possible to obtain valid and reliable stride data using IMUs attached at different sites. It appears that accuracy may depend more on the computational method used for identifying stride events (IC and TO) from inertial data than on the attachment site itself.

Meta-analysis revealed that contact time and step frequency derived from IMUs placed at the foot, tibia and lumbar spine do not significantly differ from the criterion. However, some of these pooled analyses demonstrated high between-study heterogeneity (I² > 54.1%), which could not be explained by differing criterion methods, nor by omitting one study for the foot and lumbar spine subgroups. Consequently, the source of heterogeneity remains unclear for these sites but could be due to other methodological factors such as the type of sensor, sampling rate, or computational method for identifying stride events. These potential moderating variables could not be investigated further due to insufficient reporting of data within those studies. Although there were no influential studies for the foot and lumbar spine subgroups, the pooled MD for contact time determined from the tibia was distorted when one study [70] was omitted. Removal of this study from the meta-analysis resulted in an overall effect that was significantly different (p = 0.02) to the criterion, which would have suggested the tibia is not a suitable site to determine contact time had the study not been included. Other work reviewed here demonstrated valid results for contact time using IMUs secured to the distal tibia [62]. Although this study was not eligible for inclusion in the meta-analysis due to insufficient reporting of data, it is possible it may have supported our findings in the final meta-analysis, where no significant difference (p = 0.18) was observed between the tibia and criterion. Furthermore, IC and TO have been detected with good accuracy from tibial acceleration data [31, 62, 63], which suggests this site is a viable option for calculating temporal variables, such as contact time.
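
Because the temporal variables discussed above all follow from the timing of IC and TO, a minimal sketch of that final step may help show why event detection is so influential. It assumes IC and TO timestamps have already been detected for consecutive steps (alternating feet, as from a single trunk-worn sensor or merged foot sensors); the detection algorithm itself is not shown, and the function name `temporal_variables` and the example timestamps are hypothetical.

```python
def temporal_variables(ic_times, to_times):
    """Derive per-step temporal variables from gait event times in seconds.

    ic_times[i] is the initial contact of step i and to_times[i] its toe-off;
    steps alternate between feet, so ic_times[i + 1] is the contralateral IC.
    """
    steps = []
    for i in range(len(ic_times) - 1):
        ic, to, next_ic = ic_times[i], to_times[i], ic_times[i + 1]
        if not ic < to <= next_ic:
            raise ValueError(f"events out of order at step {i}")

        contact_time = to - ic      # foot on the ground
        flight_time = next_ic - to  # neither foot on the ground
        step_time = next_ic - ic    # contact time + flight time

        steps.append({
            "contact_time": contact_time,
            "flight_time": flight_time,
            "step_frequency": 60.0 / step_time,  # steps per minute
        })
    return steps


# Hypothetical event times (s), NOT taken from any reviewed study.
ic = [0.00, 0.35, 0.70]
to = [0.25, 0.60, 0.95]
for step in temporal_variables(ic, to):
    print(step)
```

In this sketch, an error of 10 ms in a detected TO shifts contact time and flight time by 10 ms in opposite directions while leaving step frequency unchanged, which may partly explain why step frequency tends to show smaller errors than contact or flight time.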

Subgroup analysis was not possible for flight time and step length due to a limited number of studies meeting eligibility criteria for inclusion. However, studies that used foot-worn IMUs to determine these metrics were meta-analysed and demonstrated that estimates of flight time and step length were not significantly different from criterion measures, which is similar to the results reported for contact time and step frequency. Collectively, the results from the four meta-analyses highlight the utility of using IMUs for gait analysis, where the findings reported here may open opportunities for practitioners to use placement on the foot, tibia or lumbar spine to capture spatiotemporal features of an athlete’s stride in the field. However, there has been little work done (two reviewed studies) applying gait event detection methods to inertial data from the thoracic spine to investigate the validity of this site to derive temporal variables, with one study only reporting a single observation (n = 1) [37, 54]. It is therefore unclear whether placement on the thoracic spine is also suitable to derive temporal stride data.

Peak vertical or resultant GRFs during running have traditionally been measured from force platforms [92,93,94]. However, IMUs are more accessible to an athlete’s normal training and competition environment than force platforms and may provide a useful tool for quantifying surrogate measures of force during running-based sports [95]. A variety of different approaches were used to estimate peak GRFs in the studies reviewed here. Although meta-analysis was not possible, predictions of vertical GRF were shown to be most accurate when studies applied machine learning techniques or used multiple IMUs at different body segments [55, 56, 61, 87]. Given IMUs are commonly worn on the thoracic spine in sport, other studies investigated the validity of this site to predict GRFs from accelerometer data, with contrasting results. Acceleration data from the thoracic spine were inadequate to predict peak vertical and resultant GRF based on Newton’s second law of motion (i.e. multiplying by body mass) [36, 43] and as input into a mass-spring-damper model [39]. Conversely, improved results were shown when peak vertical GRF was estimated from known contact time, flight time and body mass using a spring-mass model [38], while another study suggested accurate predictions of resultant GRFs from IMUs worn on the thoracic spine are possible by applying machine learning [61]. Based on the conflicting results from the studies reviewed here, it is unclear whether accurate determination of peak vertical and resultant GRFs from accelerometer data at the thoracic spine is possible and this warrants further investigation.

Two studies used estimations of peak vertical GRF to calculate vertical stiffness from IMUs worn on the thoracic spine [37, 38]. Although the small biases and large to nearly perfect correlations in both studies appear promising for determining vertical stiffness using accelerometer data from this site, it is unclear whether placement on the thoracic spine is feasible for determining vertical stiffness when one study collected data from only one participant. Furthermore, calculating vertical stiffness using a spring-mass model approach, as per the method used in the two studies, is dependent on known contact time and flight time [96]. However, neither study provided a description of how IC and TO were determined mathematically from accelerometer data, nor how these events translated to accurate derivations of temporal variables [37, 38]. The ability of IMUs attached on the thoracic spine to correctly identify IC and TO events compared to a criterion should be explored more fully before practitioners can confidently use this site to 1) accurately calculate contact time and flight time and 2) use these metrics as inputs for estimating peak vertical GRF and vertical stiffness [23, 97].
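
To illustrate how strongly a spring-mass estimate depends on contact time and flight time, the sketch below uses one widely used sine-wave formulation, in which the vertical force during contact is modelled as a half sine wave. Whether this matches the exact formulation in [96] or in the reviewed studies is an assumption, and the function names and input values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def peak_vertical_force(mass_kg, contact_time_s, flight_time_s):
    """Peak vertical GRF (N), assuming a half-sine force-time curve."""
    return mass_kg * G * (math.pi / 2.0) * (flight_time_s / contact_time_s + 1.0)


def vertical_stiffness(mass_kg, contact_time_s, flight_time_s):
    """Vertical stiffness (N/m) as peak force over centre-of-mass displacement."""
    f_max = peak_vertical_force(mass_kg, contact_time_s, flight_time_s)
    # Magnitude of the downward centre-of-mass displacement during contact (m).
    delta_y = (f_max * contact_time_s**2) / (mass_kg * math.pi**2) \
        - G * contact_time_s**2 / 8.0
    return f_max / delta_y


# Hypothetical runner: 70 kg, 0.25 s contact time, 0.10 s flight time.
f_max = peak_vertical_force(70.0, 0.25, 0.10)
k_vert = vertical_stiffness(70.0, 0.25, 0.10)
print(f"Fmax = {f_max:.0f} N, kvert = {k_vert / 1000:.1f} kN/m")
```

Since contact time enters both the force and displacement terms, small errors in IC or TO detection propagate into both peak force and stiffness, which supports the call above for validating event detection at the thoracic spine first.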

Results from reviewed studies demonstrate that it is possible to obtain reliable derivations of contact time, flight time and step frequency from a foot or lumbar spine placement [16, 68, 71], while foot-worn IMUs can provide reproducible calculations of stride time, step length and stride length [59, 71]. Furthermore, placement on the tibia and lumbar and thoracic spine possessed excellent reliability for determining vertical GRF from accelerometer data [38, 40]. Collectively, these results indicate that IMUs possess good precision for calculating different stride variables [98]. Determining the sensitivity of IMU-derived stride variables by calculating the MDC or smallest worthwhile change (SWC) is also important so practitioners can determine whether changes in an athlete’s gait pattern are real or due to error [99,100,101]. However, only two studies reported here determined the value (i.e. signal) that may constitute a meaningful change for stride variables determined from IMUs [40, 86]. One study using tibia accelerometers calculated an MDC for peak vertical GRF that was higher than the SEM, suggesting that this metric may be sensitive enough to detect change when IMUs are secured to the tibia [40]. Conversely, the TE associated with thoracic spine-derived peak vertical GRF and vertical stiffness was greater than the SWC [86], which suggests this site is limited for detecting subtle changes in an athlete’s gait pattern. No study determined the MDC or SWC for spatiotemporal variables, and therefore future work may look to further our understanding of the signal-to-noise ratio of other stride metrics derived from IMUs worn at various sites.
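
The comparison of TE (noise) against SWC (signal) described above can be sketched as follows. It assumes the common convention of setting the SWC to 0.2 × the between-subject SD; the reviewed study [86] may have used a different definition, and the function names and data are hypothetical.

```python
import statistics


def smallest_worthwhile_change(baseline_values, factor=0.2):
    """SWC as a fraction of the between-subject SD (0.2 is a common convention)."""
    return factor * statistics.stdev(baseline_values)


def is_sensitive(typical_error, swc):
    """A metric can detect the SWC only if its noise is smaller than the signal."""
    return typical_error < swc


# Hypothetical vertical stiffness values (kN/m) for a squad, NOT from the review.
squad_kvert = [22.1, 24.8, 26.3, 23.5, 27.9, 25.0]
swc = smallest_worthwhile_change(squad_kvert)
print(f"SWC = {swc:.2f} kN/m, sensitive: {is_sensitive(1.5, swc)}")
```

When the TE exceeds the SWC, as reported for the thoracic spine, only changes larger than the noise can be interpreted with confidence, so subtle alterations in gait may go undetected.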

IMUs are increasingly being used in sport to gain additional insights (i.e. other than speed and distance) into the activity profiles of athletes. Practitioners can quantify proprietary designed metrics, such as PlayerLoad™ [10, 28, 102], estimate energy expenditure [103] and record the peak segmental acceleration values that occur during a variety of different team-sport movements [42, 104] using IMUs. There is an increasing body of evidence supporting the use of IMUs to capture characteristics of an athlete’s stride, including spatiotemporal data [54], GRFs [36, 39, 40] and vertical stiffness [37, 86]. Capturing accurate stride variables appears possible across different sites using automated gait event detection techniques and may have practical application in profiling an athlete’s stride in a variety of running-based sports. The use of IMUs may allow practitioners to perform gait analyses in the field to enhance their understanding of athlete movement strategy and monitor changes in stride variables that may occur with fatigue [28].

It is important to note that the meta-analyses in this review were impacted by a limited pool of eligible studies. Because of relatively small sample sizes, it is likely that the results suffer from sparse data bias in instances where only two studies were meta-analysed [105, 106]. Further research should include raw outcome data (mean ± SD values) alongside validity statistics to provide a complete summary of outcomes. Furthermore, the method adopted here treated three studies that used different IMUs or criteria as independent data sources [29, 35, 68]. It is possible that we may have observed a different finding had different IMUs or criteria not been treated independently within those studies. However, due to a limited number of studies, accounting for this dependency was not possible with the data available. Finally, data were only pooled within a velocity range of 3.3–4.2 m·s−1 due to eligibility criteria. As a result, the meta-analyses here do not explain the effect of running velocity on validity, which may be an important distinction to make as previous work has shown that increased speed may lead to greater error in estimations of stride variables derived from IMUs [34, 43].

5 Conclusion

This review and meta-analysis demonstrated that valid and reliable derivations of stride metrics are possible from IMUs mounted on the foot, tibia and lumbar spine. This suggests that location may not be the most critical factor and that validity and reliability may be more dependent on the mathematical approach for the detection of gait events. However, further work is warranted to explore the application of automated gait event detection algorithms on inertial data from the thoracic spine before practitioners can confidently use this site in the field to derive stride variables.