Key Points

Methods regarding data collection and processing criteria when using ActiGraph GT3X are summarized, and age-specific practical considerations that will be useful for researchers and practitioners are provided.

The tabulated data generated will facilitate comparisons between studies using ActiGraph GT3X and aid in the selection of the most appropriate method to use for each specific research purpose.

1 Introduction

Health benefits of physical activity (PA) across a person’s lifespan have been widely reported [1–3]. The use of accelerometers to assess sedentary time (SED) and PA [4–7] has become an objective and feasible alternative to self-report methods such as questionnaires, which have poor reliability and validity, especially in younger populations [8–10]. Accelerometers are wearable devices that measure accelerations of the body segment to which the monitor is attached. The signal is usually filtered and pre-processed by the monitor to obtain activity counts, i.e., accelerations due to body movement. The amount and intensity of daily SED and PA may be obtained by classifying activity counts accumulated in a specific time interval (epoch length) with a set of cut-points, i.e., intensity thresholds for PA intensity classification [11–15]. Physical activity energy expenditure (PAEE) or sleep-related behaviors may also be estimated by applying algorithms to objectively determined activity counts [16–21]. New methods to estimate these variables from raw acceleration signals (in gravity units, g) instead of activity counts have been developed recently [22–24].
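To make the cut-point step concrete, the following minimal sketch classifies each epoch's activity count into an intensity category and sums the time spent in each category. The thresholds (`EXAMPLE_CUTS`) are hypothetical placeholders, not a published set. In practice they must come from a calibration study (see Table 5) that matches the device placement, axis, epoch length, and age group, and they are expressed in counts per epoch, so they equal counts per minute only when 60-s epochs are used.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

# Hypothetical thresholds (counts per epoch): lower bounds of light,
# moderate and vigorous PA. Counts below the first threshold are SED.
EXAMPLE_CUTS = (100, 2000, 4000)
LABELS = ("SED", "LPA", "MPA", "VPA")


def classify_epochs(counts: Sequence[int],
                    cuts: Sequence[int] = EXAMPLE_CUTS,
                    labels: Sequence[str] = LABELS) -> List[str]:
    """Assign each epoch the label of the highest threshold its count reaches."""
    # bisect_right gives the number of thresholds <= count, which is
    # exactly the index of the matching intensity label.
    return [labels[bisect_right(cuts, c)] for c in counts]


def minutes_per_intensity(counts: Sequence[int], epoch_s: int,
                          cuts: Sequence[int] = EXAMPLE_CUTS,
                          labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Total minutes spent in each intensity category."""
    totals = {label: 0.0 for label in labels}
    for label in classify_epochs(counts, cuts, labels):
        totals[label] += epoch_s / 60
    return totals


print(minutes_per_intensity([0, 0, 3000, 5000], epoch_s=60))
# {'SED': 2.0, 'LPA': 0.0, 'MPA': 1.0, 'VPA': 1.0}
```

Because the thresholds are passed as a parameter, a different published set can be swapped in without touching the classification logic.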

Among the commercially available brands, the ActiGraph (Pensacola, FL, USA) accelerometers are the most frequently used by researchers, accounting for >50% of published studies [25]. This review only considered the latest generation of ActiGraph devices, i.e., GT3X, GT3X+, and wGT3X-BT (hereinafter referred to as GT3X/+). The continuous change in the features of these devices makes it difficult to compare data between studies.

The first ActiGraph accelerometers available were uniaxial (i.e., they could only detect vertical axis accelerations), and consequently cut-points and algorithms were developed to assess SED, PA intensity, PAEE, and sleep-related behaviors from vertical axis accelerations [11, 17, 21]. In mid-2009, ActiGraph released the triaxial GT3X, which detected accelerations in three axes (i.e., vertical, medio-lateral, and antero-posterior axes). The transition from uniaxial to triaxial devices implied new calibration processes, and the algorithms developed for the vertical axis were not applicable to vector magnitude (i.e., the square root of the sum of squared activity counts from the three axes) [7, 13, 18, 20, 26–28].
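The vector magnitude definition above can be written directly as code. This sketch assumes per-epoch counts for each axis are already available as equal-length sequences; the parameter names are ours. It also shows why vertical-axis cut-points do not transfer: vector magnitude can never be smaller than the vertical count for the same epoch, so applying a vertical-axis threshold to vector magnitude would classify more epochs as active.

```python
import math
from typing import List, Sequence


def vector_magnitude(vertical: Sequence[float],
                     mediolateral: Sequence[float],
                     anteroposterior: Sequence[float]) -> List[float]:
    """Per-epoch vector magnitude: sqrt(v**2 + ml**2 + ap**2)."""
    return [math.sqrt(v * v + ml * ml + ap * ap)
            for v, ml, ap in zip(vertical, mediolateral, anteroposterior)]


vertical = [300, 0, 1200]
vm = vector_magnitude(vertical, [400, 250, 500], [0, 0, 0])
print(vm)  # [500.0, 250.0, 1300.0]

# VM is never smaller than the vertical-axis count for the same epoch.
assert all(m >= abs(v) for m, v in zip(vm, vertical))
```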

Because of the extremely rapid development in this field, researchers face an overwhelming number of decisions about data collection and processing criteria, and there is no consensus about which approaches to use. Consequently, it is difficult for researchers and practitioners to decide which criteria should be used in a given situation. This matters because the chosen criteria have a substantial impact on the outcome. To address this problem, some studies have compared certain GT3X/+ outcomes estimated by different cut-points and algorithms [4, 29–31] in an attempt to identify the most accurate options; however, this information is still scarce.

It is important to note that algorithms validated in a specific age group might not be valid for other age groups because of different PA patterns; whenever possible, data collection and processing criteria should therefore be age-specific. Accelerometer methods can be grouped into two categories: (1) data collection protocols, which are decisions that need to be made a priori, such as device placement or sampling frequency; and (2) data processing criteria, which involve decisions that can be made a posteriori, such as filters, epoch length, non-wear-time definition, cut-points, and algorithms. The present review addresses each of these criteria separately and by age group. In this review we aimed to: (1) compile and classify existing studies assessing sedentary time, physical activity, energy expenditure, or sleep using the ActiGraph GT3X/+ by data collection and processing criteria to improve data comparability, and (2) review data collection and processing criteria when using GT3X/+ and provide age-specific practical considerations based on the validation/calibration studies identified. Both objectives were approached separately for the following age groups: preschoolers, children/adolescents, adults, and older adults. Although this review includes a large amount of information, we believe it is useful for readers to have a single article that summarizes the most important accelerometer methods for each age group separately. This allows readers to go directly to a specific criterion for the age group they are interested in (e.g., PAEE in preschoolers). We also provide a section with examples of how the information presented can be used in practical terms, as well as a table with practical considerations.

2 Methods

2.1 Study Design

The present review focuses on 11 key methodological issues related to GT3X/+ data collection and processing criteria: (1) device placement, (2) sampling frequency, (3) filter, (4) epoch length, (5) non-wear-time definition, (6) what constitutes a valid day and a valid week, (7) registration period protocol, (8) SED and PA intensity classification, (9) PAEE algorithms, (10) sleep algorithms, and (11) step counting. Available information was classified into two different types of studies: (1) any cross-sectional, longitudinal, or intervention study which used the GT3X/+ device and met the inclusion criteria indicated in Sect. 2.3 (objective 1); and (2) studies focused on validation, calibration, or comparison of functions related to data collection or processing criteria (objective 2). Therefore, the practical considerations provided for each age group are based on the results from the validation/calibration studies (see Table 1). Furthermore, we provide a summary of all data extracted from the validation/calibration papers included in this review by age group in the Electronic Supplementary Material Appendix S1. Inclusion/exclusion criteria and analytical methods were specified in advance and registered in the PROSPERO (http://www.crd.york.ac.uk/PROSPERO/) international database of systematic reviews (CRD42016039991) [32]. The review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [33].

Table 1 Summary of practical considerations by age group

2.2 Search Strategy

We searched PubMed and Web of Science for studies using the ActiGraph GT3X/+ model and classified the studies into the following age groups: preschoolers (2–5 years), children (6–11 years), adolescents (12–18 years), adults (19–59 years), and older adults (≥60 years). We combined (using the Boolean operator “OR”) the following search terms: GT3X, GT3X+, and ActiGraph. Although we wanted to limit the search to GT3X/+, the word ActiGraph was entered in the search because we found that some studies specified the brand (i.e., ActiGraph) instead of the model (i.e., GT3X/+) in the title/abstract/keywords. Since the GT3X/+ models were launched in mid-2009, we limited the dates of the search to 1 January 2010 to 31 December 2015 and conducted the final search on 3 January 2016. We contacted the authors of studies in which data collection and processing information was unavailable in the published article. In a final step, we extended the search to the IEEE (Institute of Electrical and Electronics Engineers) Xplore database, in case we had missed any relevant studies.

2.3 Inclusion/Exclusion Criteria

We included all original studies (cross-sectional, longitudinal, or intervention studies) in which the GT3X/+ was used in a laboratory, or under controlled or free-living conditions. Protocol studies, reviews, editorials, and abstract or congress communications were excluded, as well as studies conducted in people with mobility problems or in periods of life in which their mobility could have been markedly altered (e.g., pregnancy).

Two authors working independently (JHM and CCS) read the articles and checked whether they met the inclusion/exclusion criteria. They obtained 76% agreement on the papers selected for the review before consensus and 100% agreement after discrepancies were resolved in a consensus meeting. Risk of bias assessment was also conducted independently by JHM and CCS in order to assess the quality of studies (see Electronic Supplementary Material Appendix S2).

3 Results

A total of 940 articles were identified (Fig. 1), of which 444 were excluded after reading the title and abstract, and a further 261 were excluded after full-text reading because they did not meet the inclusion/exclusion criteria stated above. Finally, a total of 235 studies were considered eligible for the current systematic review. Of these, 78 were validation/calibration studies. Methods and results of these validation/calibration studies are summarized in the Electronic Supplementary Material Appendix S1. Detailed information about the methods and results of the remaining studies (i.e., those using GT3X/+ that were not validation/calibration studies) is available upon request.

Fig. 1

Flowchart of the literature search and study selection process. ¹Studies using accelerometers for other purposes (e.g., accelerometers attached to dogs). ²Studies that included two age ranges were counted in both age groups. ³Studies focused on validation, calibration, or comparison of functions related to data collection or processing criteria. ⁴All cross-sectional, longitudinal, or intervention studies that used the GT3X/+ device and met the inclusion criteria

Forty-four percent (N = 103) of the included studies were conducted in adults (46% of them validation/calibration studies), 34% (N = 81) in youth (30% validation/calibration studies), 22% (N = 51) in older adults (11% validation/calibration studies), and 10% (N = 24) in preschoolers (13% validation/calibration studies). These percentages sum to more than 100% because studies that included two age ranges were counted in both age groups.

Studies including two or more age groups are summarized in both age group sections in this review. Table 2 presents the criteria used for data collection and processing by age group. A list of references for each of the criteria is found in Electronic Supplementary Material Appendix S3. The information provided in Table 2 and Electronic Supplementary Material Appendix S3 allows researchers to make comparisons between studies that have used the same data collection and processing criteria.

Table 2 Summary of the criteria used for data collection protocols and data processing from articles reviewed by age group (see Electronic Supplementary Material Appendix S1 for the criteria used by each of the studies listed in this table)

Figure 2 shows the percentage of studies that did not report key methodological issues by age group. Of the studies reviewed, 15–20% did not report criteria such as sampling frequency, epoch length, and a non-wear-time definition, and 60–80% of studies did not report information on the filter used.

Fig. 2

Percentage of the 235 included papers that did not report key methodological issues, separated by age group. NWT non-wear-time

Table 3 presents the studies that compared the differences in several outcomes when the GT3X/+ device was simultaneously worn on the wrist and hip. The optimal place to attach the GT3X/+ should be chosen based on reliability, validity, and compliance. Table 4 shows the references for the studies, sorted by age group and placement site, that have developed SED and PA cut-points, PAEE prediction equations, and sleep algorithms. Table 5 shows the intensity cut-points used in the included studies together with the pre-processing criteria used in the study that developed each set of cut-points. As noted in Sect. 2.1, the practical considerations provided for each age group in Table 1 are based on the results from these validation/calibration studies.

Table 3 Summary of studies comparing hip- and wrist-worn GT3X/+ accelerometers
Table 4 Studies developing cut-points for sedentary time and physical activity intensity classification, PAEE, and sleep algorithms used in the articles reviewed and ordered by age group
Table 5 Cut-points for SED, LPA, MPA, VPA, and VVPA activity used in the articles reviewed and ordered by age group in which they were validated

In the following sub-sections, we will focus only on information from validation/calibration studies presented in Electronic Supplementary Material Appendix S1. Sections 3.1, 3.2, and 3.3 correspond to data collection protocols (i.e., pre-processing stage) and Sects. 3.4–3.10 correspond to processing criteria (i.e., processing stage).

3.1 Device Placement

3.1.1 Preschoolers

In young preschoolers, Johansson et al. [34] reported receiver operating characteristic area under the curve (ROC-AUC) values between 0.88 and 0.98 for intensity thresholds using a left-wrist-mounted GT3X+. Similarly, Costa et al. [35] reported a ROC-AUC of 0.90–0.94 using a hip placement, suggesting that both placements have high potential to correctly classify PA intensity in preschoolers.
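ROC-AUC, the metric reported throughout this section, can be read as the probability that a randomly chosen epoch of the target class (e.g., MVPA according to the criterion measure) has a higher count than a randomly chosen epoch of the other class. The sketch below uses this pairwise (Mann–Whitney) formulation, counting ties as one half; the example counts are invented for illustration.

```python
from typing import Sequence


def roc_auc(positive: Sequence[float], negative: Sequence[float]) -> float:
    """ROC-AUC by pairwise comparison (Mann-Whitney U divided by n1 * n2)."""
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(positive) * len(negative))


# Invented counts per epoch for epochs the criterion labelled MVPA vs. not MVPA.
mvpa = [2600, 3100, 1900, 4200]
not_mvpa = [150, 800, 2000, 40]
print(roc_auc(mvpa, not_mvpa))  # 0.9375
```

An AUC of 1.0 means every criterion-positive epoch outranks every criterion-negative one, and 0.5 means the counts carry no discriminating information. The quadratic loop is fine for illustration; rank-based implementations scale better for long recordings.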

3.1.2 Children and Adolescents

A higher compliance for wrist-worn versus hip-worn devices has been reported in children/adolescents [23]. However, similar wear-time was achieved in protocols using 24-h waist-worn compared to 24-h wrist-worn accelerometers [36].

With regard to cut-points to classify SED and PA intensity, non-dominant wrist placement achieved a lower ROC-AUC (0.64–0.89) [15] than the dominant wrist (0.83–0.94) [37] and the hip (>0.90 for all cut-points) [13, 38]. Furthermore, Hildebrand et al. [22] found a greater percentage of explained variance when using algorithms from the hip compared to the wrist (78% for hip vs. 71% for wrist).

Hjorth et al. [39] tested previously developed sleep algorithms for the wrist placement (developed with the GT1M) [16, 17] on both hip and wrist data. They obtained a classification accuracy between 86.6 and 89.9% for the algorithms tested when hip measurements were compared with wrist measurements. Hip placement overestimated total sleep time compared to the wrist (60.1 vs. 73.8 min per day for wrist and hip, respectively). Finally, Tudor-Locke et al. [18] developed an algorithm to identify bedtime for the hip-worn GT3X/+, and Barreira et al. [19] refined it and validated it in a free-living environment against self-reported participant logs, obtaining a non-significant absolute difference of 9 ± 36 min.
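As an illustration of how count-based sleep scoring and the reported classification accuracy fit together, the sketch below applies a generic weighted moving-window rule (each epoch is scored as sleep when a scaled, weighted sum of neighbouring counts falls below 1, the general form of Cole–Kripke-type algorithms) and then computes epoch-by-epoch agreement between two scored series. The weights, scale factor, and data are hypothetical. This is not the exact algorithm of [16, 17], whose coefficients are not given here.

```python
from typing import Dict, List, Sequence


def score_sleep(counts: Sequence[float], weights: Dict[int, float],
                scale: float) -> List[bool]:
    """Return True (sleep) where scale * weighted neighbouring counts < 1.

    `weights` maps an epoch offset (e.g. -1, 0, +1) to its weight;
    neighbours outside the recording are treated as zero counts.
    """
    n = len(counts)
    asleep = []
    for t in range(n):
        total = sum(w * counts[t + k] for k, w in weights.items() if 0 <= t + k < n)
        asleep.append(scale * total < 1.0)
    return asleep


def epoch_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Proportion of epochs scored identically by two methods."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical weights and scale, chosen only for illustration.
WEIGHTS = {-1: 0.5, 0: 1.0, 1: 0.5}
hip = score_sleep([0, 0, 0, 500, 800, 0, 0, 0], WEIGHTS, scale=0.01)
wrist = [True, True, True, False, False, True, True, True]  # invented series
print(hip)
print(epoch_agreement(hip, wrist))  # 0.75
```

The window smooths the scores, so a burst of counts also marks its neighbouring epochs as wake. This is one reason why placement-dependent differences in counts can translate into differences in total sleep time.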

3.1.3 Adults

Minimal differences between contralateral hips were found for vector magnitude activity counts (effect size: 0.016, p = 0.619) and wear-time (effect size: 0.040, p = 0.213) [40]. The reliability of the GT3X/+ attached to the hip, wrist, and ankle was studied by Ozemek et al. [41], who found high correlations from 0.824 to 0.998 in vector magnitude between pairs of devices under simulated activities of daily living.

Staudenmayer et al. [24] demonstrated greater accuracy for physical activity classification when the device was placed on the wrist compared to previously developed cut-points with the accelerometer placed on the hip. Additionally, they found that newly developed algorithms could also categorize behaviors in a laboratory setting (e.g., sitting, standing, riding in a vehicle, walking, and running) better for the wrist than for the hip placement. Ellis et al. [28] achieved better performance with a wrist model to predict simulated household activities; however, the hip model outperformed the wrist model on locomotion prediction (i.e., slow walk, brisk walk, and jogging) as well as PAEE estimation. In contrast, Ellis et al. [42] and Hildebrand et al. [22] obtained a higher accuracy (5% more on average) and a larger explained variance (81% for hip vs. 75% for wrist), respectively, for the hip compared to the wrist placement to classify PA type and intensity. Stec et al. [43] found a significant correlation between vector magnitude activity counts and net energy expenditure from hip-worn but not from wrist-worn accelerometers during resistance exercise (Pearson correlations for hip = 0.50, p = 0.005; and wrist = −0.25, p = 0.18).

With regard to step counting, Tudor-Locke et al. [44] found higher accuracy for step counting from hip compared to wrist devices in controlled conditions against direct observation. No data about placement comparisons were found in adults for sleep-related behavior estimations.

3.1.4 Older Adults

The hip has been the most commonly used placement in studies of older adults. Only one study, by Choi et al. [45], placed the GT3X on the dominant wrist to validate their non-wear-time algorithm. We found no data comparing device placements in this age group for any of the accelerometer outcomes.

3.2 Sampling Frequency

Because of the insufficient number of studies, data from all age groups are combined in this section. The GT3X records accelerations at a sampling frequency of 30 Hz. However, with the release of the GT3X+, the manufacturer allowed users to select a sampling frequency between 30 and 100 Hz. Brønd and Arvidsson [46] demonstrated that sampling frequency had an effect on activity counts (i.e., a difference of +90 counts/min (cpm) for a slow walk, +180 cpm for a fast walk, +103 cpm for a slow run, and +1601 cpm for a fast run at a sampling frequency of 40 Hz compared with 30 Hz). Since the filtering process was developed for 30 Hz, sampling frequencies that are multiples of 30 produce the most accurate estimates. Specifically, these authors [46] observed that 30, 60, or 90 Hz produced similar activity counts, whereas sampling frequencies of 40, 50, 70, 80, or 100 Hz offset the filter, resulting in an increased number of activity counts.
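The recommendation above reduces to a simple rule: prefer sampling rates that are integer multiples of 30 Hz. The helper below is a hypothetical convenience function encoding that rule. The 10-Hz spacing of the candidate rates is our reading of the rates listed above, not a manufacturer specification.

```python
from typing import Iterable, List

BASE_HZ = 30  # frequency the count filter was designed for, per the text above


def recommended_rates(candidates: Iterable[int]) -> List[int]:
    """Keep only sampling rates that are integer multiples of 30 Hz."""
    return [hz for hz in candidates if hz % BASE_HZ == 0]


# Candidate rates from 30 to 100 Hz in 10-Hz steps (step size assumed).
print(recommended_rates(range(30, 101, 10)))  # [30, 60, 90]
```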

3.3 Valid Day and Valid Week

Because of the lack of studies, we cannot present the information in this section separately for each age group. As Toftager et al. [47] reported, increasing the requirements for what is considered a valid day (i.e., the number of hours of wear per day) and a valid week (i.e., the number of days with valid data) led to a decrease in sample size and therefore in the study’s power.

In the National Health and Nutrition Examination Survey (NHANES) 2003–2006, where participants wore the accelerometers during waking hours, only 40–70% of them achieved a minimum of 10 h/day of wear-time for ≥6 days, whereas in NHANES 2011–2012, where participants were asked to wear the accelerometers 24 h/day, the wear-time achieved was 21–22 h/day for ≥6 days [48].
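The valid-day and valid-week rules discussed above are easy to express as code. The sketch below uses the ≥10 h/day on ≥6 days criterion mentioned for NHANES as default parameters and counts how many participants are retained. All wear-time values are invented. Participant `p3` loosely mimics a 24-h protocol with about 21–22 h of daily wear. Raising the daily threshold from 10 to 12 h drops participants, which is the trade-off reported by Toftager et al. [47].

```python
from typing import Dict, List, Sequence


def valid_days(wear_minutes: Sequence[float], min_hours: float = 10.0) -> List[bool]:
    """Flag each day whose wear time reaches the minimum number of hours."""
    return [m >= min_hours * 60 for m in wear_minutes]


def is_valid_week(wear_minutes: Sequence[float], min_hours: float = 10.0,
                  min_days: int = 6) -> bool:
    """A week is valid if it contains at least `min_days` valid days."""
    return sum(valid_days(wear_minutes, min_hours)) >= min_days


def retained(participants: Dict[str, Sequence[float]], min_hours: float,
             min_days: int) -> int:
    """Number of participants kept under a given valid-day/valid-week rule."""
    return sum(is_valid_week(days, min_hours, min_days)
               for days in participants.values())


# Invented daily wear time (minutes) for three participants over 7 days.
sample = {
    "p1": [720, 700, 690, 800, 750, 610, 640],
    "p2": [600, 610, 590, 620, 650, 700, 300],
    "p3": [1300, 1320, 1290, 1310, 1280, 1300, 1330],
}
print(retained(sample, min_hours=10, min_days=6))  # 2
print(retained(sample, min_hours=12, min_days=6))  # 1
```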

Aadland et al. [49] examined how many days were needed to obtain an intraclass correlation coefficient (ICC) of 0.80 with different wear-time criteria (≥8 h/day, ≥10 h/day, and ≥12 h/day). Single-day ICCs changed little for any variable as the wear-time criterion increased (ICCs = 0.20–0.53 for ≥8 h/day, 0.21–0.53 for ≥10 h/day, and 0.23–0.52 for ≥12 h/day). The number of days needed for an ICC of 0.80 decreased with a more demanding wear-time criterion [from 8.3 to 6.4 days for SED, from 4.4 to 3.7 days for light PA, and from 8.5 to 7.0 days for moderate-to-vigorous PA (MVPA), all adjusted for wear-time]. Although the registration period is usually 1 week, 2 weeks were analyzed in that study. In addition, Donaldson et al. [50] reported that 4 days of measurement were comparable to 1 week for estimating SED (r² = 0.91).
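The number of days needed to reach a target reliability is commonly derived from the single-day ICC with the Spearman–Brown prophecy formula, n = ICC_target (1 − ICC_1) / [ICC_1 (1 − ICC_target)]. We assume that formula here. The exact procedure in [49] (including its wear-time adjustment) may differ, so this sketch is not intended to reproduce the values reported above.

```python
def days_needed(icc_single: float, icc_target: float = 0.80) -> float:
    """Spearman-Brown prophecy: monitoring days needed to reach icc_target."""
    if not (0.0 < icc_single < 1.0 and 0.0 < icc_target < 1.0):
        raise ValueError("ICCs must lie strictly between 0 and 1")
    return (icc_target * (1.0 - icc_single)) / (icc_single * (1.0 - icc_target))


# Single-day ICCs spanning roughly the range reported above.
for icc in (0.20, 0.50):
    print(icc, round(days_needed(icc), 1))
```

Under this formula, single-day ICCs of 0.20 and 0.50 imply about 16 and 4 days of monitoring, respectively, for an ICC of 0.80. This illustrates why small changes in single-day reliability can shift the required registration period considerably.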

3.4 Filter

3.4.1 Preschoolers

No data about the influence of the filter selected (i.e., normal vs. low-frequency extension [LFE]) were found in preschoolers.

3.4.2 Children and Adolescents

Hjorth et al. [39] applied the normal and LFE filters to GT3X+ data from hip-mounted accelerometers during the night. On average, the total activity count was 2815 counts per night higher with the LFE filter than with the normal filter. Assuming 8 h of sleep, this corresponds to approximately 6 cpm more when the LFE filter is enabled. Consequently, total sleep time was 9 min per night longer with the normal filter than with the LFE filter when using a hip-mounted GT3X+.
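The conversion from the nightly difference to counts per minute is a single division over the assumed 8-h (480-min) night:

```latex
\[
\Delta_{\text{cpm}}
  = \frac{2815\ \text{counts}}{8\ \text{h} \times 60\ \text{min/h}}
  = \frac{2815}{480}
  \approx 5.9\ \text{cpm} \approx 6\ \text{cpm}
\]
```

Because this extra activity is distributed across the night, on average it pushes some low-count minutes above the sleep/wake scoring threshold. This is consistent with the shorter total sleep time observed with the LFE filter.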

3.4.3 Adults

Lyden et al. [51] found the normal filter more accurate than the LFE filter for identifying SED and breaks in SED with the GT3X attached to the hip, compared against direct observation. Ried-Larsen et al. [52] and Cain et al. [53] observed less SED and more minutes in each PA intensity with the LFE filter enabled. Non-wear-time estimation was similar between filters in the study published by Cain et al. [53]. For sleep-related behaviors, Cellini et al. [54] found similar results for total sleep time and sleep efficiency with both filters over a short sleeping time of 2 h. For step counting under free-living conditions, enabling the LFE filter increased the step count by an average of approximately 6000 steps per day [44].

3.4.4 Older Adults

Wanner et al. [55] observed a mean difference of +37.8 ± 19.5 cpm when enabling the LFE filter compared to the normal filter. Therefore, less time in SED and more time in all PA intensities were observed with the LFE filter due to the influence on activity counts. The normal filter appears to be more accurate than the LFE filter when compared with the NL-100 pedometer (Lee’s Summit, MO, USA) [56].

3.5 Epoch Length

3.5.1 Preschoolers

We did not find any information about the influence of epoch length on accelerometer output in preschoolers. However, several studies used a 5-s epoch based on the belief that the activity pattern of very young children is intermittent and therefore shorter epoch lengths might be suitable to capture very short bouts of movement [34, 35, 57, 58].

3.5.2 Children and Adolescents

Aibar et al. [59] compared the effect of different epoch lengths (3–60 s) on PA intensity, and found a progressive decrease in the time spent in MVPA intensity as the epoch length increased. Furthermore, they found that smaller epoch lengths increased the resolution of the measure, thus increasing the time spent in vigorous PA intensity [59]. Therefore, they suggested using shorter epoch lengths (e.g., 3–15 s) in children.
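The mechanism behind the loss of MVPA time at longer epochs can be shown with a minimal sketch. Short epochs are summed (reintegrated) into longer ones, and a per-minute threshold is rescaled proportionally to the epoch length. That proportional rescaling is a common simplification we assume here, not a validated conversion. The 2000-cpm threshold is hypothetical, and the one-minute recording is invented.

```python
from typing import List, Sequence


def reintegrate(counts: Sequence[int], factor: int) -> List[int]:
    """Sum consecutive groups of `factor` short epochs into longer epochs.

    An incomplete trailing group is dropped.
    """
    return [sum(counts[i:i + factor])
            for i in range(0, len(counts) - factor + 1, factor)]


def minutes_at_or_above(counts: Sequence[int], epoch_s: int,
                        threshold_cpm: float) -> float:
    """Minutes spent in epochs whose count reaches a per-minute threshold.

    The threshold is rescaled proportionally to the epoch length.
    """
    per_epoch = threshold_cpm * epoch_s / 60
    return sum(1 for c in counts if c >= per_epoch) * epoch_s / 60


# One invented minute of 5-s epochs: a 10-s burst followed by rest.
five_s = [500, 500] + [0] * 10
HYPOTHETICAL_MVPA_CPM = 2000
print(minutes_at_or_above(five_s, 5, HYPOTHETICAL_MVPA_CPM))            # ~0.167 min
print(minutes_at_or_above(reintegrate(five_s, 12), 60, HYPOTHETICAL_MVPA_CPM))  # 0.0
```

With 5-s epochs the 10-s burst is captured as MVPA. Summed into a single 60-s epoch, it is diluted by the surrounding rest and falls below the threshold. This is why shorter epochs are favored for the intermittent activity patterns of children.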

3.5.3 Adults

No information about the influence of epoch length was found for adults.

3.5.4 Older Adults

No information about the influence of epoch length was found for older adults.

3.6 Non-Wear-Time Definition

3.6.1 Preschoolers

No information about non-wear-time definition was found for preschoolers.

3.6.2 Children and Adolescents

Toftager et al. [47] showed that the longer the non-wear-time duration, the greater the number of participants included in the analyses. Furthermore, as the non-wear-time duration increased, the average cpm decreased (e.g., average PA level: 641 cpm with strings of 10 min of consecutive zeros compared with 570 cpm with strings of 90 min of consecutive zeros) [47]. Since Toftager et al. [47] compared different non-wear-time definitions without a criterion method, it is not possible to conclude which algorithm was more valid.

3.6.3 Adults

Peeters et al. [60] compared six definitions of non-wear-time (i.e., 20, 60, and 90 min, each with and without an allowance of 2 min of small accelerations). A 20-min string of 0 cpm without interruptions resulted in lower misclassification (5.9%) and a similar ROC-AUC (0.94) compared with 60 min (6.7%, ROC-AUC = 0.94) and 90 min (7.4%, ROC-AUC = 0.93) [60]. However, under this definition fewer participants met the non-wear-time criteria (32 of 34 participants, i.e., 6% sample loss) than with the 60- or 90-min algorithms (33 and 34 of 34 participants, i.e., 3 and 0% sample loss, respectively).

3.6.4 Older Adults

Keadle et al. [30] compared the Troiano et al. algorithm [61], which uses a minimum of 60 min of 0 cpm with an allowance of 2 min of interruptions, with the Choi et al. algorithm [62], which uses a minimum of 90 min of 0 cpm with the same allowance as the Troiano algorithm plus two 30-min windows of 0 cpm before and after that allowance. They concluded that the algorithm by Choi et al. [62] was the best for identifying wear-time compared with the participants’ diary records. The same conclusion was reached in a later study by Choi et al. [45], especially when this algorithm was implemented for wrist-worn accelerometers, because the wrist placement is more sensitive than the hip for detecting non-wear-time [45].
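The non-wear-time definitions compared above share one structure: runs of zero-count minutes, optionally bridged by short interruptions, that are long enough to count as non-wear. The sketch below is a simplified, hypothetical implementation of that structure. `allowance` bridges interruptions of up to that many minutes, and `flank` requires zero-count runs of at least that length on both sides of an interruption. The parameter values in the example loosely echo the definitions described in this section, but this is not a reimplementation of the published Troiano [61] or Choi [62] algorithms, which include further rules.

```python
from typing import List, Sequence


def nonwear_mask(counts: Sequence[int], min_bout: int = 60,
                 allowance: int = 0, flank: int = 0) -> List[bool]:
    """Flag minutes as non-wear (True) using a simplified zero-count rule.

    counts: activity counts per 60-s epoch
    min_bout: minimum bout length in minutes (interruptions included)
    allowance: longest run of non-zero minutes tolerated inside a bout
    flank: zero-count minutes required on each side of an interruption
    """
    n = len(counts)
    # 1) Collect maximal runs of zero-count minutes as (start, end) pairs.
    runs = []
    i = 0
    while i < n:
        if counts[i] == 0:
            j = i
            while j < n and counts[j] == 0:
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    # 2) Merge zero runs separated by short, well-flanked interruptions.
    bouts: List[List[int]] = []
    prev_run_len = 0
    for start, end in runs:
        run_len = end - start
        if bouts:
            gap = start - bouts[-1][1]
            if gap <= allowance and prev_run_len >= flank and run_len >= flank:
                bouts[-1][1] = end
                prev_run_len = run_len
                continue
        bouts.append([start, end])
        prev_run_len = run_len
    # 3) Keep only bouts long enough to count as non-wear.
    mask = [False] * n
    for start, end in bouts:
        if end - start >= min_bout:
            for k in range(start, end):
                mask[k] = True
    return mask


# Invented record: 30 zero minutes, a 1-min blip, 35 zero minutes.
day = [5] * 10 + [0] * 30 + [3] + [0] * 35 + [7] * 10
print(sum(nonwear_mask(day, min_bout=20)))                          # 65
print(sum(nonwear_mask(day, min_bout=60)))                          # 0
print(sum(nonwear_mask(day, min_bout=60, allowance=2)))             # 66
print(sum(nonwear_mask(day, min_bout=90, allowance=2, flank=30)))   # 0
```

The same recording yields between 0 and 66 min of non-wear depending on the definition. This illustrates why the choice of definition changes both wear-time estimates and the number of participants retained.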

3.7 Registration Period Protocol: Waking Hours Versus 24 Hours

Because of the insufficient number of studies, this section combines all age groups. Recent large-scale studies such as NHANES (2011–2012) and the International Study of Childhood Obesity, Lifestyle and Environment (ISCOLE 2012–2013) [63] have used a 24-h protocol. Tudor-Locke et al. [36] found higher wear-time compliance with 24-h protocols than with waking-hour protocols, and this finding was consistent across different countries.

3.8 Sedentary Time (SED) and Physical Activity (PA) Intensity Classification

3.8.1 Preschoolers

Two studies comparing several cut-points developed from the vertical axis accelerations from hip-mounted devices were found [58, 64]. Janssen et al. [64] supported the use of the Evenson et al. [12] SED cut-point due to the higher classification accuracy compared to other cut-points [65–69], and recommended that the Pate et al. [67] cut-points are the best option for MVPA (all of them were developed with former models of ActiGraph). However, Kahan et al. [58] observed, in a small sample size (n = 12), that Sirard et al. [65] cut-points showed the best agreement with direct observation for SED and MVPA compared to other cut-points [12, 67, 69] developed with former models of ActiGraph.

Four studies developed cut-points for SED and PA intensity for vector magnitude counts from accelerometers worn on the hip [35, 57, 70, 71]. Butte et al. [71] developed their cut-points using a 60-s epoch considering energy expenditure cut-points established through smoothing splines and ROC curves. The cut-points developed by Costa et al. [35] used 5- and 15-s epochs and were validated against direct observation. Jimmy et al. [57] developed and validated their cut-points utilizing a 5-s epoch based on indirect calorimetry. Pulakka et al. [70] developed one cut-point to differentiate sedentary/light activities from MVPA (they did not differentiate moderate from vigorous PA, as done in the aforementioned cut-points) and validated it against direct observation. All of these studies obtained high ROC-AUC (0.89–0.98 for all cut-points).

Finally, Johansson et al. [34] was the only study that developed and validated cut-points for vector magnitude from a wrist-worn accelerometer against direct observation in young preschoolers (15–36 months) obtaining a ROC-AUC of 0.89–0.98.

3.8.2 Children and Adolescents

Zhu et al. [72] compared a set of cut-points for estimating SED and PA intensity developed using the vertical axis, with the accelerometer worn on the hip in a sample of Chinese children. The authors observed a better accuracy with the cut-points proposed by Evenson et al. [12], Vanhelst et al. [73], and those internally developed [72] than with the rest of the cut-points tested [68, 74, 75] (all these cut-points were developed with former models of ActiGraph).

Five studies developed cut-points for vector magnitude counts from the hip [13, 27, 38, 57, 76]. Peterson et al. [76] suggested that 150 cpm from hip-mounted accelerometers is the most accurate SED cut-point compared with direct observation. Hänggi et al. [13] developed their cut-points using a 1-s epoch in comparison with indirect calorimetry and obtained a ROC-AUC of 0.96 for SED, light PA, and moderate PA. These cut-points [13] correlated better with other brands of accelerometers than other vertical-axis-based cut-points did [77]. Jimmy et al. [57] developed cut-points utilizing a 5-s epoch against indirect calorimetry and attained a ROC-AUC ranging from 0.89 to 0.94 for all intensities. Romanzini et al. [38] validated cut-points using a 15-s epoch against indirect calorimetry and obtained a ROC-AUC of 0.93–0.99. Finally, Santos-Lozano et al. [27] validated cut-points utilizing a 60-s epoch against indirect calorimetry and found the lowest ROC-AUC (0.6–0.8).

Vector magnitude cut-points from the wrist placement were developed in three studies [15, 22, 37]. Chandler et al. [15] validated cut-points for the non-dominant wrist using a 5-s epoch against direct observation and attained a ROC-AUC ranging between 0.64 and 0.89. A higher ROC-AUC was obtained by Crouter et al. [37] using cut-points for the dominant wrist which were developed in a 5-s epoch against indirect calorimetry (ROC-AUC of 0.83–0.94). It is important to highlight that Crouter et al. [37] applied linear regression models to the dominant wrist and obtained non-significant differences between the accelerometer outputs and indirect calorimetry (mean biases ranged from 2.2 to 8.4% for all cut-points).

Finally, we found two studies using metrics extracted directly from raw data instead of activity counts by ActiGraph. Aittasalo et al. [78] developed a method based on amplitude of accelerations from the hip’s raw accelerations. These cut-points were validated against heart-rate monitoring using an ordinal logistic regression and showed a correlation coefficient of 0.97. However, these results must be interpreted carefully since only walking and running at different intensities were used during the development of the intensity cut-points. Hildebrand et al. [22] used a linear regression analysis to establish the relation between an accelerometer metric based on raw data (i.e., Euclidean Norm Minus One) and energy expenditure measured through indirect calorimetry. Then, from the developed regression equations, they defined two sets of cut-points for the hip and the non-dominant wrist. They obtained correctly classified values between 96 and 97% for SED and light PA, 33 and 55% for moderate PA, and 68 and 80% for vigorous PA.
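To show what raw-data metrics look like in practice, the sketch below computes Euclidean Norm Minus One (ENMO) per sample and averages it per epoch. It assumes tri-axial samples in g and follows a commonly used convention of setting negative values to zero. It also computes a mean amplitude deviation (MAD) of the resultant as one example of an amplitude-domain metric. We have not verified that MAD matches the exact definitions in [78, 80], so consult those papers before reusing it. All data are invented.

```python
import math
from statistics import mean
from typing import List, Sequence


def resultant(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> List[float]:
    """Euclidean norm of each raw tri-axial sample (in g)."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def enmo(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> List[float]:
    """Euclidean Norm Minus One (1 g), with negative values set to zero."""
    return [max(0.0, r - 1.0) for r in resultant(x, y, z)]


def mad(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Mean amplitude deviation of the resultant within one epoch."""
    r = resultant(x, y, z)
    centre = mean(r)
    return mean(abs(v - centre) for v in r)


def epoch_means(values: Sequence[float], samples_per_epoch: int) -> List[float]:
    """Average per-sample values within consecutive, complete epochs."""
    return [mean(values[i:i + samples_per_epoch])
            for i in range(0, len(values) - samples_per_epoch + 1, samples_per_epoch)]


# Invented 4-sample epoch: device mostly still (1 g on z), one brief movement.
x, y, z = [0.0] * 4, [0.0] * 4, [1.0, 1.5, 0.5, 1.0]
print(epoch_means(enmo(x, y, z), 4))  # [0.125]
print(mad(x, y, z))                   # 0.25
```

Subtracting 1 g removes the gravity component of a stationary device, which is why ENMO is zero at rest. Note that the truncation discards the below-1-g phase of the movement, whereas MAD captures both phases.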

3.8.3 Adults

Kozey-Keadle et al. [79] tested some vertical axis-based cut-points and determined that 150 cpm using the vertical axis from hip accelerations was the most accurate SED cut-point compared with direct observation. Santos-Lozano et al. [27] validated cut-points for PA intensity against indirect calorimetry and obtained a ROC-AUC between 0.6 and 0.8. Sasaki et al. [7] used a linear regression model to establish the relation between ActiGraph vector magnitude counts from the hip and energy expenditure measured by indirect calorimetry. The mean differences between the metabolic equivalents (METs) predicted by the cut-points derived from the regression model and the actual METs were −0.3, −0.4, and 0.7 at moderate, vigorous, and very vigorous intensities, respectively.
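Deriving cut-points from a regression model, as described above, amounts to fitting counts against measured METs and solving the fitted line for the count at each MET boundary. In the sketch below, the 3, 6, and 9 MET boundaries for moderate, vigorous, and very vigorous intensity are conventional thresholds we assume for illustration. The calibration data are invented and perfectly linear, whereas real calibration studies use many participants and often more complex models.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def cut_points(intercept: float, slope: float,
               met_thresholds: Sequence[float] = (3.0, 6.0, 9.0)) -> List[float]:
    """Counts at which the fitted line reaches each MET threshold."""
    return [(m - intercept) / slope for m in met_thresholds]


# Invented calibration data: vector magnitude counts/min and measured METs.
counts = [0, 1000, 2000, 4000, 8000]
mets = [1.5 + 0.001 * c for c in counts]
a, b = fit_line(counts, mets)
print([round(c) for c in cut_points(a, b)])  # [1500, 4500, 7500]
```

Because the cut-points are read off the regression line, any bias in the fitted model propagates directly into intensity classification. This is one reason why cut-points derived in one population may not transfer to another.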

Three studies developed cut-points from raw data metrics. Vähä-Ypyä et al. [80] developed an amplitude-domain method for raw hip accelerations. The cut-points were validated against heart-rate monitoring and showed excellent agreement (ROC-AUC = 0.99 for all cut-points); however, they were not tested under free-living conditions. Hildebrand et al. [22] validated regression models for the hip and the non-dominant wrist against indirect calorimetry and defined cut-points from the regression equations generated. They obtained correctly classified values between 93 and 96% for SED and light PA, 54 and 59% for moderate PA, and 89 and 92% for vigorous PA. Finally, Staudenmayer et al. [24] developed a decision-tree classifier for PA intensity for the dominant wrist and obtained 75% of values correctly classified against indirect calorimetry. In addition, they reported preliminary results indicating that their model performs well in a free-living environment [24].

3.8.4 Older Adults

Keadle et al. [30] observed that cut-points based on the vertical axis and on the vector magnitude are not comparable. Unfortunately, they could not report which cut-points were the most accurate, because no criterion measure was used to compare the outcomes [30]. Aguilar-Farias et al. [26] validated SED cut-points for hip vector magnitude counts with 1-, 15-, and 60-s epochs against the ActivPAL3™ (Pal Technologies Ltd., Glasgow, UK) and found high classification accuracy (ROC-AUC of 0.82, 0.85, and 0.86 for 1-, 15-, and 60-s epochs, respectively). Santos-Lozano et al. [27] validated moderate, vigorous, and very vigorous PA cut-points against indirect calorimetry and obtained a ROC-AUC of 0.7 for all intensities examined.
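Many of the validation studies above summarize classification accuracy as a ROC-AUC. As an illustrative sketch (hypothetical data and function name), the AUC can be computed as the probability that a randomly chosen criterion-positive epoch (here, sedentary according to a reference monitor) ranks above a randomly chosen negative epoch, counting ties as one half. Because lower counts indicate sedentary behavior, counts are negated before ranking.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(random positive scores higher than random negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative epoch")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 60-s epochs: vector magnitude counts and a criterion
# sedentary flag from a reference monitor (1 = sitting/lying).
counts = [0, 20, 90, 150, 400, 1200, 60, 50]
sedentary = [1, 1, 1, 0, 0, 0, 1, 0]

# Fewer counts should mean "more sedentary", so rank by negated counts.
print(roc_auc([-c for c in counts], sedentary))  # 0.875
```

The single non-sedentary epoch with very low counts (e.g., standing still) is what pulls the AUC below 1.0 in this toy example.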

3.9 Physical Activity Energy Expenditure (PAEE) Algorithms

Ten studies that developed PAEE algorithms were found [7, 20, 22, 27, 28, 43, 64, 71, 81, 82]. Because too few studies used doubly labelled water or room calorimetry as the criterion, this section combines age groups. Only two studies, both in preschoolers, validated their algorithms against doubly labelled water or room calorimetry. Butte et al. [71] developed cross-sectional time series and multivariate adaptive regression spline models to predict PAEE from GT3X+ and heart-rate data. They validated the algorithms under controlled conditions using room calorimetry and in free-living conditions using doubly labelled water [71]. The multivariate adaptive regression spline model predicted PAEE better against room calorimetry (inter-method mean difference 0.006 ± 0.085 kcal/min), whereas the cross-sectional time series model predicted it better in free-living conditions against doubly labelled water (mean difference 41 ± 97 kcal/day) [71]. Zakeri et al. [81] applied the same two statistical methods to GT3X/+ and heart-rate data. The cross-sectional time series model gave the better prediction against room calorimetry (0.001 ± 0.070 kcal/min), but the method was not validated in a free-living environment [81]. All of these studies were carried out with hip-worn GT3X/+ devices.
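Agreement between predicted and criterion PAEE is reported above as an inter-method mean difference ± SD. A minimal sketch of that calculation follows; the direction of the difference (predicted minus criterion) is our assumption, the data are invented, and the Bland-Altman 95% limits of agreement (mean ± 1.96 SD) are added only as a common companion statistic, not as values reported by the cited studies.

```python
import statistics
from typing import Sequence, Tuple


def agreement(predicted: Sequence[float],
              criterion: Sequence[float]) -> Tuple[float, float, Tuple[float, float]]:
    """Mean difference (predicted - criterion), SD of the differences, and 95% limits of agreement."""
    diffs = [p - c for p, c in zip(predicted, criterion)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Invented daily PAEE values (kcal/day) for five children; not study data.
predicted = [310.0, 295.0, 402.0, 350.0, 280.0]
doubly_labelled_water = [300.0, 260.0, 380.0, 330.0, 290.0]

bias, sd, limits = agreement(predicted, doubly_labelled_water)
print(f"{bias:.1f} ± {sd:.1f} kcal/day, 95% LoA {limits[0]:.1f} to {limits[1]:.1f}")
```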

3.10 Sleep-Related Behaviors

3.10.1 Preschoolers

We did not find any study comparing different sleep algorithms in this age group. Only Meredith-Jones et al. [83] used the Sadeh et al. [16] sleep algorithm to identify sleep time in preschoolers. However, this algorithm was developed in an older sample (10–25 years), and the results should be interpreted cautiously.

3.10.2 Children and Adolescents

Hjorth et al. [39] compared the performance of existing sleep algorithms [16, 17] applied to hip versus non-dominant wrist data. Although these algorithms were developed for wrist accelerations, they achieved good accuracy (86.6–89.9%) at both placements; however, the hip-worn device overestimated total sleep time by 60.1–73.8 min/day compared with the wrist. These findings may be affected by the fact that bedtime was self-reported by participants in diaries rather than detected by an algorithm.

Tudor-Locke et al. [18] developed an algorithm to detect bedtime for the hip-worn GT3X/+, and Barreira et al. [19] refined it and validated it in a free-living environment against self-reported bedtime. They obtained a non-significant absolute difference in bedtime of 9 ± 36 min per night [19].

3.10.3 Adults

Cellini et al. [54] found an accuracy of 82.8% for classifying each epoch as sleep or wake against polysomnography (the gold standard for measuring sleep patterns) using the Sadeh et al. [16] sleep algorithm. However, they found that the algorithm overestimated total sleep time (inter-method mean difference 8.80 min) and sleep efficiency (inter-method mean difference 14.53%), and underestimated sleep onset latency (ICC = 0.56) and awakenings after sleep onset (ICC = 0.54) during a 2-h sleep protocol. Rosenberger et al. [84] observed a mean difference of 4 min in sleep time for the Sadeh et al. [16] algorithm compared with the Z-machine (a portable monitor of brain activity that is reasonably comparable to polysomnography [85]). Similarly, Slater et al. [86] reported good accuracy of the Sadeh et al. [16] algorithm for detecting total sleep time, and moderate validity for awakenings after sleep onset, against polysomnography when the device was worn on the wrist, but not on the hip. Finally, Zinkhan et al. [87] tested the Cole et al. [17] algorithm on hip data, even though it was developed for wrist accelerations, and observed limited agreement with total sleep time measured by polysomnography (mean difference of 81.1 min/night).
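The sleep outcomes compared against polysomnography above can be derived from a per-epoch sleep/wake series. The sketch below assumes 60-s epochs covering one time-in-bed period, codes sleep as 1 and wake as 0, and uses simplified definitions (for example, wake after sleep onset counts every wake epoch after the first sleep epoch). The function names and example series are hypothetical, chosen only to show how an algorithm that misses brief awakenings inflates total sleep time and sleep efficiency.

```python
from typing import Dict, Sequence


def sleep_summary(states: Sequence[int], epoch_s: int = 60) -> Dict[str, float]:
    """Summaries for one time-in-bed period; states holds 1 (sleep) or 0 (wake) per epoch."""
    epoch_min = epoch_s / 60.0
    time_in_bed = len(states) * epoch_min
    tst = sum(states) * epoch_min
    first_sleep = states.index(1) if 1 in states else len(states)
    # Simplified WASO: every wake epoch after the first sleep epoch.
    waso = sum(1 for s in states[first_sleep:] if s == 0) * epoch_min
    return {
        "tst_min": tst,
        "efficiency_pct": 100.0 * tst / time_in_bed if time_in_bed else 0.0,
        "sol_min": first_sleep * epoch_min,
        "waso_min": waso,
    }


def epoch_accuracy(device: Sequence[int], psg: Sequence[int]) -> float:
    """Share of epochs in which the device label matches polysomnography."""
    return sum(d == p for d, p in zip(device, psg)) / len(psg)


# Invented 10-min example: the device misses a brief awakening and scores sleep onset earlier.
psg = [0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
device = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(epoch_accuracy(device, psg))  # 0.8
print(sleep_summary(device))
print(sleep_summary(psg))
```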

3.10.4 Older Adults

No data about sleep algorithms were found in the papers reviewed for older adults.

3.11 Step Counting

Data on step counts estimated by the ActiLife software are available only for adults. Tudor-Locke et al. [44] found that, under controlled conditions and against direct observation, hip-mounted monitors counted steps more accurately than wrist-mounted ones. Under free-living conditions, the wrist-worn accelerometer detected more steps than the hip-worn one, regardless of the filter used. See Sect. 3.4 for more information on how different filters influence step counting [44].

4 Discussion

The use of objective methods when assessing SED, PA, PAEE, and sleep in various research settings has increased enormously as indicated by the large number of articles included in this review. Accelerometry has several advantages over questionnaires and self-report methods [4, 10]; however, data collection and processing criteria have a large impact on the interpretation of the data. Thus, predetermined decisions about data collection and processing in relation to the study participants and the objective of the study are important when planning research in this area.

A major finding of this review is that many studies did not report key methodological information. Therefore, data cannot be accurately compared between studies, and readers may have difficulty judging how various methodological decisions may have influenced the main findings and conclusions. We recommend that future studies report the complete set of criteria included in the present review in order to improve data comparability and reproducibility.

The next subsections provide practical considerations for every criterion based on the critical information extracted from the validation/calibration studies reviewed.

4.1 Device Placement

Traditionally, cut-points to assess SED and PA variables, as well as PAEE estimates, were developed with the device placed on the hip, whereas algorithms to assess sleep-related behaviors were more commonly developed from wrist accelerations. Ideally, researchers would collect accelerometer data using a 24-h protocol with one accelerometer attached to either the hip or the wrist and be able to estimate SED, PA, and sleep-related behaviors from it. However, the lack of validated algorithms in some age groups [15, 18, 19, 22, 24, 37] precludes this.

We decided not to differentiate between the right and left hip because Aadland et al. [40] found no significant differences between them. We consider both the hip and the wrist to be feasible sites for attaching the GT3X/+. Better compliance for wrist- compared with hip-worn devices has been reported in children and adolescents [23], but similar wear-time was found for hip and wrist placements in large-scale studies in adults [36]. We therefore cannot confirm the general belief that wrist-worn devices yield better compliance. More studies are needed to investigate compliance differences between wrist- and hip-worn devices, as well as the extent to which these differences influence the validity and reliability of accelerometer outcomes.

Only a few studies have directly compared two placement sites using the GT3X/+, and they have consistently shown more accurate classification of SED and PA intensity, as well as more accurate PAEE estimates, when the accelerometer was worn on the hip rather than the wrist [13, 15, 22, 28, 37, 38, 43]. However, one study found better performance for the wrist-worn device for PA intensity classification [24] (see Table 3). Step counts also differ greatly depending on device placement: over 2500 more steps were counted with the accelerometer on the wrist than on the hip in free-living conditions [44]. Under controlled conditions, hip placement counted steps more accurately than wrist placement at speeds of 54 m/min and above (at lower speeds, the wrist was more accurate) [44]. The lower accuracy of wrist-worn devices could be due to arm movements such as brushing one's teeth being interpreted as steps when the device is placed on the wrist, but not on the hip; nevertheless, this is a hypothesis that needs to be confirmed with data collected under free-living conditions.

With regard to sleep algorithms, Hjorth et al. [39] compared the performance of two algorithms applied to hip versus wrist data and found that, relative to the wrist, the hip data overestimated sleep time while still achieving high accuracy (86.6% and 89.9% for the two algorithms). However, these investigators set sleep and wake times manually from logs kept by the participants, which might explain the high accuracy achieved when a wrist-developed sleep algorithm was applied to hip data.

4.2 Sampling Frequency

Our recommendation is to use the highest sampling frequency possible, as future data processing needs cannot be anticipated. However, given the issues associated with sampling frequencies other than 30 Hz or its multiples (see Sect. 3.2), multiples of 30 Hz seem to produce more accurate estimates when the signal is processed using the methods proposed by ActiGraph. Therefore, the most reasonable recommendation for the time being is to use 90 Hz when researchers rely on the manufacturer's methods, and 100 Hz when researchers filter and process the signal themselves.

4.3 Valid Day and Valid Week

To ensure that data are representative of an entire day, it is necessary to establish how many hours of wear-time are required. It is also necessary to set how many valid days are needed to represent the total assessment period, which is usually 1 week (7 complete days). Wear-time criteria for a valid day depend on the registration period protocol, i.e., waking hours or 24 h. In studies in which the accelerometer is worn for 24-h periods to assess both physical activity and sleep-related behaviors, the number of hours required for a valid day has to be larger than in studies in which the accelerometer is removed at night.

Increasing the requirements for a valid day and a valid week provides more reliable data (see Table 5 in Aadland et al. [49]), but results in greater sample loss. We recommend testing different criteria to find the best compromise between sample size (and therefore statistical power) and reliability of the measure. A minimum of 4 days of valid data is recommended, as suggested in a previous systematic review [88].
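Valid-day and valid-week rules are simple to apply once daily wear-time is known, which also makes it easy to test several criteria as recommended above. In the sketch below, the per-protocol wear-time thresholds (10 h for waking-hours protocols, 16 h for 24-h protocols) are illustrative assumptions rather than recommendations of this review, while the default of 4 valid days follows the minimum cited above [88]. The function names and wear-time values are hypothetical; the same record is evaluated under both thresholds only to show how a stricter rule discards days.

```python
from typing import Dict, List

# Illustrative minimum wear-time per day, in hours; assumed values, not recommendations.
MIN_WEAR_HOURS = {"waking": 10.0, "24h": 16.0}


def valid_days(wear_min_per_day: Dict[str, float], protocol: str) -> List[str]:
    """Days whose wear-time meets the protocol-specific threshold."""
    threshold_min = MIN_WEAR_HOURS[protocol] * 60
    return [day for day, minutes in wear_min_per_day.items() if minutes >= threshold_min]


def is_valid_week(wear_min_per_day: Dict[str, float], protocol: str, min_days: int = 4) -> bool:
    """A week is valid if it contains at least `min_days` valid days."""
    return len(valid_days(wear_min_per_day, protocol)) >= min_days


# Hypothetical wear-time (minutes) for one participant.
wear = {"Mon": 780, "Tue": 610, "Wed": 590, "Thu": 1000,
        "Fri": 1320, "Sat": 400, "Sun": 1100}

print(valid_days(wear, "waking"), is_valid_week(wear, "waking"))  # ['Mon', 'Tue', 'Thu', 'Fri', 'Sun'] True
print(valid_days(wear, "24h"), is_valid_week(wear, "24h"))        # ['Thu', 'Fri', 'Sun'] False
```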

4.4 Filter

When accelerations occur at frequencies that are too low or too high, ActiGraph treats them as incompatible with human movement and excludes them from the analyses (e.g., vibrations while using a drill). For the GT3X/+, this filtering is implemented in the ActiLife software (ActiGraph, Pensacola, FL, USA), which lets users choose between two filters when processing the data: normal (default) and low-frequency extension (LFE).

The algorithms for these filters are proprietary. It is known that the normal filter detects accelerations in the 0.25–2.5 Hz range, while the LFE filter sets a lower threshold to capture slower movements; however, exactly how much lower this threshold is remains unknown. A weighting function is applied to accelerations within the 0.25–2.5 Hz range, so that full weight (i.e., 1.0) is given to accelerations at 0.75 Hz and progressively lower weights to higher and lower frequencies [89]. Accelerations above 2.5 Hz are removed by the filter, although the human body can produce accelerations of up to 3.4 Hz during vigorous physical activity when the device is attached to the hip (higher frequencies occur at the wrist) [90, 91]. Therefore, ActiGraph's filtering process might remove accelerations associated with vigorous PA, and consequently, minutes of vigorous intensity might be misclassified as moderate PA [46, 92]. Because the filter has a large impact on the accelerometer outputs, it is alarming that 74% of the studies reviewed did not report this key information (Fig. 2).
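ActiGraph's filter is proprietary, so it cannot be reproduced here. To illustrate why components above the pass band are attenuated, the sketch below evaluates the magnitude response of a generic analog Butterworth band-pass filter with the 0.25–2.5 Hz edges mentioned above. The filter type and order are our assumptions, and the function name is hypothetical: this is not the ActiGraph filter, and its response peaks near the geometric center of the band (about 0.79 Hz) rather than following ActiGraph's weighting function.

```python
import math


def bandpass_gain(f_hz: float, low_hz: float = 0.25, high_hz: float = 2.5, order: int = 2) -> float:
    """Magnitude response of a generic analog Butterworth band-pass (not ActiGraph's filter)."""
    if f_hz <= 0:
        return 0.0
    center_sq = low_hz * high_hz      # squared geometric center frequency
    bandwidth = high_hz - low_hz
    x = (f_hz ** 2 - center_sq) / (f_hz * bandwidth)
    return 1.0 / math.sqrt(1.0 + x ** (2 * order))


for f in (0.1, 0.75, 2.5, 3.4, 5.0):
    print(f"{f:4.2f} Hz -> gain {bandpass_gain(f):.2f}")
```

With these assumed settings, a 3.4 Hz component retains roughly 44% of its amplitude, compared with about 71% at the 2.5 Hz edge and nearly 100% at 0.75 Hz, which is the qualitative mechanism by which vigorous-intensity signal can be lost.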

When selecting a set of cut-points or an algorithm to estimate a variable from activity counts, we recommend using the same filter as in the validation study for those cut-points or that algorithm (Table 5 shows the filters used for all the cut-points identified). If cut-points or algorithms are not used, researchers can choose the filter; we suggest the LFE filter when slow movements are of greater importance (e.g., when analyzing SED, sleep, or PA in older adults). Based on the studies discussed above, researchers and practitioners should be aware that enabling the LFE filter instead of the normal filter will result in less SED, more time in PA at all intensities, and more steps per day.

4.5 Epoch Length

Activity counts produced by filtering raw accelerations need to be summed over specific time intervals (epochs) before PAEE, time spent in SED and at various levels of PA intensity, or sleep/wake state can be estimated, usually by applying specific intensity cut-points and algorithms.

Because epoch length influences activity counts, it is important to use the same epoch length as the validation study for the cut-points or algorithms (see Table 5). Epoch length should also be taken into account when comparing data from different studies. In young people (from preschoolers to adolescents), shorter epochs (1–15 s) are recommended to capture the short bouts of activity that occur frequently in these age groups. In adults and older adults, there are currently no data comparing the effect of epoch length on the outcomes studied. However, our own unpublished data suggest that the choice between a 1-s and a 60-s epoch has a marked impact on accelerometer outcomes, i.e., 45–60 min/day more MVPA with a 1-s than with a 60-s epoch. This large impact warrants further research in order to obtain more comparable and accurate data. For sleep measurement, there is an overall consensus on using a 60-s epoch (probably because movement patterns during sleep are stable), as all sleep algorithms have been validated with that epoch length [16–19].
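The effect of epoch length on short bursts can be seen with a toy example. The sketch below sums hypothetical 1-s counts into longer epochs and applies a single cut-point expressed in counts per minute, rescaled proportionally to the epoch length (a common simplification that we assume here). The function names are hypothetical and the 2000 cpm threshold is a placeholder, not a published cut-point.

```python
from typing import List, Sequence


def reintegrate(counts_1s: Sequence[int], epoch_s: int) -> List[int]:
    """Sum 1-s counts into consecutive epochs of epoch_s seconds (an incomplete tail is dropped)."""
    return [sum(counts_1s[i:i + epoch_s])
            for i in range(0, len(counts_1s) - epoch_s + 1, epoch_s)]


def minutes_at_or_above(counts_1s: Sequence[int], epoch_s: int, cut_point_cpm: float) -> float:
    """Minutes whose epoch counts reach a cpm cut-point rescaled to the epoch length."""
    per_epoch_threshold = cut_point_cpm * epoch_s / 60.0
    epochs = reintegrate(counts_1s, epoch_s)
    return sum(1 for c in epochs if c >= per_epoch_threshold) * epoch_s / 60.0


# One hypothetical minute: 50 s of sitting, then a 10-s burst at 100 counts/s.
minute = [0] * 50 + [100] * 10
for epoch_s in (1, 15, 60):
    print(epoch_s, round(minutes_at_or_above(minute, epoch_s, cut_point_cpm=2000), 3))
```

For this minute, a 1-s epoch classifies 10 s (about 0.17 min) at or above the threshold, a 15-s epoch classifies 0.25 min, and a 60-s epoch classifies none, so the same movement yields three different answers.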

4.6 Non-Wear-Time Definition

In free-living studies, accelerometers are usually removed during water-based activities (e.g., swimming or showering) and, in some studies, during sleep. As a result, individuals might forget to wear the accelerometer for one or more days or parts of a day. Consequently, non-wear-time must be identified (e.g., by a diary or algorithms) and excluded from the data before analysis; otherwise, this time is likely to be classified as SED. Generally, non-wear-time algorithms look for intervals of consecutive 0 cpm, with or without an allowance of a few minutes in which small non-zero counts are permitted, optionally requiring windows of 0 cpm before and after this allowance. Toftager et al. [47] studied the effect of different non-wear-time definitions and concluded that the most accurate algorithm might differ among subgroups of children and adolescents. For example, studies of overweight adolescents might need a longer run of consecutive 0 cpm, since these adolescents have more SED, which could otherwise be misclassified as non-wear-time.
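The simplest definition described above, consecutive runs of 0 cpm without any allowance, can be sketched as follows. The code assumes 60-s epochs and flags every epoch in a zero run of at least `min_run` epochs; the function name and the example data are hypothetical.

```python
from typing import List, Sequence


def nonwear_zero_runs(counts_per_min: Sequence[int], min_run: int = 60) -> List[bool]:
    """Flag each 60-s epoch as non-wear if it lies in a run of >= min_run consecutive 0-count epochs."""
    flags = [False] * len(counts_per_min)
    run_start = None
    for i, c in enumerate(list(counts_per_min) + [1]):  # sentinel closes a trailing run
        if c == 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags


# Hypothetical record: activity, a 70-min zero run, activity, then a 30-min zero run.
counts = [5] * 10 + [0] * 70 + [3] * 5 + [0] * 30
print(sum(nonwear_zero_runs(counts, min_run=60)))  # 70
print(sum(nonwear_zero_runs(counts, min_run=20)))  # 100
```

Zero runs shorter than `min_run` remain classified as wear-time and would therefore count toward SED, which is why the choice of `min_run` matters for groups with long sedentary periods.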

More studies are needed to examine the accuracy of different non-wear-time detection algorithms in all age groups. Based on the reviewed studies, we cannot recommend a non-wear-time definition for preschoolers, children, or adolescents. For adults, 20 min of consecutive 0 cpm without allowance showed the lowest misclassification error; however, it may result in slightly more data loss (6% of the sample [60]). As the accuracy of 20 and 60 min of consecutive 0 cpm was similar (ROC-AUC virtually identical, 0.94), we suggest using 60 min of consecutive 0 cpm, without allowing for interruptions in counts during this period, for adults, to reduce data loss and the risk of misclassifying SED as non-wear-time. In older adults, we recommend the Choi et al. algorithm [45], which requires 90 min of 0 cpm and allows up to 2 min of activity when it is flanked by two 30-min windows of 0 cpm. This algorithm outperformed other algorithms in detecting non-wear-time when compared with participant-reported non-wear-time [45].
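A Choi-style rule adds a tolerance for brief spikes of movement inside an otherwise zero period. The sketch below is our simplified reading of the rule as described above (90 min of 0 cpm, spikes of up to 2 min tolerated when flanked by 30 min of 0 cpm on each side) and is not the reference implementation from Choi et al. [45]. The function name is hypothetical, and details such as edge handling at the start and end of the recording are assumptions.

```python
from typing import List, Sequence


def choi_like_nonwear(counts: Sequence[int], min_len: int = 90,
                      allowance: int = 2, window: int = 30) -> List[bool]:
    """Simplified Choi-style non-wear flags for 60-s epochs (not the reference implementation)."""
    n = len(counts)
    zero_like = [c == 0 for c in counts]
    i = 0
    while i < n:
        if counts[i] != 0:
            j = i
            while j < n and counts[j] != 0:
                j += 1  # j is one past the end of this non-zero spike
            before = counts[max(0, i - window):i]
            after = counts[j:j + window]
            # Tolerate a short spike only if full zero windows exist on both sides.
            if (j - i <= allowance and len(before) == window and len(after) == window
                    and not any(before) and not any(after)):
                for k in range(i, j):
                    zero_like[k] = True
            i = j
        else:
            i += 1
    flags = [False] * n
    start = None
    for i, z in enumerate(zero_like + [False]):  # sentinel closes a trailing run
        if z and start is None:
            start = i
        elif not z and start is not None:
            if i - start >= min_len:
                flags[start:i] = [True] * (i - start)
            start = None
    return flags


# Hypothetical record: 40 min of zeros, a 2-min spike, 50 min of zeros, bracketed by activity.
counts = [10] * 20 + [0] * 40 + [5, 8] + [0] * 50 + [10] * 20
print(sum(choi_like_nonwear(counts)))               # 92
print(sum(choi_like_nonwear(counts, allowance=0)))  # 0
```

Without the spike allowance, the two zero runs (40 and 50 min) are each too short, so the whole period would be kept as wear-time and counted toward SED.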

4.7 Registration Period Protocol: Waking Hours versus 24 Hours

In line with recent large-scale studies [63], we suggest 24-h registration periods instead of waking hours only, mainly because they allow sleep-related behaviors to be assessed and are associated with better compliance, which means more recording time and therefore more valid data.

4.8 SED and PA Intensity Classification

Traditionally, SED and PA intensity have been estimated from the number of activity counts accumulated in a given period (epoch length). Cut-points are the activity-count thresholds used to categorize activity as SED or as light, moderate, vigorous, or very vigorous PA. Table 5 presents the cut-point values (expressed as counts per time unit) currently available for SED and for light, moderate, vigorous, and very vigorous PA by age group. Note that although the GT3X/+ is a triaxial accelerometer, data are provided separately for each of the three axes as well as the vector magnitude, so vertical-axis data alone can still be used with algorithms previously developed for the vertical axis.
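Applying a cut-point set amounts to mapping each epoch's count to an intensity band. The sketch below assumes cut-points expressed as lower bounds in counts per minute, rescales each epoch's counts to cpm, and computes the vector magnitude as the square root of the summed squared axis counts. The function names are hypothetical and the threshold values are placeholders, not taken from any study in Table 5; real cut-points must be matched on axis, placement, filter, and epoch.

```python
import math
from typing import Sequence, Tuple

# Placeholder cut-points as (lower bound in cpm, label), in ascending order.
# Not values from any study: real cut-points must match axis, placement, filter and epoch.
CUT_POINTS_CPM: Sequence[Tuple[float, str]] = [
    (0, "SED"), (100, "light"), (2000, "moderate"), (5000, "vigorous")]


def vector_magnitude(axis1: float, axis2: float, axis3: float) -> float:
    """Vector magnitude of the three axis counts for one epoch."""
    return math.sqrt(axis1 ** 2 + axis2 ** 2 + axis3 ** 2)


def classify(counts: float, epoch_s: int,
             cut_points: Sequence[Tuple[float, str]] = CUT_POINTS_CPM) -> str:
    """Label one epoch after rescaling its counts to counts per minute."""
    cpm = counts * 60.0 / epoch_s
    label = cut_points[0][1]
    for lower, name in cut_points:
        if cpm >= lower:
            label = name
    return label


# One hypothetical 15-s epoch with counts on axis 1 (vertical), axis 2, and axis 3.
axes = (400, 300, 300)
print(classify(axes[0], 15))                  # vertical axis only: 1600 cpm -> light
print(classify(vector_magnitude(*axes), 15))  # vector magnitude: about 2332 cpm -> moderate
```

The same epoch is classified as light PA from the vertical axis but as moderate PA from the vector magnitude, which illustrates why vertical-axis and vector magnitude cut-points cannot be interchanged.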

When applying cut-points to a specific data set, it is recommended to follow the same data collection and processing criteria as the original validation/calibration study (see Table 5). All derived intensity thresholds are influenced by the activities chosen for the calibration studies, so it is impossible to recommend the most appropriate set of intensity thresholds for free-living assessment. Also, different generations of ActiGraph devices have been shown to be comparable under controlled conditions [7, 93], but not in a free-living environment [7, 52, 53, 94]. This suggests that cut-points developed, for instance, with the vertical axis of the GT1M may not be applicable to data collected with the GT3X/+ vertical axis, since the two are not fully comparable. Therefore, our recommendations are based only on cut-points developed with GT3X/+ accelerometers. This review shows the need for future meta-analytic studies that summarize cut-points for each age group, in order to obtain a set of cut-points derived from a wide range of activities. Finally, across the studies validating cut-points against indirect calorimetry, we observed a widely accepted criterion for defining PA intensity: 1–1.5 MET for SED, 1.5–3 MET for light PA, 3–6 MET for moderate PA, and >6 MET for vigorous PA.

The criteria considered for cut-point recommendations are: (1) the cut-points cover the whole activity spectrum (i.e., SED, light PA, moderate PA, and vigorous PA); (2) calorimetry, as an objective criterion, is preferred over direct observation; (3) for young populations, cut-points developed with short epochs are preferred; (4) the number and type of activities included in the study that derived the cut-points; and (5) the results obtained in comparison with the criterion.

Preschoolers undergo rapid anatomical development, and their PA patterns change dramatically during the first years of life; therefore, the age of the sample is very important in this group. For the hip placement, we recommend the Costa et al. [35] cut-points for early preschoolers (2–3 years old) and the Jimmy et al. [57] cut-points for older preschoolers (4–6 years old). Both were developed with short epochs, which allow the devices to capture the brief bouts of vigorous PA typical of this age group, and both achieved high accuracy in their validation. For young preschoolers (15–36 months) with the wrist placement, we recommend the Johansson et al. [34] cut-points, developed with a 5-s epoch, because they achieved accuracy similar to that of hip-developed cut-points.

For children, we recommend the Hänggi et al. [13] cut-points, developed with a 1-s epoch, for the hip. For adolescents, the Romanzini et al. [38] cut-points, developed with a 15-s epoch, appear appropriate. Both sets obtained excellent classification accuracy (ROC-AUC >0.90 for all cut-points) and cover almost the whole spectrum of PA intensities. For the dominant wrist, when working with count data, we recommend the Crouter et al. [37] cut-points, and for the non-dominant wrist the Chandler et al. [15] cut-points. For researchers who want to work directly with raw data, the Hildebrand et al. [22] cut-points seem to be the best option, since they were validated against indirect calorimetry and obtained relatively high accuracy, except for moderate and vigorous PA (33–80%).

For adults, 150 cpm on the vertical axis of hip accelerations is the best option for estimating SED [79]. For PA intensity classification, we recommend the Sasaki et al. [7] cut-points, derived from a linear regression equation. For the dominant and non-dominant wrists, the Staudenmayer et al. [24] and Hildebrand et al. [22] methods, respectively, are currently the only alternatives for estimating PA, bearing in mind that they must be applied to raw-data metrics, not activity counts.

For older adults, we found only the SED cut-points proposed by Aguilar-Farias et al. [26] and the PA cut-points by Santos-Lozano et al. [27]. Combining these two sets is currently the only way to assess the whole spectrum of PA levels.

4.9 PAEE Algorithms

PAEE can be estimated by applying algorithms to GT3X/+ data. Since the same movement can produce different energy expenditure depending on the characteristics of the individual, caution is advised when interpreting PAEE estimated from accelerometry. It is worth highlighting that PAEE algorithms developed in a laboratory or other controlled setting are influenced by the activities selected in the study, whereas only studies conducted under free-living conditions with doubly labelled water as the criterion can test validity against actual PAEE in daily life. Thus, in this review we focused only on studies using doubly labelled water or room calorimetry as the criterion.

The criteria considered for PAEE algorithm recommendations were: (1) free-living studies are better than lab/controlled studies; (2) for young populations, algorithms developed in short epochs; (3) whether cross-validation was performed; and (4) results obtained in comparison with the criterion.

As noted in Sect. 4.8, different generations of ActiGraph devices are not fully comparable in free-living conditions [7, 52, 53, 94], thus, our recommendations are based on PAEE algorithms developed only with GT3X/+ accelerometers. PAEE has been expressed differently across studies, which needs to be considered when choosing a suitable algorithm. Overall, our conclusion is that more validation studies during free-living conditions utilizing doubly labelled water are needed in all age groups.

For preschoolers, we recommend the algorithm proposed by Butte et al. [71], as it is the only one validated in free-living conditions against doubly labelled water and it showed high accuracy. There are no algorithms for wrist accelerations in this age group. We do not recommend algorithms for the remaining age groups, since none of them were developed using doubly labelled water or room calorimetry as the criterion.

4.10 Sleep-Related Behaviors

The ActiGraph GT3X/+ can identify sleep-related behaviors from movement/non-movement patterns by applying sleep algorithms to activity counts. Overall, more studies developing and validating sleep algorithms for wrist- and hip-mounted ActiGraphs are needed in all age groups. However, based on the information above and the ages of the samples in the validation studies for sleep algorithms, we recommend the Barreira et al. [19] algorithm in children and adolescents with a hip-worn accelerometer, to determine bedtime when it is not reported by the participants. With a wrist-worn accelerometer, we recommend the Sadeh et al. [16] algorithm to score sleep time and the other sleep-related behaviors. Moreover, we recommend the Sadeh et al. [16] algorithm for young adults (i.e., up to 30 years of age) and the Cole et al. [17] algorithm for older adults (i.e., >30 years of age), in both cases with the accelerometer placed on the wrist.

4.11 Step Counting

We reviewed studies using the default step-counting function of ActiLife. We recommend using the normal filter when step count is a variable of interest, as it has been shown to agree more closely with other criterion devices than the LFE filter (see Sect. 3.4).

5 Limitations and Strengths

Several limitations need to be acknowledged. Studies with ActiGraph models earlier than the GT3X/+ (e.g., GT1M) were not included in our review, so our recommendations are limited to the triaxial ActiGraph models (GT3X/+). Another limitation is that, for certain age groups and for some of the accelerometer criteria analyzed, the number of studies was small; therefore, the recommendations should be revisited when more studies on those topics are available. In addition, the field of accelerometry is developing rapidly and changing continuously, so reviews are needed every few years to update the recommendations provided here. Another major issue is that the proprietary algorithms the manufacturer uses to process the data into activity counts are not publicly available, and they affect the outputs. Future work using the raw acceleration signal (i.e., before any filtering is applied) should overcome this problem. Finally, some of our recommendations are based on few studies and should be interpreted cautiously; further studies, such as a formal meta-analysis, may identify the optimal thresholds for the different intensity categories.

The strengths of this review are: (1) the inclusion of a large number of studies and a summary of the methodologies used in each, which will allow more accurate comparison of data; (2) separate sections for the validation/calibration studies, providing guidance and recommendations to researchers and practitioners; (3) the inclusion of all age groups in a single review, so that researchers can find the information relevant to the age group they are working with; and (4) a set of tables developed to assist researchers in their decision-making process (see the examples in Sect. 6).

6 Practical Implications

This review will help researchers and practitioners to make better decisions when designing their study and processing the data from the GT3X/+ accelerometer in order to obtain the most accurate and comparable information. Here, we provide some hypothetical examples illustrating how the information presented in the tables in this review can be used.

  • A researcher intends to use accelerometry in a new study and needs to know where to place the accelerometer. Table 3 summarizes the most important results obtained when comparing the outputs from the GT3X/+ attached to the hip versus the wrist and provides recommendations depending on the variables to be analyzed by age group.

  • A researcher has collected accelerometry data and wishes to compare the data with those from other studies to generate an accurate and meaningful discussion. Table 2 lists the criteria used for data collection protocols and data processing in studies. Electronic Supplementary Material Appendix S3 lists all articles that have been used for each of these criteria.

  • A researcher has collected accelerometry data with the device placed on the wrist (for example) and wishes to know which cut-points, PAEE or sleep algorithms can be applied to those data. Tables 4 and 5 will help the researcher answer these questions.

  • A researcher has decided to apply a specific set of cut-points based on the characteristics of his/her sample but is uncertain which exact setting was used in the original study (and is aware that it is recommended that the same settings be used to ensure the cut-points are applied correctly to the new data). Table 5 lists all criteria needed to correctly apply these cut-points (i.e., placement, filter, vector and epoch).

7 Conclusion

We suggest that researchers who assess SED, PA, PAEE, sleep-related behaviors, and/or steps using GT3X/+ select the specific placement, sampling frequency, filter, epoch length, non-wear-time definition, valid days and valid week criteria, SED and PA intensity classification, PAEE, and sleep algorithms depending on the population’s age (i.e., preschoolers, children and adolescents, adults, or older adults). Likewise, when selecting a specific cut-point or algorithm, it is important to apply the same criteria as in the original validation/calibration study. Moreover, this review has identified some issues in the studies using the GT3X/+ during the last 5 years, such as that many studies do not report all of the criteria used in their analyses (see Fig. 2). Future studies are recommended to report the criteria as summarized in the present review.

Although ideally researchers should select all the data collection and processing criteria before the assessment period, it is important to note that only the placement and sampling frequency criteria have to be decided a priori (i.e., before the measurement period), while the rest of processing decisions can be made a posteriori. This is important since new and better analytical methods might emerge after a study was planned, and they should be considered and tested, at least as sensitivity analyses. The preliminary evidence comparing wrist and hip placements seems to support the idea that a similar compliance can be achieved wearing the accelerometer on the wrist or on the hip, while wearing it on the hip might produce more accurate estimates of PAEE and better SED and PA intensity classifications; however, these notions need to be confirmed or refuted in future studies. We recommend recording raw data for complete days (i.e., 24-h periods), so that collected data will have the maximum potential for future analyses. The summary tables presented in this systematic review will help researchers to make better decisions on how to design and process the GT3X/+ data.