Introduction

Physical activity (PA) is a well-established and modifiable lifestyle determinant for multiple cardio-metabolic outcomes, including obesity, type 2 diabetes (T2D), and cardiovascular diseases (CVD) [1]. The vital role of physical activity in regulating cardio-metabolic health is well-documented [2].

Globalization of unhealthy lifestyles, particularly increasingly sedentary behavior, has undoubtedly contributed to the global pandemic of cardio-metabolic disorders. It is estimated that approximately 3.2 million deaths worldwide each year are attributable to insufficient physical activity. In the United States (US) alone, about 50.8% of all adults did not meet the Physical Activity Guidelines for aerobic activity in 2015, and only 20.8% met the guidelines for both aerobic and muscle-strengthening activity [3]. Within this context, approximately 35% of coronary heart disease (CHD) mortality has been attributed to the lack of physical activity [4], and physical inactivity accounts for an estimated $117 billion in health care expenditures annually in the US [5].

Interventions to enhance PA have met with modest results, at least partially because of physiological and metabolic constraints, facilities, the built environment, economics, and policies. These challenges underscore the need for well-designed preventive measures, informed by precision medicine, that build on an understanding of the etiology of physical inactivity as it relates to cardio-metabolic diseases. While many psychological, biological, social, and environmental correlates have been identified, current understanding of the genetic architecture contributing to PA is very limited, especially when compared to other phenotypes such as obesity and diabetes. Moreover, relative to populations of European ancestry, other ethnic groups are disproportionately affected by physical inactivity and yet underrepresented in almost all genetic studies.

Driven by the promise of genetic epidemiology to unravel the complex genetic basis of PA, various studies have attempted not only to establish the presence of a genetic component but also to target specific candidate loci. More recent genome-wide studies have provided additional insights into the genetic architecture underlying human PA. In spite of those efforts, three key questions about PA genetics remain of particular interest: (1) whether there is a genetic component underlying human PA, (2) to what extent PA levels are affected by genetic factors, and (3) exactly what genetic factors and biological pathways are involved in determining PA levels. With emerging techniques for genotyping, imputation, and next-generation sequencing, along with the increasing resources allocated to genetic studies, the ability to investigate PA genetics has substantially improved, and there is an urgent need for a timely, comprehensive, and in-depth review of the research on PA genetics.

Evidence from Animals

The role of genetic factors in regulating PA levels has long been postulated based on findings from animal models [6,7,8].

Knab et al. found that the expression of the dopamine 1 receptor gene (Drd1) was suppressed in the brains of highly active inbred mice (C57L/J) compared to less active inbred mice (C3H/HeJ) [9]. The same research team conducted a follow-up study and noted that the administration of a specific D1-receptor agonist (SKF-81297) caused a substantial reduction in the PA levels of highly active mice, which confirmed their previous findings and suggested that regulating the D1 receptor pathway has a direct impact on PA levels [9].

Nescient helix-loop-helix-2 (Nhlh2) is a gene shown to transcriptionally regulate prohormone convertases in addition to numerous other neuropeptide genes [10]. It exhibits partial haplotype differences between mouse models with varying PA levels and co-localizes with a previously known PA-related quantitative trait locus (QTL) on chromosome 3 (46 cM) [11]. These findings have been corroborated by Good et al., who found that Nhlh2 knockout mice (N2KO), an obese mouse line, have substantially reduced PA levels with no overt differences in food intake [12]. Subsequently, a mechanistic model was proposed to illustrate how the acetylation status of NHLH2 influences the expression of the monoamine oxidase A (MAO-A) gene, and in turn the levels of the MAO-A enzyme, which regulates PA levels through the dopaminergic pathway [12].

The glucose transporter-4 gene (GLUT-4), also known as the solute carrier family 2 facilitated glucose transporter, member 4 gene (SLC2A4), is critical in the regulation of glucose uptake in skeletal muscle and the maintenance of glucose homeostasis. Elevated expression of GLUT-4 in the skeletal muscles of mice, resulting from their genetic selection history, is associated with glycogen depletion, which may limit the amount of carbohydrates locally available to sustain muscle contraction [13]. A transgenic mouse model overexpressing GLUT-4 selectively in fast-twitch skeletal muscles had substantially higher levels of PA compared to control mice [14]. Another study comparing selectively bred highly active mice with control mice reported about 2.5 times higher levels of the glucose transporter-4, along with higher glycogen and glycogen-synthase activity, among the highly active mice [13]. These findings are further supported by the fact that Slc2a4 is located in two previously established PA-related QTL on chromosome 11 (40 cM) and a mini-muscle phenotype-related QTL, which were identified from linkage studies in mice [15•].

Animal models can be viewed as a practical alternative resource when investigating PA. As noted above, animal research strongly suggests the presence of a heritable component in the determination of one’s degree of PA. Nevertheless, we should exercise caution when generalizing findings from mice to humans. Undoubtedly, evidence from animal models requires further scrutiny and validation through research in humans.

Evidence from Parent-Offspring Studies, Twin Studies, and Linkage Studies

Both parent-offspring studies and twin studies have been widely used to decompose familial resemblance in PA into environmental and genetic influences. The heritability estimates reported by family studies have been in the range of 0.16 to 0.29 [16,17,18,19], although a few studies with objectively measured PA levels have reported higher heritability [20, 21], suggesting that heritability may be underestimated when PA levels are self-reported. Twin studies have the advantage of better distinguishing the impact of shared vs. unique environmental influences and have generally produced higher heritability estimates of PA than parent-offspring studies, ranging from 35 to 83% [22]. Although family-based studies of shared phenotypes are invaluable in estimating the genetic component underlying human PA, they cannot by themselves provide insight into the underlying genetic mechanisms at play. In addition, environmental factors are harder to control in humans than in animal models, and sample sizes in many of the early family-based studies were small.
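To make the decomposition concrete, the following minimal sketch applies Falconer's classical twin formulas, which partition trait variance into additive genetic (h2), shared environmental (c2), and unique environmental (e2) components from the correlations within monozygotic (MZ) and dizygotic (DZ) twin pairs. The studies cited above do not necessarily use this estimator (many fit structural equation models), and the two correlations passed in are hypothetical values chosen only for illustration.

```python
def falconer_decomposition(r_mz: float, r_dz: float) -> dict:
    """Falconer's approximations from twin-pair correlations.

    r_mz: trait correlation within monozygotic twin pairs
    r_dz: trait correlation within dizygotic twin pairs
    """
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz    # shared (family) environment share
    e2 = 1.0 - r_mz           # unique environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations for a PA phenotype (not taken from any cited study).
estimates = falconer_decomposition(r_mz=0.60, r_dz=0.35)
for name, value in estimates.items():
    print(f"{name} = {value:.2f}")
```

With these illustrative inputs the heritability estimate is about 0.50, which would fall inside the 35 to 83% range quoted for twin studies. The three components sum to 1 by construction, and measurement error is absorbed into e2, so an imprecise PA measure lowers both twin correlations and shrinks h2.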

Early linkage studies extended the family-based study design. By demonstrating cosegregation of a trait with microsatellite markers spread evenly across the genome, linkage studies map variability of a trait to a genomic location [23, 24]. An early genome-wide linkage scan based on the Quebec Family Study showed suggestive evidence of linkage between chromosome 2p22-p16 and PA [25] (Table 1). Two other linkage studies, based on data from the VIVA LA FAMILIA Study and the Netherlands Twin Registry, also found suggestive evidence at 18q and 19p13.3 [26, 27] (Table 1). While these linkage studies have shed some light on the location of causal genetic variants for PA, they are hampered by poor resolution of the initial genetic signal and have limited power to detect modest effects [23,24,25,26,27]. Given that most traits are complex, with hundreds or thousands of variants each contributing a modest effect, it is perhaps not surprising that linkage studies have been of limited help for PA.

Table 1 Evidence from linkage studies for human physical activity

Evidence for Candidate Genes

More recent studies have employed association-based methods including candidate gene analyses and genome-wide association studies (GWAS). Several candidate genes have been proposed based on functional relevance and evidence from animal experiments, parent-offspring studies, and twin studies (Table 2).

Table 2 Candidate genes for human physical activity with functional relevancy and/or association evidence

LEPR, which has previously been related to obesity, was found to be associated with PA phenotypes in Pima Indians [28] and prepubertal boys [29]. Among Pima Indians, carriers of the Arg223-encoding allele of the Gln223Arg polymorphism (rs1137101) were found to have an approximately 200 kcal lower 24-h energy expenditure (EE) compared to non-carriers (P = 0.01) [28]. The same polymorphism in LEPR was found to be associated with PA levels measured by PA-related energy expenditure in a group of growing prepubertal boys of European ancestry (GlnGln 240.5 ± 11.8 kcal/day vs. GlnArg 266.6 ± 14.3 kcal/day vs. ArgArg 180.6 ± 21.0 kcal/day; P = 0.02) [29]. In a subsequent replication study, De Moor et al. reported a P value of 9.70 × 10⁻⁴ for the rs12405556 SNP in LEPR [30••].

The calcium-sensing receptor (CASR) gene is involved in calcium homeostasis and regulates bone mass density and serum calcium concentrations [31]. Lorentzon et al. observed ~1.4 h per week lower PA levels among adolescent girls carrying the serine genotype at the A986S polymorphism (rs1801725) (P = 0.01) [32].

The 3′-phosphoadenosine 5′-phosphosulfate synthase 2 (PAPSS2) gene is also a promising PA-determining gene. This gene is widely expressed in muscles and appears to be involved in initial skeletal development, which may influence the capacity to perform physical activity in later life [33, 34]. The functional relevance of PAPSS2 is also corroborated by evidence from mouse models showing that the homologous mouse gene exhibits partial differences in haplotype structure between a highly active inbred strain of mice (C57L/J) and control mice (C3H/HeJ). Lastly, the chromosome region (10q23) harboring this gene has been linked to maximal exercise capacity in a previous genome-wide linkage study [35], and the only GWAS conducted to date, which used logistic regression on PA status (PA-related energy expenditure ≤ 4 MET-h/week vs. > 4 MET-h/week), also supports a role of this gene in determining PA levels (rs10887741; effect allele, T; OR, 1.32; 95% CI, 1.17–1.49; pooled P = 3.81 × 10⁻⁶; pooled P adjusted for BMI = 6.26 × 10⁻⁶) [30••].

Another gene involved in the dopaminergic pathway, the dopamine 2 receptor (DRD2) gene, was also found to be associated with multiple PA phenotypes [36]. This candidate gene study showed that a fragment length polymorphism in exon 6 (rs6275) of DRD2 was associated with time spent in PA (C/C 1.24 ± 0.03 vs. C/T 1.33 ± 0.06 vs. T/T 0.99 ± 0.12; P = 0.02) among women from two family-based cohorts. Ethnicity-specific analysis suggested that this polymorphism was also associated with levels of sports (C/C 1.65 ± 0.06 vs. C/T 1.77 ± 0.10 vs. T/T 1.28 ± 0.15; P = 0.02) and occupational PA (C/C 2.28 ± 0.13 vs. C/T 2.38 ± 0.18 vs. T/T 1.79 ± 0.21; P = 0.004) among white women. The investigators speculated that carriers of the DRD2 risk allele tend to have lower PA levels because of impaired or subtly reduced motor skills induced by dopamine receptor deficiencies or alterations, and are thus more prone to physical inactivity.

Evidence from a series of candidate gene studies suggests that the angiotensin-converting enzyme (ACE) gene may also contribute to PA levels in humans. An insertion/deletion polymorphism (rs1799752) in ACE determines the expression of ACE in human skeletal muscle, which in turn regulates the levels of serum and tissue ACE, the production of angiotensin II, and the half-life of bradykinin [37]. This same 287-base-pair insertion has also been linked to better endurance performance among elite distance runners [38], rowers [39], and mountaineers [40]. Thereafter, a study examining training-related changes in the mechanical efficiency of human skeletal muscles found that the presence of the ACE insertion conferred an enhanced mechanical efficiency in trained muscles [41]. Lastly, a more recent candidate gene analysis reported that this polymorphism was also associated with PA status in 355 untreated stage I hypertensives (inactive, 13.7% II genotype, 53.7% ID genotype, 32.6% DD genotype; active, 26.6% II genotype, 55.4% ID genotype, 18% DD genotype; P = 0.001) [42].

The melanocortin-4 receptor (MC4R) gene has been reproducibly associated with obesity and other anthropometric traits [43, 44]. This gene is highly expressed in the hypothalamic area and is known to act within the central nervous system, profoundly influencing energy expenditure, body weight, and behavior [45]. Interestingly, it is located in a region on chromosome 18 that was also identified by a linkage study for PA [26]. In a previous candidate gene analysis, Loos et al. found that the MC4R-C-2745T polymorphism was associated with moderate to strenuous PA (P = 0.005), PA status (P = 0.01), and time spent on PA during the past year (P = 0.005) in a combined sample of women and men [46].

Several additional genes have been implicated in the regulation of human PA. For example, the gamma-aminobutyric acid type A receptor-gamma 3 (GABRG3) gene is located in a genomic region on chromosome 15 that has previously been linked to PA levels in a linkage study [25]. In addition, higher expression levels of this gene were observed after exhaustive exercise [47]. The potential role of GABRG3 in the regulation of human PA was also corroborated by suggestive evidence from the GWAS [30••]. Genes that encode muscle structure proteins have also been considered PA-determining genes. The alpha-actinin 2 (ACTN2) and alpha-actinin 3 (ACTN3) genes encode two structural proteins that are associated with the size of muscles and the integrity of muscle fibers and have been further implicated in the regulation of contraction and the degree of muscle strength [48,49,50,51]. An impact of sex hormones on PA levels has been previously observed [52], possibly mediated through the modulation of the androgen to estrogen ratio by aromatase. Interestingly, the cytochrome P450 family 19 subfamily A member 1 (CYP19A1) gene, which encodes aromatase, co-localizes with an established mouse PA-related QTL [53]. Although no PA-related QTL has been identified at this locus in humans to date [54], strong evidence from animal models supports CYP19A1 as a PA-determining gene [52]. As discussed above, the functional relevance of NHLH2, DRD1, and SLC2A4 has also been supported by evidence from animal models. However, additional association-based investigations are needed to confirm their roles in the regulation of human PA and to identify the causal variants.

Candidate genes for PA proposed to date appear to primarily involve pathways related to homeostasis, dopamine, and the development and function of the musculoskeletal system (Table 2). Overall, the findings point to an important role of the nervous system, as well as structural and metabolic changes in the muscle, in influencing both the degree of PA and the capacity to perform it (Fig. 1).

Fig. 1 Potential mechanisms that regulate human physical activity. The engagement of homeostasis (solid line) coupled with dopaminergic reward pathway (dotted line) may be responsible for the variation in human physical activity

Evidence from GWAS

Candidate gene studies have yielded limited findings, given that the biological pathways leading to human PA remain largely unknown. GWAS can overcome this limitation of candidate gene studies by taking an unbiased, hypothesis-free approach to discovery through association.

Only one GWAS to date has been conducted for human PA (Table 2). In 2009, De Moor et al. performed the first GWAS of PA based on a HapMap-imputed dataset with ~2.5 × 10⁶ SNPs [30••]. Using a threshold of pooled P values < 1.0 × 10⁻⁵, the investigators identified a total of 37 novel SNPs in PAPSS2 and two intergenic regions (2q33.1 and 18p11.32). The lead SNP in PAPSS2, rs10887741, appeared to be associated with regular participation in leisure time exercise (60 min per week; OR 1.32, 95% CI 1.17–1.49; pooled P = 3.81 × 10⁻⁶; pooled P adjusted for BMI = 6.26 × 10⁻⁶). As noted above, evidence from animal models [11] and a previous genome-wide linkage study [35] supports PAPSS2’s candidacy as a determinant of PA levels and physical fitness, through effects on the development of the musculoskeletal system [33, 34].
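As a rough check on how the reported association relates to common significance thresholds, the sketch below recovers an approximate Wald z statistic and two-sided P value from the published odds ratio and 95% confidence interval for rs10887741. It assumes the interval is a symmetric Wald interval on the log-odds scale, which the source does not state. The 5 × 10⁻⁸ cutoff is the conventional genome-wide significance level rather than a figure from this review, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                    level: float = 0.95):
    """Approximate log-odds, standard error, z, and two-sided P from an OR and its CI.

    Assumes a symmetric Wald interval on the log-odds scale.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = 2.0 * norm.cdf(-abs(z))
    return beta, se, z, p


# Reported values for rs10887741 (OR 1.32, 95% CI 1.17-1.49).
beta, se, z, p = wald_from_or_ci(1.32, 1.17, 1.49)

SUGGESTIVE = 1.0e-5             # threshold used in the first PA GWAS
GENOME_WIDE = 5.0e-8            # conventional cutoff (assumption, not from the source)
bonferroni = 0.05 / 2.5e6       # Bonferroni bound for ~2.5 million SNPs

print(f"z = {z:.2f}, approximate P = {p:.1e}")
print(f"passes suggestive threshold:  {p < SUGGESTIVE}")
print(f"passes genome-wide threshold: {p < GENOME_WIDE}")
```

Because the inputs are rounded to two decimals, the recovered P value matches the reported pooled P only to within an order of magnitude. Under these assumptions the signal clears the 1.0 × 10⁻⁵ threshold but not the conventional genome-wide level, which suggests it is best read as suggestive evidence.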

Despite these findings, we note that the first GWAS for PA has two substantial limitations. First, the study overall had limited power to detect causal variants with modest effects (Dutch: n = 1644; US: n = 978). Second, the study population was restricted to adults of European ancestry. Thus, the findings may not be generalizable to other racial/ethnic groups that are disproportionately affected by the consequences of physical inactivity and concurrently underrepresented in genetic studies. Nevertheless, the first GWAS for human PA represents a critical initial step to galvanize future efforts to understand the genetic basis of human PA.
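The power limitation can be illustrated with a back-of-the-envelope calculation. The sketch below uses a normal approximation for a 1-degree-of-freedom association test on a quantitative trait, where the non-centrality parameter is roughly n × q / (1 − q) for a variant explaining a fraction q of trait variance. This is a simplification: the actual study analysed a binary exercise phenotype with logistic regression and pooled two cohorts, so the numbers indicate only the order of magnitude. The effect sizes (q), the 5 × 10⁻⁸ significance level, and the comparison sample of 100,000 are assumptions chosen for illustration.

```python
from statistics import NormalDist


def approx_power(n: int, var_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df test for a quantitative trait.

    n: sample size
    var_explained: fraction of trait variance explained by the variant (q)
    alpha: significance level (default is the conventional genome-wide level)
    """
    norm = NormalDist()
    ncp = n * var_explained / (1.0 - var_explained)  # non-centrality parameter
    z_alpha = norm.inv_cdf(1.0 - alpha / 2.0)        # two-sided critical value
    shift = ncp ** 0.5                               # expected z under the alternative
    return norm.cdf(shift - z_alpha) + norm.cdf(-shift - z_alpha)


n_first_gwas = 1644 + 978  # Dutch and US samples combined
for q in (0.001, 0.005, 0.01):  # hypothetical variance explained per variant
    small = approx_power(n_first_gwas, q)
    large = approx_power(100_000, q)
    print(f"q = {q:.3f}: power at n = {n_first_gwas} is {small:.4f}, "
          f"at n = 100000 is {large:.4f}")
```

Under these assumptions, a sample of about 2,600 has little chance of detecting variants that each explain well under 1% of the variance at genome-wide significance, whereas a much larger sample would detect most of them.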

Current Gaps and Future Directions

Multiple lines of evidence from animal models, parent-offspring studies, twin studies, candidate gene analyses, and GWAS suggest a genetic component underlying human PA. We hasten to note, however, that the other two key questions posed at the beginning of this review (to what extent PA levels are affected by genetic factors, and exactly what genetic factors and biological pathways are involved in determining PA levels) remain largely unanswered.

For the past few decades, genetic studies on human physical activity have mainly focused on testing a small group of markers from genomic regions of possible functional relevance or implicated in previous studies. We now know that a majority of these studies were grossly underpowered and restricted in discovery by low-resolution mapping techniques [15•, 22]. Furthermore, the initial explorations of PA genetics were conducted in family-based settings, which, despite their advantages, rely heavily on the inheritance structure of the pedigrees, yield only positional candidates, and are not effective in detecting genetic markers with subtle effects [55].

As we shift to association-based methods and larger sample sizes, we are now beginning to see GWAS for human PA with large panels of SNPs [30••]. This shift has helped the field of PA genetics to form and evolve, but significant gaps in evidence of PA genetics remain.

In addition to identifying genetic variants across the genome associated with PA through GWAS, it is important for investigators to apply advanced systems biology-based methodologies to findings from GWAS. Such methods will likely further enhance the statistical power to detect the regulatory genetic modules for human PA clustered in pathways and networks with biologically plausible connections [22]. Advanced pathway and network analyses have proven to be effective and efficient approaches to integrate data from GWAS and functional genomics in studying perturbations in biological pathways and gene networks in the etiology of other phenotypes and diseases, such as T2D and CVD [56]. It is therefore logical to apply such methodologies to investigations of PA genetics.

As demonstrated in earlier parent-offspring studies and twin studies, the variation in human PA levels is most likely affected by the joint impact of genetic and environmental factors (Fig. 2). While it is understandable that the field of PA genetics is currently focused on identifying causal variants with main effects that drive the variation in human PA, investigators should not overlook the potential of gene-environment interactions. Utilizing well-characterized environmental data from already established large-scale cohorts [57,58,59] provides a unique opportunity to identify mediating effects of environmental factors on the genetics-PA relation. Such gene-environment interaction studies will be essential to further clarifying the mechanistic paradigm for the regulation of human PA. In addition, current evidence does not account for complex gene-environment interactions and is primarily based on European-derived populations, which limits the generalizability and applicability of findings to minority populations.

Fig. 2 Mechanistic paradigm of the gene-environment interactions for human physical activity. Environmental factors may mediate and interact with genetic effects in the regulation of human physical activity

Recent advances in GWAS and next-generation sequencing have led to the identification of dozens of genetic variants that together could explain a substantial share of the heritability of other phenotypes, such as body mass index (BMI) and T2D [60, 61]. Unfortunately, as pointed out earlier, evidence on PA from GWAS and GWAS-based meta-analyses is scarce compared to other phenotypes and diseases. Thus, it is not surprising that there has been a plea for large GWAS and collaborative efforts to help unravel the genetic pathways that affect this important health-enhancing behavior [62].

Rapidly evolving portable technology has made it possible to explore the genetic basis of PA in a more in-depth and comprehensive manner. Wearable devices for monitoring and tracking PA have become increasingly available and affordable, which provides a unique opportunity to investigate PA genetics using objective PA measures in large study populations [63, 64]. The availability of whole-genome sequencing (WGS) data for PA genetic studies will undoubtedly shed additional light on its genetic component, especially rare variants that cannot be captured by standard GWAS. The Trans-Omics for Precision Medicine (TOPMed) program, as part of the Precision Medicine Initiative sponsored by the National Institutes of Health (NIH), is an ongoing project that seeks to uncover disease risk factors and develop more targeted and personalized interventions by integrating WGS and other Trans-Omics data with molecular, behavioral, imaging, environmental, and clinical data [65]. In addition, as the field of PA genetics keeps evolving, it is essential to further improve our understanding of the molecular changes linking genetic factors to human PA behavior and then to the development of chronic diseases related to insufficient PA. The NIH-sponsored Molecular Transducers of Physical Activity in Humans Consortium (MoTrPAC) [66], among other collaborative initiatives, aims to assemble a systematic and comprehensive molecular map detailing the molecular signals and mechanisms that transmit the health benefits of PA and to construct a user-friendly database that clinicians and researchers can use to develop and test specific hypotheses. PA phenotyping with wearable devices, together with resources such as TOPMed and MoTrPAC, will allow investigators to integrate genetic data on both common and rare variants, molecular signals and pathways, and multidimensional phenotypic and demographic data, to inform public health interventions, clinical practice, and the development of related policies and guidelines.

The global pandemics of physical inactivity and related chronic diseases are not only public health issues but also economic issues. In order to alleviate the tremendous health and economic burden associated with physical inactivity, a new generation of PA genetic research is urgently needed. The aforementioned advances are poised to revolutionize the field of PA genetic studies, and to provide pivotal insight into the genetic factors driving humans to perform PA. Future better-designed preventive strategies, informed by PA genetics, systems biology, bioinformatics, and ultimately “precision medicine,” are expected to have major implications for the prevention, control, and treatment of chronic diseases related to physical inactivity.