Introduction

Currently, robust identification of genetic variants associated with exercise phenotypes is limited by a lack of reproducible results. Family and twin studies have estimated high heritability for various exercise performance metrics (e.g. muscle mass: 40%, anaerobic power: 70–80%, aerobic exercise: 50%) (Issurin 2017). However, a wide variety of environmental (e.g. diet, sleep), psychological, and epigenetic factors may also influence exercise responses (den Hoed et al. 2013). In addition, within-subject variability (i.e. the variable response of a given individual to the same exercise training) considerably limits the identification of genetic variants with potentially small effects on exercise response (Voisin et al. 2019).

To date, only two genetic signatures have consistently shown an association with exercise responses: the Alpha-actinin-3 (ACTN3) stop-gain variant p.Arg577Ter and the Angiotensin-converting enzyme (ACE) insertion/deletion (I/D) in intron 16 (Houweling et al. 2018; Yan et al. 2018). Genome-wide association studies (GWAS) have helped discern genomic loci associated with training response; however, these studies usually include a low number of participants and/or report evidence of association with exercise psychology-related phenotypes (Bouchard 2011). Metabolic and cardiovascular disorders such as diabetes or arterial hypertension further complicate matters, as they can limit a participant's ability to perform exercise training at the required duration and intensity; such participants can be classified as having exercise intolerant disorders (McCoy et al. 2017). As exercise training yields a host of health benefits, understanding which genetic and molecular processes contribute to these responses might be helpful to the development of personalised exercise therapeutics (e.g. exercise dosing to minimise risk of adverse response within exercise intolerant disorders) (Buford et al. 2013).

In this study, we investigated candidate genes previously implicated in exercise response in cohorts of varying fitness levels. We used a highly trained cohort (triathlon) and a moderately trained, longitudinal cohort that completed high-intensity interval training (HIIT). We hypothesised that many, if not all, of the candidate SNPs would be found to be associated with triathlon performance and response to 4 weeks of endurance exercise training, regardless of age or baseline fitness level.

Materials and methods

The Gene SMART cohort

The Gene SMART (Skeletal Muscle Adaptive Response to Training) study design has previously been described (Yan et al. 2017). The study is ongoing, with currently > 100 moderately trained participants who were sampled for blood and skeletal muscle (vastus lateralis) at several time points: before, immediately after and 3 h after a single bout of high-intensity interval exercise (HIIE), and after 4 weeks of high-intensity interval training (HIIT) (Yan et al. 2017). Exercise-related phenotypic measurements were collected before and after the completion of the exercise training intervention (e.g. lactate threshold (LT, in Watts), peak power output (PP, in Watts), maximal oxygen uptake (VO2max, in mL/min/kg body weight, from graded exercise tests), and a time trial measurement (TT, in min)). All participants gave informed consent, and the study was approved by the Victoria University Ethics Committee (Approval number: HRE13-233). Subsequently, the study was also approved by the Queensland University of Technology (QUT) Human Research Ethics Committee (Approval number: 1600000342). All procedures performed in studies involving human participants were in accordance with the ethical standards of the respective institutions' research committees, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. At the time of collection, n = 77 participants had participated in the study, with n = 58 completing the entire 4-week training programme. Genomic DNA was extracted and purified from whole blood using the QIAamp DNA blood midi kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions for participants who completed the study. Samples that failed genotypic analysis or had a large amount of missing phenotypic data were removed from further analysis, leaving a final sample size of n = 52 (Age = 30.95 ± 8.17 years).
In the moderately trained cohort (Gene SMART, n = 58), we focused on the response to an exercise training programme (longitudinal analysis). Specifically, we calculated the change (Δ = post − pre) in each endurance fitness trait as a measure of the response to exercise training.
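The change score described above can be sketched as follows. This is a minimal illustration only: the function name, the handling of missing measurements, and the example values are assumptions and do not come from the study data.

```python
from typing import Optional


def training_response(pre: Optional[float], post: Optional[float]) -> Optional[float]:
    """Return the change score (delta = post - pre) for one trait.

    Returns None when either measurement is missing, so that incomplete
    records can be dropped before association testing (an assumption here).
    """
    if pre is None or post is None:
        return None
    return post - pre


# Hypothetical VO2max values (mL/min/kg) for three participants.
pre_vo2max = [45.0, 50.5, None]
post_vo2max = [47.5, 50.0, 41.0]

deltas = [training_response(a, b) for a, b in zip(pre_vo2max, post_vo2max)]
print(deltas)
```

A positive value indicates an improvement for traits where higher is better (LT, PP, VO2max), whereas for the time trial a negative value indicates a faster time.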

Highly trained (Ironman) cohort

Ironman triathlons consist of a 3.86 km swim, a 180.25 km bike ride, followed by the completion of a full marathon (42.2 km). The 2008 Hawaiian Ironman Triathlon population has been previously described as an elite endurance cohort based on their eligibility and participation in the event (Grealy et al. 2013, 2015). Due to the intensity of this endurance event, only highly trained individuals who completed it were included in this study. To avoid genetic confounding, we analysed only the triathlon participants who self-identified as male and Caucasian (Age = 43.81 ± 11.39 years). This was performed solely on the triathlon group as the Gene SMART population was already homogeneously male. All procedures performed in studies involving human participants were in accordance with the ethical standards of the QUT Human Research Ethics Committee (approval number: 1300000499), and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Saliva samples (OG-250 Oragene Kit, DNA Genotek Inc.) and questionnaires were collected prior to the event; time-to-completion measurements for each event were collected from the publicly available online event webpage. Genomic DNA was extracted as per the manufacturer's instructions and as described previously (Grealy et al. 2013). In the highly trained cohort (Ironman, n = 115), we focused on endurance performance, the result of months or years of training (cross-sectional analysis). Specifically, we used the time-to-completion of the running event, the biking event, the swimming event, and the total event.

Genotype method and SNP selection

The SNPs investigated in this study (Table 1) were included if they met one of three criteria. The first criterion was that the SNP had previously been associated with elite athletic status, exercise responses with reasonable replication, or exercise traits at baseline. This resulted in 11 SNPs chosen, though it should be noted that we were unable to genotype the ACE I/D variant (rs4340) using the MassARRAY and previous work failed to identify an association with baseline fitness levels in the Gene SMART cohort (Yan et al. 2018). The second criterion encompassed SNPs previously investigated but less consistently associated with performance, i.e. SNPs for which positive findings are matched by an equivalent number of negative studies, or SNPs from studies related to exercise psychology. The third criterion included SNPs associated with exercise intolerant disorders and non-exercise respiratory, muscular, or energy storage phenotypes such as hypertension (HT), cardiovascular disease (CVD), or type 2 diabetes mellitus (T2DM). Sample preparation and genotype analysis were performed using the Agena Bioscience MassARRAY, a matrix-assisted laser desorption ionisation time-of-flight (MALDI-TOF) mass spectrometer; the methodology has been described elsewhere (Jurinke et al. 2002; Ellis and Ong 2017). An internal genotyping control SNP (rs17602729, AMPD1), previously validated in our endurance cohort, was used to ensure the MassARRAY system correctly identified genotypes (Grealy et al. 2015).

Table 1 Details on the 36 SNPs included in the custom MassARRAY genotyping assay

Data processing

The output files from the MassARRAY platform were converted to PLINK format and analysed for correct genotypic identification (calling). SNPs were excluded from further analysis if they met any of the following criteria (counts are given for the Gene SMART and Ironman populations, respectively): (1) a calling rate < 80% (> 20% missing data) (n = 5, n = 3); (2) a minor allele frequency < 2% (n = 1, n = 2); (3) deviation from Hardy–Weinberg equilibrium (n = 1, n = 1) (Ellis and Ong 2017). Subsequent analysis was performed on n = 29 SNPs for the Gene SMART population and n = 30 SNPs for the Ironman population.
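The three exclusion criteria can be illustrated with a short sketch. It is not the PLINK implementation used in the study: the function names are hypothetical, genotypes are assumed to be coded as the count of one allele (0, 1 or 2, with None for a failed call), the Hardy–Weinberg check uses a simple 1-df chi-square test rather than an exact test, and the Hardy–Weinberg p value cut-off of 0.05 is an assumption because the text does not state one.

```python
import math


def call_rate(genotypes):
    """Fraction of samples with a called (non-missing) genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_p_value(genotypes):
    """Chi-square (1 df) p value for departure from Hardy-Weinberg equilibrium."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    expected = [q * q * n, 2 * p * q * n, p * p * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genotypes, min_call_rate=0.80, min_maf=0.02, min_hwe_p=0.05):
    """Apply the call-rate, MAF and HWE filters in turn (min_hwe_p is assumed)."""
    if call_rate(genotypes) < min_call_rate:
        return False
    if minor_allele_frequency(genotypes) < min_maf:
        return False
    return hwe_p_value(genotypes) >= min_hwe_p


# Hypothetical SNPs: one well-behaved, one with many failed calls,
# one with no heterozygotes (a strong departure from equilibrium).
good_snp = [0] * 49 + [1] * 42 + [2] * 9
low_call_snp = [0] * 10 + [None] * 5
no_het_snp = [0] * 50 + [2] * 50

for name, snp in [("good", good_snp), ("low call", low_call_snp), ("no het", no_het_snp)]:
    print(name, passes_qc(snp))
```

Checking the call rate first avoids computing allele frequencies on SNPs with too few called genotypes.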

Statistical analysis

We measured normality metrics (skewness and kurtosis) for each phenotype in both populations using the ggplot2, tidyverse and moments packages in R, to determine whether the raw phenotypic values required transformation. We used PLINK v1.90p to perform quantitative linear association tests (95% CI) with both dominant and recessive models for each cohort, adjusting for age. An additive model was also considered, but its results did not differ from those of the dominant model. As this was a candidate gene study, SNPs with a raw p value < 0.05 were considered nominally significant, while variants with an adjusted p value (Benjamini–Hochberg false discovery rate (FDR)) < 0.05 were considered significant. This adjustment method offers a good balance between type I and type II errors, limiting false positive results without being overly conservative. To avoid an additional multiple testing burden across phenotypic traits, we treated each quantitative trait as a separate hypothesis. Effect sizes were determined using raw beta regression coefficient values, interpreted as “how much a specific phenotype increased for each additional X allele at the SNP of interest”.
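To make the genetic models concrete, the sketch below shows how a genotype can be recoded under a dominant or recessive model and how an age-adjusted regression coefficient can be obtained. It is an illustration under stated assumptions, not the PLINK routine used in the study: the function names and data are hypothetical, genotypes are assumed to be counts of the tested allele, and the age adjustment is done by residualising both phenotype and genotype on age, which gives the same coefficient as a regression that includes age as a covariate.

```python
def encode_genotype(allele_count, model):
    """Recode a 0/1/2 count of the tested allele under a genetic model."""
    if model == "dominant":    # one or two copies are treated alike
        return 1 if allele_count >= 1 else 0
    if model == "recessive":   # only two copies have an effect
        return 1 if allele_count == 2 else 0
    if model == "additive":    # effect scales with the number of copies
        return allele_count
    raise ValueError(f"unknown model: {model}")


def residualise(values, covariate):
    """Residuals of a simple linear regression of values on one covariate."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    s_cc = sum((c - mean_c) ** 2 for c in covariate)
    s_cv = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = s_cv / s_cc
    return [v - mean_v - slope * (c - mean_c) for c, v in zip(covariate, values)]


def age_adjusted_beta(phenotype, allele_counts, ages, model):
    """Regression coefficient of the coded genotype, adjusted for age."""
    coded = [encode_genotype(c, model) for c in allele_counts]
    res_y = residualise(phenotype, ages)
    res_g = residualise(coded, ages)
    return sum(y * g for y, g in zip(res_y, res_g)) / sum(g * g for g in res_g)


# Hypothetical data built with a known dominant effect of 2.0 units.
allele_counts = [0, 1, 2, 0, 1, 2]
ages = [20, 25, 30, 35, 40, 45]
phenotype = [1.0 + 2.0 * encode_genotype(c, "dominant") + 0.5 * a
             for c, a in zip(allele_counts, ages)]

print(round(age_adjusted_beta(phenotype, allele_counts, ages, "dominant"), 6))
```

Because the example phenotype is constructed without noise, the dominant-model coefficient recovers the simulated effect; real data would additionally require standard errors and p values.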

Results

The array genotyping control (AMPD1) was identified to be 100% concordant with the genotyping results from another method (RFLP) in our previous study with the same population, confirming the validity of the MassARRAY data (Table 2).

Table 2 Summary of nominally significant variants associated with gains in endurance fitness after exercise training in the Gene SMART cohort

Six variants in five distinct genes were nominally associated with time-to-completion of Ironman events: Nuclear Respiratory Factor 1 (NRF1: rs6949152), Myostatin (MSTN: rs1805086), Major Histocompatibility complex class 1A (HLA-A: rs1061235), Peroxisome proliferator-activated receptor gamma coactivator 1-alpha (PPARGC1α: rs6821591, rs6821591), and SH3 Domain GRB2 Like Endophilin Interacting Protein (SGIP1: rs9633417). The results of our association testing did not change significantly when age was used as a covariate.

In the Gene SMART cohort, eleven variants in nine distinct genes were shown to be nominally associated with gains in endurance fitness following exercise training (Table 3): Adenosine Monophosphate Deaminase 1 (AMPD1: rs17602729), Iodothyronine Deiodinase 1 (DIO1: rs2294512), Bradykinin receptor B2 (BDKRB2: rs1799722), Nuclear Respiratory Factor 2 (NRF2: rs7181866, rs8031031), Collagen Type VI Alpha 1 Chain (COL6A1: rs39796750), Apolipoprotein E (APOE: rs7412), Interleukin 6 (IL6: rs1474347), Mitochondrial uncoupling protein 2 (UCP2: rs660339), and Homeostatic Iron Regulator (HFE: rs1799945).

Table 3 All nominally significant variants associated with different triathlon event finishing times in the highly trained endurance cohort

Interestingly, no variants were identified to be significantly associated with both time-to-completion in the Ironman cohort and the response to endurance exercise training in the Gene SMART cohort. Only rs1474347 in IL6 passed correction for multiple testing using the BH-FDR method (FDR: 0.018). The C allele at rs1474347 was associated with VO2max response within the Gene SMART study with an effect size of −4.016 mL/(kg min).
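For readers unfamiliar with the BH-FDR correction, a minimal sketch of the standard Benjamini–Hochberg adjustment is given below. The function name and the p values are hypothetical and are not the study's results; the sketch only shows how raw p values are turned into adjusted values that are then compared with the 0.05 threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four SNPs tested against one trait.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

for raw, adj in zip(raw_p, adj_p):
    status = "significant" if adj < 0.05 else "not significant"
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}, {status}")
```

Each raw p value is scaled by the number of tests divided by its rank, so the smallest p values are penalised most heavily while larger ones are adjusted less than under a Bonferroni correction.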

Discussion

In the present study, we replicated previously reported exercise-related SNP associations using data from highly trained and moderately trained cohorts. Our main finding was that rs1474347 in the IL6 gene was significantly associated with gains in VO2max in the Gene SMART cohort after correction for multiple testing. In total, 17 genetic variants were nominally associated with either elite performance or responses to exercise; however, none of these variants were common to both cohorts.

Different genetic signatures likely confer different responses to exercise training via specific molecular pathways. Therefore, variants that influence pathways responsible for adaptation to moderate training may in part differ from those that confer response to high-intensity endurance training. Additionally, moderately trained cohorts typically contain individuals with large variability in environmental factors such as diet, sleep, and habitual physical activity patterns, while the inter-individual variability in these measures is smaller in highly trained cohorts and therefore less likely to confound results (Bonetti and Hopkins 2010).

Association between genetic variants and exercise responses in the Gene SMART cohort

Located in an intron of the IL6 gene, the rs1474347 variant has been previously associated with type 2 diabetes (T2D) traits in a large-scale study (n = 10,775). The IL6 protein is a pro-inflammatory cytokine with myokine (i.e. secreted from skeletal muscle) functions and is responsible for triggering and maintaining immune processes following post-exercise muscle damage (Munoz-Canoves et al. 2013). We found that the C allele at this locus negatively affected the VO2max response to exercise training (β = −4.016 mL/(kg min)); if each copy of the allele contributes equally, a homozygous C/C genotype would correspond to a VO2max response lower by 8.032 mL/(kg min). The rs1800795 coding variant within the IL6 gene has shown mixed evidence of exercise associations (i.e. C allele: athleticism; G allele: power) (Ahmetov and Fedotovskaya 2015; Karanikolou et al. 2017). Interestingly, further analysis identified the rs1474347 C allele to be in strong linkage disequilibrium (LD) with the C allele of rs1800795 (r² = 0.96). As such, it is feasible that the LD between these variants has contributed to the mixed evidence reported by association studies implicating the latter variant (rs1800795) in exercise traits. We propose that the rs1474347 variant may reduce the expression of IL6 during acute muscle damage and therefore cause a reduced local immune response, leading to a loss of skeletal muscle remodelling and repair. In addition, this variant is located 2 kb upstream of an uncharacterised long non-coding RNA (lncRNA; LOC541472); variants in this region may therefore affect the IL-6 pro-inflammatory pathway or post-translational epigenetic and regulatory processes.
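The LD statistic quoted above can be illustrated with a short sketch of the standard r² calculation from haplotype and allele frequencies. The function name and the frequencies are hypothetical; they are not taken from the study or from any reference panel, and the sketch does not attempt to reproduce the reported value of 0.96.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation (r^2) between alleles at two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies: partial LD, then two alleles that always co-occur.
print(round(ld_r_squared(0.40, 0.50, 0.50), 4))
print(round(ld_r_squared(0.30, 0.30, 0.30), 4))
```

An r² close to 1 means that one variant almost perfectly predicts the other, which is why an association signal at one of two such variants cannot be attributed to either variant on statistical grounds alone.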

Association between genetic variants and Ironman performance

The run time (42.2 km marathon) and bike time (180.25 km ride) events in the triathlon are largely leg-based exercise activities, and we therefore expected a substantial overlap of variants associated with these traits. In contrast, the triathlon swimming event utilises whole-body muscle groups; SNPs associated with swim time were therefore anticipated to be associated only with this particular trait. Our findings supported this, as the variants associated with swim time were not seen for either of the other individual finishing times, or indeed for the total time-to-completion trait. This is also consistent with the current literature, in which elite runners and swimmers are not analysed collectively (Woods et al. 2001). We also note that the two variants nominally associated with the swim time trait are involved in hypoxic events characteristic of swimming (Galy et al. 2008). It is possible that the rs6949152 G allele within the NRF1 gene results in lower activity of the NRF1 transcription factor and therefore increased levels of hypoxia-inducible factor 1 alpha (HIF1α). This would cause reduced oxidative metabolic processing and therefore lead to the increase in swim time that is associated with the variant (β = 0.2459 h). Additionally, the CKMM protein has been shown to exhibit protective effects during mild hypoxia; we therefore hypothesise that the rs8111989 variant increases the functionality of the CKMM protein, resulting in protection against re-oxygenation-induced muscle damage and decreased swim time (Zervou et al. 2017).

Although multiple SNPs examined in this study passed our nominal threshold for significance, which was unexpected given our relatively small sample sizes, all variants nominally significant in each cohort have previously been investigated as causative variants in multiple exercise studies.

Using a MassARRAY design of 36 SNPs, we found a significant association for the rs1474347 SNP in IL6 with the change in VO2max trait in a cohort of moderately trained individuals. Furthermore, 16 other SNPs were shown to have nominal association with exercise response in the Gene SMART cohort, or Ironman performance in highly trained athletes. As such, these markers may be useful in the development of tailored genetic panel screening and therapeutics in sports science and exercise intolerant disorders. However, to more fully exploit their applicability in this context, confirmation of the effect of each genotype on gene function is required. While this is outside the purview of this study, we have replicated associations for several exercise genes, at nominal significance, in two relatively small exercise study cohorts. We were also able to ascertain the direction of effect of SNPs across the different phenotypic traits. Additionally, the different variants associated with each cohort highlight the need to examine multiple cohorts of differing fitness levels and training capabilities. However, more replication studies are required in conjunction with functional transcriptomic/proteomic studies to confirm the genes and pathways associated with exercise adaptations. The use of multi-centre studies and consortia, such as the Athlome Project Consortium, would help to facilitate these efforts and further develop the field of exercise genomics research (Pitsiladis et al. 2016).