Introduction

Metabolomics is increasingly used to obtain a detailed molecular view of normal physiology as well as of the pathology of metabolic diseases like obesity [1,2,3,4]. This may allow identification of metabolites as biomarkers capable of serving alongside traditional clinical markers for predicting metabolic health [5, 6].

Some studies indicate that body composition, rather than body weight [7] or body mass index (BMI) [8, 9], predicts metabolic health, while others have shown that BMI outperforms [10, 11] or is equal to more detailed adiposity measures in identifying metabolic risks [12]. It may well be that the effectiveness of BMI in describing metabolic health leans heavily on the premise that anthropometric measures are correlated with more direct measures of adiposity such as body fat percentage (%bodyfat) [13], subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT) [14], or liver fat (LF) [15]. However, it is recognized that BMI does not capture subtle differences in body composition and cannot locate fat distribution [14, 16].

In addition to BMI, the distribution of fat in obesity is an important correlate of metabolic health; VAT [17, 18] or LF [19], rather than overall adiposity, contribute to unhealthy circulating metabolite profiles. In particular, VAT associates with elevations in branched chain amino acids (BCAA) [20, 21] and LF with increases in low-density lipoprotein cholesterol (LDL-C), fatty acids (FA), inflammatory markers, and BCAA [19]. VAT has also been independently linked to insulin resistance (IR) [5, 6] and dyslipidemia [6, 16]. Viscerally obese subjects with hyperinsulinemia also present with low high-density lipoprotein cholesterol (HDL-C) [22]. Consequently, the question remains whether BMI alone, as an adiposity measure, is enough to gauge the metabolic health of an individual accurately, or whether detailed body composition measures (SAT, VAT, and LF) are required.

Monozygotic (MZ) co-twins share 100% of their genomic sequence and are perfectly or closely matched for many environmental factors. Studies on within-twin pair differences are free from genetic and shared environment confounding and highlight environmental effects not common to both twins in a pair (e.g., aspects of diet, exercise, and lifestyle) as a basis to explain individual differences within MZ twin pairs [23]. This study design is particularly useful when studying complex traits. Previous studies comparing fat distributions and cardiometabolic health measures between unrelated individuals are hampered by the fact that deposition of fat is at least moderately heritable, as are blood biochemistry measures. The heritability estimates for VAT range from 42 to 57%, for SAT from 36 to 42% [24, 25], and for LF around 37% [15]. Accordingly, the metabolic processes that result from differences in fat distribution between individuals may also be heritable. A genome-wide association study of 8330 Finnish individuals, including 561 twin pairs, has shown substantial twin-based heritability estimates of circulating metabolites, varying between 23 and 76% depending on the compound [26]. Genetic factors also strongly determine cardiometabolic health risk factors such as cholesterol values, with heritability estimates of plasma cholesterol and triglycerides (TG) between 56 and 77% [27, 28].

Although metabolite profiles in obesity have been investigated [1, 2], we know of only one study relating metabolite profiles to all main fat compartments in young, healthy individuals [19]. Also, because most of these studies compare metabolites between unrelated obese and lean groups [1, 4] or focus on older individuals [3], very little information exists on early predictors of metabolic health in young, healthy individuals who have not yet developed the distinctions that drive these groupings of metabolic health. Little is also known about how different metabolic risk factors relate to metabolic health disturbances at the molecular level. Consequently, we examined metabolite levels from the viewpoint of different metabolic risk factors, including accumulation of fat in different depots and traditional blood biochemistry markers. We employed a co-twin control study design using young adult MZ twin pairs to identify metabolites significantly associating with metabolic health. We first identified metabolite patterns that associate with different fat depots and selected cardiometabolic risk factors including cholesterol levels, IR, and inflammation markers. Because there is no consensus on which cardiometabolic risk factors are more important than the others [29], we compared a range of the phenotype-metabolite associations to reveal the relative importance of each of the phenotypes. In cases where associations with metabolites were shared across phenotypes, we compared the effect sizes of these associations to highlight those phenotypes that capture the largest metabolite level change in circulating blood. We also examined, via within-twin pair analysis, whether these associations are independent of shared genetic and environmental influences. Lastly, we assessed whether metabolite profiling can be used to identify subgroups that differ in cardiometabolic risk factors. The identification of subgroups would indicate that the heterogeneity in cardiometabolic risk factors among the twin individuals can be refined by metabolite screening.

Materials and methods

Subjects

We studied 80 MZ twins (36 males, 44 females, 40 twin pairs) from two longitudinal population-based Finnish twin cohorts, FinnTwin16 (born 1975–1979, n = 2839 pairs) and FinnTwin12 (born 1983–1987, n = 2578 pairs) [30]. The participants have been described previously [31]. In brief, the mean BMI was 27.9 kg/m², and the mean age was 30.7 years at the time of the study (Table 1). The twins were healthy, with the exception of one twin with inactive ulcerative colitis (treated with mesalazine and azathioprine) and another twin with type 2 diabetes (treated with metformin and insulin). All participants provided written informed consent. The study was performed according to the principles of the Helsinki Declaration with approval from the Ethics Committee of the Helsinki University Central Hospital.

Table 1 Clinical characteristics of the 80 individuals

Clinical measures were obtained as described previously [31]. We used dual-energy X-ray absorptiometry (DEXA) [32] to determine body composition including %bodyfat, magnetic resonance imaging (MRI) to measure body fat distribution to abdominal SAT and VAT, and proton magnetic resonance spectroscopy (1H-MRS) to determine LF [33]. We also measured fasting lipids (total cholesterol [TCHOL], HDL-C, LDL-C, and TG) and high-sensitivity CRP [31]. Additionally, we measured fasting glucose and insulin [31] to calculate IR by the homeostasis model assessment (HOMA). TCHOL, HDL-C, LDL-C, TG, CRP, fasting glucose, and insulin measures are collectively referred to as blood biochemistry measures in this article. The study participants disclosed their smoking behavior via questionnaire.
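As an illustration of the HOMA calculation mentioned above, the following is a minimal sketch that assumes the original HOMA1 approximation (fasting glucose in mmol/L multiplied by fasting insulin in µU/mL, divided by 22.5). The text does not state which HOMA variant was used, and the function name and input values are hypothetical.

```python
def homa_ir(glucose_mmol_l: float, insulin_uU_ml: float) -> float:
    """HOMA1-IR approximation: glucose (mmol/L) x insulin (uU/mL) / 22.5.

    Assumption: the classic HOMA1 formula; the study may have used a
    different variant (for example the computer-based HOMA2 model).
    """
    if glucose_mmol_l <= 0 or insulin_uU_ml <= 0:
        raise ValueError("fasting glucose and insulin must be positive")
    return glucose_mmol_l * insulin_uU_ml / 22.5


# Hypothetical fasting values, not taken from the study data.
example_homa = homa_ir(glucose_mmol_l=5.0, insulin_uU_ml=9.0)
```

Higher values indicate greater insulin resistance; the units of the inputs matter, so glucose reported in mg/dL would need conversion first.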

Metabolomic analysis

We collected plasma samples after 10–12 h of fasting. Metabolites were analyzed on an Acquity Ultra (High) Performance Liquid Chromatography (UPLC)-Mass Spectrometry (MS) system coupled to a Xevo® TQ-S triple quadrupole mass spectrometer (Waters Corporation, Milford, MA, USA). Briefly, 10 µL of internal standard mixture was added to 100 µL of sample. Extraction solvent was added and the collected supernatant was dispensed into an Ostro™ 96-well plate (Waters Corporation, Milford, MA, USA). Then, 5 µL of vacuum-filtered sample extract was injected into the UPLC-MS for metabolite separation and quantification. MassLynx 4.1 software was used for data acquisition, data handling, and instrument control. Data processing was done using TargetLynx software. Complete instrument parameters are available elsewhere [34]. The metabolites were selected such that they cover 15 different classes in a single analytical method with a short running time.

We removed metabolites with missing values in any sample, which left 93 of the initial 111 metabolites (Supplementary Table 1). The non-normally distributed metabolite data were rank transformed to a standard normal distribution with a mean of zero and variance of one (package GenABEL in R) [35].
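The two preprocessing steps described above can be sketched as follows. This is a standard-library Python illustration, not the GenABEL implementation; the rank offset of 0.5, the function names, and the toy concentrations are assumptions made for the example.

```python
from statistics import NormalDist


def drop_incomplete(table):
    """Keep only metabolites (columns) with no missing value in any sample."""
    return {name: values for name, values in table.items()
            if all(v is not None for v in values)}


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values, offset=0.5):
    """Map ranks to standard-normal quantiles: z = Phi^-1((rank - offset) / n).

    Assumption: offset 0.5; other conventions (e.g. Blom's 3/8) also exist.
    """
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in average_ranks(values)]


# Hypothetical concentrations for four samples; None marks a missing value.
toy_table = {
    "valine": [210.0, 185.5, 240.1, 198.3],
    "cysteine": [31.2, None, 28.9, 30.0],
}
complete = drop_incomplete(toy_table)           # "cysteine" is removed
z_valine = rank_inverse_normal(complete["valine"])
```

The transformed values are symmetric around zero, and their variance approaches one as the sample size grows.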

Statistics and bioinformatics methods

First, we computed the associations between each phenotype (adiposity and blood biochemistry measures) and each metabolite across all individuals (n = 80). For this regression analysis, we used a linear mixed model (package lme4 in R) [35] to account for the family relationships. We identified the size of the effect of each phenotype on the metabolite by extracting the regression coefficient and p-value for each association. This regression coefficient represented the unit change in metabolite for each unit change in phenotype measure. For metabolites significantly associated with one or more phenotypes, we compared the standardized beta coefficients (in units of standard deviation) in order to see which associations had stronger effect sizes.
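One plausible form of the regression described above is written out below. This is a sketch under stated assumptions: a random intercept per twin pair is assumed as the way family relationships enter the model, and the covariate vector is assumed to hold age, sex, smoking, and batch. Here m_{ij} is the transformed metabolite level of twin i in pair j, x_{ij} is the phenotype, c_{ij} the covariates, and u_j the pair-level random intercept; all symbols are introduced for illustration only.

```latex
\[
  m_{ij} = \beta_0 + \beta_1 x_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{c}_{ij} + u_j + \varepsilon_{ij},
  \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
\[
  \beta_1^{\mathrm{std}} = \beta_1 \, \frac{\mathrm{sd}(x)}{\mathrm{sd}(m)}
\]
```

Because the metabolites were rank transformed to approximately unit variance, sd(m) is close to one, so the standardized coefficient is driven mainly by the spread of the phenotype.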

Next, to identify associations independent of genetic and shared environmental effects, we performed within-twin pair moderated t-tests separately for each phenotype (package limma in R) [35] to identify metabolites with significantly different concentration within-twin pairs. p-values were FDR-adjusted for multiple testing [36]. We adjusted both the regression and the moderated t-test models for age, sex, smoking, and experiment batch number.
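The multiple-testing step can be illustrated with the Benjamini-Hochberg procedure. This sketch assumes that the FDR adjustment cited above is Benjamini-Hochberg, which the text does not state explicitly; it does not reproduce the limma moderated t-test, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five metabolite-phenotype tests.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

In this toy example the third and fourth tests are nominally significant (p < 0.05) but no longer pass after adjustment, which mirrors the drop from nominal to FDR-significant associations reported in the Results.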

Lastly, we assessed whether individuals form separate subgroups based on their metabolite profiles. We first obtained residuals from metabolites modeled using a linear mixed model. Using the resultant residuals after controlling for the effects of age, sex, smoking, family, and batch number, we performed principal component analysis to capture variance in the adjusted data while accounting for collinearity between the metabolites. Using the 20 principal components that captured 80% of the variance in the data, we performed k-means clustering to group the individuals according to their metabolite profiles. We compared the metabolite concentrations between the subgroups using moderated t-tests (package limma in R). We also performed Welch's t-test (for comparison of means in groups of unequal sample sizes, package psych in R) to determine whether the phenotypes of these groups were significantly different. Normalized phenotype data were used and the model was adjusted for age, sex, smoking, family, and experiment batch effects. Finally, we used partial least squares discriminant analysis (PLSDA) to simultaneously model the metabolites and each phenotype such that the linear combinations of the metabolite variables are in maximum covariance with the phenotypes [37]. We performed the PLSDA using, as outcome variables, phenotypes that were significantly different between groups.
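The component selection and clustering steps can be sketched in plain Python. This is an illustration only: it assumes the principal component variances (eigenvalues) and scores are already available, uses Lloyd's algorithm with deterministic seeding rather than the default algorithm of R's kmeans function, and fixes k = 2 for the example. All names and numbers are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.80):
    """Smallest number of leading components whose variance share reaches target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(eigenvalues)


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=2, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical component variances and 2-D component scores for six individuals.
n_components = components_for_variance([4.0, 3.0, 2.0, 1.0])
scores = [(-2.1, 0.3), (2.0, -0.1), (-1.8, -0.2), (2.2, 0.4), (-2.0, 0.0), (1.9, 0.1)]
labels, centroids = kmeans(scores, k=2)
```

In practice k-means is sensitive to initialization, so several random starts are usually compared; the fixed seeding here is only for reproducibility of the example.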

Metabolite functions were queried from the Human Metabolome Database (HMDB). We used Ingenuity Pathway Analysis (QIAGEN, Redwood City, California) to identify pathways represented by the metabolites that differed between the subgroups. Pathways with a Fisher’s exact test p-value < 0.05 were considered significant.
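The logic of a pathway over-representation test can be illustrated with a right-tailed Fisher's exact test, which is equivalent to a hypergeometric tail probability. This is a sketch and not the Ingenuity Pathway Analysis implementation; the choice of the 93 measured metabolites as the background set, the pathway size, and the number of hits are assumptions made for the example.

```python
from math import comb


def fisher_right_tail(hits, selected, pathway_size, background):
    """P(X >= hits) when `selected` items are drawn from `background`,
    of which `pathway_size` belong to the pathway (hypergeometric tail)."""
    upper = min(selected, pathway_size)
    total = comb(background, selected)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 32 differing metabolites out of 93 measured,
# 8 of which fall in a pathway that contains 10 of the measured metabolites.
p_value = fisher_right_tail(hits=8, selected=32, pathway_size=10, background=93)
is_significant = p_value < 0.05
```

The result depends strongly on the background set, so p-values computed against a tool's full knowledge base will differ from those computed against the measured panel.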

Results

Clinical characteristics including adiposity and blood biochemistry measures for all the 80 twins are presented in Table 1. The 93 metabolites and their mean concentrations among the twin sample are listed in Supplementary Table 1.

Associations between metabolites and adiposity measures

We first regressed 93 metabolites against adiposity measures (BMI, %bodyfat, SAT, VAT, and LF). We identified 91 associations (p < 0.05) between metabolites and adiposity measures. Of these, 18 remained significant (p < 0.05) after FDR adjustment (Fig. 1 and Table 2). The 18 associations were formed between nine metabolites and the adiposity measures. Aspartate associated significantly with all the adiposity measures, propionylcarnitine with all measures but LF, deoxycytidine with BMI, SAT and VAT, and tyrosine with BMI and VAT. In addition to these four metabolites, BMI showed unique significant associations with isobutyryl carnitine, cysteine, tyrosine, and valine. SAT associated with three metabolites that also associated with VAT; VAT associated with one additional metabolite. All metabolites uniquely associating with each phenotype and shared between phenotypes are shown in Fig. 1.

Fig. 1

Metabolites significantly (FDR < 0.05) associated with one or more phenotypes. The color intensity of the individual rectangles shows the value of the standardized regression coefficient (effect size) while the asterisks indicate if the associations are free from genetic and shared environmental confounding. The gray squares indicate non-significant associations. CRP high-sensitivity C-reactive protein, HDL-C high-density lipoprotein cholesterol, HOMA homeostatic model assessment, LDL-C low-density lipoprotein cholesterol, LF liver fat, SAT subcutaneous adipose tissue, TCHOL total cholesterol, TG triglycerides, VAT visceral adipose tissue

Table 2 Metabolite-phenotype associations, and within-pair differences in the metabolite concentrations

Among the three phenotype-associating metabolites shared between the adiposity measures, propionylcarnitine and deoxycytidine showed the largest effect sizes for BMI, while among the adiposity measures, aspartate showed the largest effect size for LF (Table 2, Fig. 1).

To assess if any of the significant associations between the metabolites and adiposity measures were confounded by genetic or shared environmental factors, we performed within-twin pair comparisons using all 40 twin pairs. Half of the significant associations remained, implying that these associations are independent of confounding by factors shared by the co-twins, and are rather due to unique environmental factors experienced by the individual twins (Table 2).

Associations between metabolites and blood biochemistry measures

Regression analysis of the 93 metabolites against blood biochemistry measures (TCHOL, HDL-C, LDL-C, TG, HOMA, and CRP) revealed 106 associations (p < 0.05), of which 24 metabolite-phenotype associations remained significant (p < 0.05) after FDR adjustment. HDL-C associated negatively with 15 metabolites while TG, CRP, and HOMA associated positively with two or more metabolites (Table 2, Fig. 1). A total of 14 of the 15 HDL-C associated metabolites were unique to HDL-C, and not shared with any other blood biochemistry measure. HOMA, HDL-C, TG, and CRP mostly had unique associations with metabolites except for S-Adenosyl-L-Homocysteine which associated with both HOMA and TG and propionylcarnitine which associated with both HOMA and HDL-C.

In the within-twin pair analyses (Table 2), 10 of the 24 associations (all with HDL-C) remained significant (FDR corrected p < 0.05), demonstrating that these observed associations exist after controlling for genetic influences and shared environmental factors.

Shared metabolite associations between adiposity and blood biochemistry measures

Six metabolites, namely aspartate, tyrosine, deoxycytidine, hexanoylcarnitine, propionylcarnitine, and S-Adenosyl-L-Homocysteine, were significantly associated with two or more phenotypes (Table 2 and Fig. 1). All the shared metabolite associations were negative for HDL-C, and positive for HOMA, TG, and all adiposity measures, as expected due to the opposing effects of HDL-C and the other clinical measures on metabolic health. Propionylcarnitine and deoxycytidine, found to be associated with BMI, SAT, VAT, and HDL-C, were highly correlated with each other (r > 0.7) (Supplementary Fig. 1).

In general, when a metabolite associated with adiposity and blood biochemistry measures, the association was strongest (based on the unstandardized effect size) for the blood biochemistry measures. The same was true when assessing the significant (FDR < 0.05) log fold changes in the within-twin pair analyses.

Of the 42 significant associations between the metabolites and the phenotypes, 13 had an absolute effect size exceeding 0.6. All of these metabolites associated with HDL-C. Of the three metabolites associating with HDL-C that also associated with at least one adiposity phenotype, the standardized effect size with HDL-C was always the largest (Table 2, Fig. 1), except for propionylcarnitine. HDL-C, followed by BMI, also had the largest number of uniquely associated metabolites (Fig. 2).

Fig. 2

Explained variation in metabolites and outcome variables by the orthogonal partial least squares (OPLS) components. a PLSDA plot showing separation between the two subgroups as well as TCHOL values. Here, six OPLS components explain 53% of the variation in the metabolite concentrations and 80% of the variation in both the subgroup membership and TCHOL values. b PLSDA plot showing separation between the two subgroups as well as LDL-C values. Here, six OPLS components explain 53% of the variation in the metabolite concentrations and 80% of the variation in both the subgroup membership and LDL-C values. LDL-C low-density lipoprotein cholesterol, PLSDA partial least squares discriminant analysis, TCHOL total cholesterol

Clustering based on the metabolite profiling

Next, we explored the variances of each metabolite across the samples in order to cluster the individuals according to their metabolite profiles. We found two subgroups (clusters) of individuals differing by 32 metabolites (p < 0.05) (Supplementary Table 2). The concentrations of all these metabolites were higher in Subgroup 2 than in Subgroup 1. Acylcarnitines and the amino acids valine, leucine, isoleucine, and phenylalanine were the most prevalent metabolites differing between the subgroups. Pathway analysis for the differing metabolites revealed 14 significant pathways responsible for amino acid metabolism and cell signaling (Supplementary Fig. 2). We then performed group comparison analysis to determine whether the two groups differ in their clinical phenotypes. We found that Subgroup 2 had significantly higher TCHOL and LDL-C than Subgroup 1 (Table 3). Additionally, we performed PLSDA to identify metabolites that were important predictors of subgroup membership and clinical phenotype. Using PLSDA, 10 PLS components captured 50% of the variation in metabolite concentration correlating with 80% of the variation in subgroup membership and either TCHOL or LDL-C (Fig. 2a, b). A total of 42 metabolites predicted subgroup membership correlating with TCHOL values and 45 metabolites predicted subgroup membership correlating with LDL-C values. In both groups of metabolites, amino acids and acylcarnitines featured prominently (Table 4).

Table 3 Characteristics of each subgroup’s individuals
Table 4 Important metabolites for predicting the subgroup membership in addition to phenotype

Discussion

BMI associated positively with more metabolites than any of the other studied adiposity measures (%bodyfat, SAT, VAT, and LF) in this study. HDL-C was negatively associated with the largest number of metabolites, and showed the strongest effect sizes. Half of the associations remained significant after controlling for genetic and shared environmental influences within the twin pairs. Metabolic profiling clustered the subjects into two distinct subgroups, which differed by 32 metabolites, as well as by TCHOL and LDL-C.

We first identified metabolites that significantly associated with different adiposity measures in individual twins. Many of these adiposity-associated metabolites were amino acids, followed by acylcarnitines. BMI is often used as a measure of obesity and an indirect measure of body fat. In accordance with this, we found that two out of the three metabolites associating with body fat percentage also associated with BMI, with higher effect sizes. There were overlaps in the metabolites associated with each of the fat depots and BMI, reflecting known high correlations between the different adiposity measures [38]. Our findings of several phenotype-associating metabolites commonly shared among the adiposity measures are consistent with these observations. Several studies on plasma metabolites have identified positive associations of BCAA and aromatic amino acids (AAA) with BMI [39, 40], in line with our findings of the BCAA valine and the AAA tyrosine associating with BMI.

We also identified metabolites significantly associated with different blood biochemistry measures. Out of six blood biochemistry measures, HOMA, HDL-C, TG, and CRP associated with one or more metabolites. HDL-C had associations with the largest number of metabolites; whenever these metabolites also associated with other phenotypes (both adiposity and blood biochemistry measures), the effect size of the association with HDL-C was always higher. HDL-C also associated with the largest number of metabolites not associated with any other measure. Low serum HDL-C has previously been shown to associate with obesity and high risk for metabolic disease [41]. We propose that HDL-C is an important measure because of its ability to indirectly measure the most comprehensive metabolite profile changes, even when the differences in metabolite concentrations are small. Previous studies have also found a strong negative correlation between HDL-C and central obesity characterized by VAT [42, 43]. While our findings agree with a previous study [44] and highlight HDL-C as an important predictor of metabolic health, further studies are still needed to assess whether metabolites associated with HDL-C are independent of, or related to, the increase in VAT in obesity. HOMA and TG associated positively with S-Adenosyl-L-Homocysteine, a metabolite found to be increased in vitamin B12 deficiencies and to alter adipocyte functionality through epigenetic mechanisms [45]. S-Adenosyl-L-Homocysteine treatment of cultured 3T3-L1 adipocytes resulted in reduced glucose uptake and lipolysis in the cells [45], but how plasma levels of this metabolite relate to adipose tissue function in vivo remains to be studied.

The highest number of significant metabolite associations were found for BMI and HDL-C. BMI and HDL-C associated with several amino acids (both AAA and BCAA) and acylcarnitines in the current study. Associations of BCAAs, AAAs, and acylcarnitines with body weight and IR have already been established in several studies [46, 47]. Newgard et al. [4] found a metabolic signature made out of AAA, BCAA, and acylcarnitines associating with metabolic disease [46]. More specifically, downregulation of adipose tissue mitochondrial catabolism of BCAAs in obesity is a consistent finding [31, 48]. Low mitochondrial oxidation in the tissues could explain why the BCAA levels are increased in the plasma of obese individuals. BCAA catabolism is mainly a mitochondrial event, and it is known to decrease in muscle [49], liver [50], and adipose tissue [31, 48] in obesity-related conditions, therefore resulting in accumulation of the BCAAs in the bloodstream. In obesity, the incompletely oxidized degradation products from BCAAs, the short-chain acylcarnitines, are also elevated in plasma [4, 40]. In other words, both the increase in circulating acylcarnitine levels and amino acids observed in our study may be indicative of the incomplete BCAA oxidation process.

Across all the measured metabolites, propionylcarnitine, aspartate, tyrosine, deoxycytidine, S-Adenosyl-L-Homocysteine, and hexanoylcarnitine associated with two or more adiposity or blood biochemistry phenotypes. Propionylcarnitine associated negatively with HDL-C and positively with HOMA and all the adiposity measures, in line with previously reported associations with BMI [4, 40]. Propionylcarnitine is generated in order to decrease plasma propionyl CoA produced from the catabolism of BCAA and odd-chain FAs [46]. Thus, the increased propionylcarnitine levels observed in the current study may indicate the body’s attempt to lower the circulating BCAA and FAs in plasma [51]. Aspartate was found to positively associate with TG and all adiposity measures in our study. While not much is known about aspartate in metabolic health, it does play a major role in nucleotide and protein biosynthesis [52] as well as in oxidative phosphorylation in mitochondria, which was previously shown to be impaired in obesity [53]. The positive association between tyrosine and BMI, and the negative association between tyrosine and HDL-C, observed in the current study are in line with a previous study linking tyrosine to IR and obesity [4]. Deoxycytidine was positively associated with BMI, SAT, and VAT, and negatively with HDL-C. Deoxycytidine has multiple functions in DNA synthesis and repair, as well as in cellular growth and proliferation [54].

HOMA associated with three metabolites, two of which were acylcarnitines. Acylcarnitines in mitochondria have previously been identified as biomarkers of IR [47]. Whether impaired insulin or glucose function drives the increased levels of acylcarnitines, or vice versa, is unknown [55, 56]. In the within-twin pair analysis, HOMA-metabolite associations disappeared, suggesting that HOMA is highly heritable. Heritability for HOMA has previously been estimated at around 0.78 [57].

It is noteworthy that no metabolites significantly associated with TCHOL and LDL-C, which are important measures related to metabolic health. High LDL-C in particular is often seen as an important independent risk factor for atherosclerosis and cardiovascular disease [58]. The absence of significant metabolite associations for these metabolic health-related phenotypes could be because our twins were relatively healthy, or because of low statistical power. Despite the lack of associations between the individual metabolites and these metabolic health measures, we show that clustering by metabolite profiling can identify subgroups of individuals that differ in their metabolic health. We identified two distinct groups that differ by LDL-C and TCHOL values. The group with higher LDL-C and TCHOL values also showed higher concentrations of 32 metabolites compared with the other group. Circulating acylcarnitines made up one third of these 32 metabolites. There were also higher amounts of all three BCAAs and the AAA phenylalanine, which were also captured as pathways responsible for amino acid metabolism and cell signaling in this group. Lastly, we identified 42 metabolites that were predictors for classifying individuals into subgroups according to the TCHOL values and 45 metabolites that were predictors for classifying individuals into subgroups according to the LDL-C values. While our sample size is small, this is an important step in identifying metabolites as classifiers for our two subgroups of differing metabolic health.

In the within-twin pair analysis, about half of the relationships between the phenotypes and metabolites remained. Because MZ co-twins are genetically identical at the sequence level and share many environmental factors, significant within-twin pair differences in metabolite levels point to environmental factors for which the twin pair is discordant as an underlying contributor to the differing metabolite levels. It has been previously shown that BMI, HDL-C, TG, as well as BCAA, AAA, and acylcarnitines change in response to nutrition and exercise [59]. The present study adds to this by showing a fairly large number of specific phenotype-associated metabolites differing within the twin pairs, suggesting that these associations are not driven by genetic factors but rather by the unique environment, such as diet. However, in a prior study including a subset of these twins, we did not identify any significant within-twin pair differences in food intake or physical activity [31]. Further research is needed to identify the specific environmental component driving these differences. Metabolite differences between individuals with the same genotype could well be because of differences in their epigenetic profiles. Epigenetic marks react to environmental effects and thereby mediate their effects on the function of the genome. Therefore, differing environments within MZ twin pairs may have produced epigenetic differences, which may have resulted in the differing clinical phenotypes or metabolite levels observed in this study.

MZ co-twin studies provide perhaps the best controlled study design available in humans owing to the complete or close match for genes, age, gender, and intrauterine and childhood environment. The germline genetic background is identical in each pair, hence acquired factors must account for the phenotype discordance [60]. However, while we can rule out genetic and shared environmental factors from the observed associations in this study, the cross-sectional design does not allow us to determine cause and effect. Another limitation of our study is the limited number of metabolites assessed. However, the targeted platform we used allows (semi)quantification of the identified metabolites, while untargeted platforms yield only qualitative data.

To summarize, we established that associations with HDL-C had stronger effect sizes; HDL-C also associated with the largest number of metabolites. Measuring HDL-C, the most comprehensive and sensitive phenotype in our study, is sufficient for capturing even small changes in metabolite levels and is hence a suitable detector of early disturbances in metabolic health. Metabolite profiles consisting of several metabolites also proved to be capable of discerning different categories of metabolic health. To conclude, metabolites, either individually or as part of a larger profile, are useful biomarkers for metabolic health.