1 Introduction

Substantial evidence indicates that a high body weight, as measured by body mass index (BMI), is a major risk factor for several chronic diseases, including cardiovascular disease and cancer (Poirier et al. 2006; Chan et al. 1994; Renehan et al. 2008). However, the biological mechanisms linking a high BMI to these diseases are not well understood (Poirier et al. 2006; National Cancer Institute 2013). Elevated levels of cholesterol, triglycerides, inflammation, growth factors (e.g. insulin) and sex steroid hormones explain some of the increased disease risk related to a high BMI, but undiscovered metabolic factors may also play a major role (Poirier et al. 2006; Roberts et al. 2010; Chan et al. 1994; Renehan et al. 2008).

New, high-throughput metabolic profiling technologies, such as metabolomics, could facilitate discovery of BMI-related metabolic factors that are important for disease risk. Investigators have recently used metabolic profiling to identify amino acids associated with an elevated BMI (Newgard et al. 2009; Cheng et al. 2012; Gaudet et al. 2012), and levels of these amino acids were later found to predict future diabetes risk (Wang et al. 2011; Floegel et al. 2012). Overall, however, few studies have examined the metabolic correlates of BMI, and those that did were either small (fewer than 150 participants) (Newgard et al. 2009) or assessed fewer than 60 metabolites (Cheng et al. 2012; Gaudet et al. 2012). Many prior studies have, of course, used targeted assays to examine metabolite-BMI associations. Because these studies examined one or only a few metabolic factors, however, they could not address whether their findings were simply due to confounding by other BMI-associated metabolic factors.

A broader examination of the metabolic correlates of BMI—one that includes both a large sample size and hundreds of metabolites—may identify novel biomarkers related to BMI that could be examined as candidate disease markers in future etiologic studies. To this end, we measured 317 metabolites in blood samples from 947 participants from three study populations from the United States and China, and examined their relation to BMI.

2 Subjects and methods

2.1 Study participants and data collection

Our study included participants from three study populations: The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, the Navy Colon Adenoma Study (Navy), and the Shanghai Physical Activity Study (Shanghai). The PLCO study was a multicenter trial that recruited participants between 1993 and 2001 and randomly assigned them to cancer screening or no screening (Prorok et al. 2000). Eligible participants were 55–74 years of age, with no prior diagnosis of prostate, lung, colorectal, or ovarian cancer. Participants in our analysis had been specifically selected from PLCO for a nested case–control study of colorectal cancer. In this study, there were 254 cases and 255 matched controls who completed a self-administered demographics and health habits questionnaire at baseline, reported their height and weight, and donated a baseline blood sample. Individuals were excluded if they had a rare cancer during follow-up, had self-reported Crohn’s disease, ulcerative colitis, familial polyposis, Gardner’s syndrome, or colorectal polyps as determined from the baseline questionnaire. All colorectal cancer cases in this nested case–control study were diagnosed at least 6 months after baseline, with the average diagnosis occurring 8 years after baseline. Controls were matched to cases on age, sex, year of randomization, and season of blood draw, and were alive and cancer free at the time that the matched case was diagnosed. Of these participants, height and weight data were available for 254 cases and 251 controls.

The Navy study was a case–control study of colorectal adenomas conducted at the National Naval Medical Center (Bethesda, MD) (Sinha et al. 1999). To be eligible, participants had to reside within 60 miles of the study center, be 18–74 years of age, and have no history of Crohn’s disease, ulcerative colitis, or cancer (except non-melanoma skin cancer). Cases were patients diagnosed with colorectal adenoma by sigmoidoscopy (18 %) or colonoscopy (82 %) between April 1994 and September 1996. Controls were patients confirmed to be polyp-free during sigmoidoscopic screening and they were frequency matched to cases by age (±5 years) and gender. Study participants returned to the clinic to donate blood and, approximately 3 months later, completed a questionnaire inquiring about demographics, health habits, and height and weight during an in-person interview at a home visit. Participation rates were 84 % for cases and 74 % for controls. From the 244 adenoma cases diagnosed within this study, we identified 131 cases with no previous history of rectal bleeding or adenoma, complete questionnaire data, and serum samples available. Cases were matched to an equal number of controls on age, sex, and smoking status. Of these participants, 129 cases and 129 controls had reported their height and weight.

The Shanghai study includes men and women selected at random from the Shanghai Women’s Health Study (Zheng et al. 2005) and the Shanghai Men’s Health Study (Jurj et al. 2007) for multiple measurements of physical activity by accelerometer over a study year. These two parent studies are population-based prospective cohort studies from 8 communities of Shanghai, China, with recruitment occurring between 1997 and 2006. To be eligible, women had to be 40–70 years of age and men had to be 40–75 years of age at baseline. Our analysis included 106 women and 78 men enrolled in the first wave of the Shanghai physical activity study who donated blood at the end of the study year and had valid accelerometry measurements.

All participants provided informed consent. Institutional review boards at the NCI and each participating screening center (PLCO) or institute (Navy—National Naval Medical Center; Shanghai—Shanghai Cancer Institute and Vanderbilt University) approved the studies.

2.2 BMI assessment

In the PLCO and Navy studies, participants were asked to report their current height and weight by questionnaire at baseline; these data were then used to calculate BMI. In the Shanghai study, height and weight were measured twice during an in-person interview according to a standard protocol by trained interviewers who were retired medical professionals.

2.3 Blood samples and batching

Blood samples from PLCO were collected from non-fasted participants at the individual screening centers. A serum tube (red top with no additive) was allowed to clot at room temperature for one hour, centrifuged to obtain serum (1,200×g for 15 min), aliquoted into cryovials, and frozen at −70 °C within two hours of blood draw. Once per month, serum samples were shipped from screening centers on dry ice to a central biorepository in Frederick, Maryland, and stored at −70 °C.

Blood samples from Navy were collected from participants who were not required to fast. Time-since-last-meal data were collected concurrently with the blood draw, and approximately 58 people (22 %) had fasted, defined as six or more hours since consuming any food or drink other than water. A serum tube (tiger-top serum separator tube) was allowed to clot at room temperature for 30 min, sent on ice to Biotech Research (Rockville, MD), centrifuged to obtain serum, and frozen at −80 °C within two hours of blood draw.

Blood samples from Shanghai were collected from participants asked to fast. Time-since-last-meal data were collected and 151 of 184 (82 %) had fasted prior to blood draw. An EDTA plasma vacutainer tube was sent from participants’ homes on ice to the Shanghai Cancer Institute. The tube was centrifuged and frozen at −70 °C within two hours of blood draw. After a large number of samples had been collected, tubes were sent on dry ice to a central biorepository in Frederick, Maryland, and stored at −70 °C.

For each study, samples were grouped into batches of 30, corresponding to the number of samples run per day. For each sample, the batch and position within a batch were randomly assigned except that, for the PLCO and Navy studies, cases and their matched controls were placed next to each other in the same batch.
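
The batching rule described above can be illustrated with a short sketch. It is not the procedure actually used; the function name, the seed, and the choice to start a new batch rather than split a matched pair are assumptions made for illustration.

```python
import random

def assign_batches(units, batch_size=30, seed=0):
    """units: list of tuples; each tuple holds the sample IDs that must stay
    adjacent (a matched case-control pair, or a single unmatched sample).
    Returns a list of batches, each an ordered list of sample IDs."""
    rng = random.Random(seed)          # hypothetical fixed seed for reproducibility
    shuffled = list(units)
    rng.shuffle(shuffled)              # random order of pairs/singletons
    batches, current = [], []
    for unit in shuffled:
        members = list(unit)
        rng.shuffle(members)           # random case/control order within a pair
        if len(current) + len(members) > batch_size:
            batches.append(current)    # assumption: never split a pair across batches
            current = []
        current.extend(members)        # pair members land in adjacent positions
    if current:
        batches.append(current)
    return batches

# Toy input: 40 matched pairs with made-up IDs.
toy_pairs = [(f"case{i}", f"ctrl{i}") for i in range(40)]
batches = assign_batches(toy_pairs)
```

For unmatched samples (as in the Shanghai study) each unit would simply be a one-element tuple.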

2.4 Metabolite assessment

Metabolites were measured by Metabolon Inc., whose platform and process have been described previously (Evans et al. 2009). They used a single non-targeted extraction with methanol followed by protein precipitation to recover a diverse set of metabolites. Samples were then analyzed using ultra high performance liquid-phase chromatography and gas chromatography coupled with tandem mass spectrometry and mass spectrometry, respectively. The three studies were run in sequence on the same equipment. The mass spectra peaks were compared to a chemical reference library generated from 2,500 standards to identify individual metabolites. There were 317 metabolites of known identity measured at detectable levels in all three studies and these constitute the focus of our current investigation. The metabolites were grouped into 8 chemical classes (amino acids, carbohydrates, cofactors and vitamins, energy metabolites, lipids, nucleotide metabolites, peptides, and xenobiotics) based upon the classifications of the Kyoto Encyclopedia of Genes and Genomes (Ogata et al. 1999). Amino acids here include standard amino acids (the 20 amino acids that are protein precursors and are directly encoded by the universal genetic code) as well as non-standard amino acids (amino acids that are not incorporated into proteins or encoded by the universal genetic code).

We have previously reported on the level of reliability for the metabolomics platform used in this analysis (Sampson et al. 2013). The reliability was high, with technical intraclass correlation coefficients greater than 0.8 for more than 50 % of the metabolites. We also calculated intra-assay coefficients of variation, averaged over all individuals, for each of the 317 metabolites in PLCO, Shanghai, and Navy. Each batch of 30 samples included blinded quality control samples—2 samples per batch for the PLCO and Shanghai studies, and 3 samples per batch for the Navy study. The median coefficients of variation over the 317 metabolites were 0.10 (IQR: 0.04, 0.21) for PLCO, 0.21 (IQR: 0.14, 0.33) for Navy, and 0.14 (IQR: 0.09, 0.21) for Shanghai.
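
As a rough illustration of how an intra-assay coefficient of variation can be obtained from repeated quality control measurements, consider the sketch below. The exact estimator used in the analysis is not specified here, so the choice of the sample standard deviation divided by the mean, averaged over groups of replicates, is an assumption, as are the function name and the toy values.

```python
import statistics

def intra_assay_cv(replicate_groups):
    """replicate_groups: list of lists, each holding repeated measurements of
    the same QC material for one metabolite (raw scale, not log scale).
    Returns the mean CV across groups (assumed estimator: sample SD / mean)."""
    cvs = []
    for values in replicate_groups:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)      # sample SD; needs at least 2 replicates
        cvs.append(sd / mean)
    return statistics.mean(cvs)

# Toy data: one group with 2 replicates, one with 3.
cv = intra_assay_cv([[1.0, 1.2], [0.9, 1.1, 1.0]])
```

Repeating this for every metabolite and taking the median and interquartile range of the resulting values would give summaries of the kind reported above.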

2.5 Statistical analysis

To account for variability by run day, we standardized the non-missing (i.e. detectable) values as a proportion of the median value observed that day. Thus, the median value for each metabolite for each run day was set to 1, and metabolite values twice that of the median for that day were set to 2, and so on. Within each study, we then transformed metabolite values to their natural logarithm to normalize distributions and standardized them (mean = 0, standard deviation = 1). For each metabolite, we assumed that any missing values were values below the limit of detection and imputed the minimum of the non-missing values. Over the 317 metabolites, the median level of “missingness” before imputation was 1 %.
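
The preprocessing steps can be sketched for a single metabolite in a single study as follows. This is an illustration under stated assumptions rather than the original SAS or R code: the function name is hypothetical, and imputation is assumed to take place on the log scale before standardization, an ordering the text does not fix.

```python
import math
import statistics

def normalize_metabolite(values, run_days):
    """values: raw levels for one metabolite in one study (None = not detected).
    run_days: run-day label for each sample, in the same order.
    Returns standardized log values (mean 0, SD 1)."""
    # 1. Express detectable values as a proportion of their run-day median.
    by_day = {}
    for v, d in zip(values, run_days):
        if v is not None:
            by_day.setdefault(d, []).append(v)
    medians = {d: statistics.median(vs) for d, vs in by_day.items()}
    scaled = [None if v is None else v / medians[d]
              for v, d in zip(values, run_days)]
    # 2. Natural-log transform.
    logged = [None if s is None else math.log(s) for s in scaled]
    # 3. Impute missing values with the minimum observed value
    #    (assumption: done before standardization).
    floor = min(x for x in logged if x is not None)
    filled = [floor if x is None else x for x in logged]
    # 4. Center and scale within the study.
    mean, sd = statistics.mean(filled), statistics.stdev(filled)
    return [(x - mean) / sd for x in filled]

# Toy data: two run days, one undetected value on day "d1".
z = normalize_metabolite([2.0, 4.0, 8.0, None, 3.0, 6.0, 12.0],
                         ["d1", "d1", "d1", "d1", "d2", "d2", "d2"])
```

In the toy data the two run days differ in absolute scale but give identical values after median scaling, which is the purpose of the run-day step.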

We examined study-specific associations between metabolite levels and BMI as a continuous variable using linear regression, adjusted for age, gender, and smoking status (never, former, current). PLCO models were additionally adjusted for study center and case status and Navy models were also adjusted for case status. We then used DerSimonian and Laird random effects models to conduct a combined meta-analysis of the three studies (DerSimonian and Laird 1986). Effect sizes for each study and in the meta-analysis indicate the increase in BMI units per one standard deviation increase in the metabolite level (on the log scale). Pearson correlations, adjusted for the same covariates, were also estimated in each study. For meta-analysis of study-specific Pearson correlations, we used Fisher’s r to Z transformation and estimated the standard error as the square root of the sampling variance (Field 2001). Meta-analysis results were then back-transformed for presentation. The threshold for statistical significance was Bonferroni adjusted for 317 statistical tests, i.e. alpha = 0.05/317 = 0.000158.
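
For readers unfamiliar with the pooling step, the following sketch shows the DerSimonian and Laird moment estimator applied to three study-specific estimates, together with the Bonferroni threshold. The effect sizes and standard errors are made-up numbers, and the function names are hypothetical; the sketch does not reproduce the regression models themselves.

```python
import math

def dersimonian_laird(betas, ses):
    """Pool study-specific estimates with DerSimonian-Laird random effects.
    Returns (pooled estimate, pooled SE, Cochran's Q, tau^2)."""
    w = [1.0 / se ** 2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    k = len(betas)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]        # random-effects weights
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, q, tau2

def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

alpha = 0.05 / 317   # Bonferroni threshold for 317 tests

# Hypothetical BMI units per SD of metabolite, and SEs, for three studies.
pooled, pooled_se, q, tau2 = dersimonian_laird([1.4, 1.2, 1.5],
                                               [0.25, 0.35, 0.40])
p_value = two_sided_p(pooled / pooled_se)
significant = p_value < alpha
```

When the estimated between-study variance is zero, as in this toy example, the random effects estimate coincides with the inverse-variance fixed effect estimate.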

We assessed consistency of the statistically significant associations across studies using the Q statistic (Takkouche et al. 1999) and by examining consistency of study-specific effect estimates, particularly whether they were all within the bounds of the meta-analysis 95 % confidence interval. To assess replication of findings, we examined whether P-values were consistently low—less than 0.05—across all 3 studies and whether effects were in the same direction.
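
The two checks in this paragraph can be sketched as below. With three studies the Q statistic has 2 degrees of freedom, and the chi-square upper tail for 2 degrees of freedom is exactly exp(−Q/2), so no statistical library is needed. The function names and input numbers are hypothetical.

```python
import math

def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    return sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))

def q_pvalue_three_studies(q):
    """Heterogeneity P-value for exactly 3 studies (2 degrees of freedom)."""
    return math.exp(-q / 2.0)

def replicates(betas, pvalues, threshold=0.05):
    """True if every study has P < threshold and all effects share a sign."""
    same_direction = all(b > 0 for b in betas) or all(b < 0 for b in betas)
    return same_direction and all(p < threshold for p in pvalues)

# Hypothetical study-specific estimates, SEs, and P-values.
q = cochran_q([1.4, 1.2, 1.5], [0.25, 0.35, 0.40])
p_het = q_pvalue_three_studies(q)
ok = replicates([1.4, 1.2, 1.5], [1e-6, 0.001, 0.0004])
not_ok = replicates([1.4, -1.2, 1.5], [1e-6, 0.001, 0.0004])
```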

To assess whether associations were independent, we evaluated the pairwise correlations between all metabolites significantly associated with BMI. Metabolites with pairwise correlations greater than 0.5 were considered to be highly correlated and to have possible redundancy. In supplementary analyses, we also used forward stepwise regression to identify metabolites independently associated with BMI. Metabolites were entered or removed from the model based on the meta-analysis P-value, with the threshold for entry and/or exit into the model set at P = 0.05. This model helps to identify a parsimonious set of BMI-related metabolites that could be carried forward to targeted analyses.
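
The pairwise screen for possible redundancy can be illustrated as follows. The metabolite values are invented, the function names are hypothetical, and the sketch assumes the signed correlation is compared with the 0.5 cutoff (the text does not say whether absolute values were used).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(levels, cutoff=0.5):
    """levels: dict of metabolite name -> values in a common sample order.
    Returns (name_a, name_b, r) for every pair with r above the cutoff."""
    flagged = []
    for a, b in combinations(sorted(levels), 2):
        r = pearson(levels[a], levels[b])
        if r > cutoff:                 # assumption: signed r, not |r|
            flagged.append((a, b, r))
    return flagged

# Invented standardized levels for five samples.
toy = {
    "leucine":    [0.1, 0.5, 0.9, 1.4, 2.0],
    "isoleucine": [0.2, 0.4, 1.0, 1.3, 2.1],
    "benzoate":   [1.0, -0.5, 0.3, -0.2, 0.4],
}
flagged = highly_correlated_pairs(toy)
```

In this toy example only the leucine and isoleucine pair is flagged, mirroring the kind of redundancy the screen is meant to detect.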

In a sensitivity analysis, we examined whether there was heterogeneity in metabolite-BMI associations according to fasting status in the Navy and Shanghai studies. We also examined correlations between fasting status and BMI to determine whether there is potential confounding by fasting status. In another sensitivity analysis, we excluded participants who developed colorectal cancer during follow-up (PLCO) and prevalent adenoma cases (Navy), reran models, and compared results to the main analysis.

All analyses were performed with SAS software version 9.1.3 (SAS Institute, Cary, NC) and the R statistical language version 2.14.0.

3 Results

The participant characteristics of our three study populations are shown in Table 1. Overall, participants from PLCO were older than those of the other two studies, while participants from the Navy study were more likely to be male, and participants from the Shanghai study had lower BMIs and the men were more likely to currently smoke than men from other studies.

Table 1 Baseline characteristics of study participants

We identified 37 metabolites associated with BMI at the Bonferroni adjusted significance level (Table 2). The distribution of the BMI-associated metabolites according to chemical class was as follows: 19 amino acids, three carbohydrates, two peptides, 12 lipids, and one xenobiotic, with no cofactors and vitamins, energy metabolites, or nucleotide metabolites. The specific metabolites and their effect sizes and P-values are shown in Table 3 and Fig. 1. The largest effect size and the most significant result (P = 2.51 × 10^−13) were observed for glutamate. For glutamate, each 1 standard deviation increase was associated with an increase of 1.39 units of BMI. For a person who is 1.7 meters in height, this would be an increase of about 4.0 kg in body weight. At the P < 0.05 level, there were 110 metabolites associated with BMI overall. Complete results for all 317 metabolites are shown in Table I of the online data supplement.
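
The conversion from BMI units to body weight follows directly from the definition BMI = weight / height², so at a fixed height a change in BMI corresponds to that change multiplied by height squared. A minimal check of the glutamate example (the function name is ours):

```python
def bmi_change_to_kg(delta_bmi, height_m):
    """Weight change (kg) implied by a BMI change (kg/m^2) at fixed height."""
    return delta_bmi * height_m ** 2

# 1.39 BMI units at a height of 1.7 m: 1.39 * 2.89, roughly 4 kg.
delta_kg = bmi_change_to_kg(1.39, 1.7)
```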

Table 2 Number of metabolites detected by chemical class, and the number with statistically significant associations with BMI
Table 3 Association of metabolites with BMI in the combined random effects meta-analysis and by specific study
Fig. 1

Effect sizes and 95 percent confidence intervals for combined associations between metabolite levels and body mass index. Results are based on a population of 947 participants (505 from PLCO, 258 from Navy, and 184 from Shanghai). Within each specific study, the association between metabolite levels and BMI was modeled using linear regression, adjusted for age, gender, and smoking status. PLCO models were additionally adjusted for study center and case status and Navy models were also adjusted for case status. The combined model was fitted using random effects meta-analysis. Only the metabolites that met the Bonferroni corrected threshold of statistical significance in the combined models, i.e. alpha = 0.05/317 = 0.000158, are shown here. The overall effect size (ES), shown as the black square, indicates the change in units of BMI (kg/m²) per one standard deviation increase in metabolite level (log scale). The horizontal line indicates the 95 percent confidence interval of the estimate. The dotted line at zero overall ES is the line of no effect

The associations had minimal heterogeneity by study. All tests for heterogeneity were non-significant (P heterogeneity > 0.05). Effect sizes were in the same direction and of similar magnitude between studies. Out of 37 metabolites, only two (3-hydroxyisobutyrate and palmitoyl sphingomyelin) had study-specific estimates that differed substantially, i.e. by more than 50 %, from the meta-analysis estimate. Of the 37 metabolites, 25 had a P-value <0.05 for the metabolite-BMI association in all three studies (by chance alone, no metabolites would be expected to replicate at this level).
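
The statement about chance replication can be checked with simple arithmetic. Assuming three independent studies and no true associations, the expected number of the 317 metabolites reaching P < 0.05 in all three studies is well below one, and smaller still if the effects must also point in the same direction. The variable names below are ours.

```python
n_tests = 317
alpha = 0.05
n_studies = 3

# Expected count with P < alpha in every study, any direction.
expected_any_direction = n_tests * alpha ** n_studies

# Same, but requiring all three effects to share a sign: each two-sided test
# puts alpha/2 in each tail, and there are two possible shared directions.
expected_same_direction = n_tests * 2 * (alpha / 2) ** n_studies
```

These work out to roughly 0.04 and 0.01 metabolites, respectively, compared with the 25 observed.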

Of the 37 metabolites associated with BMI, there were 18 that have never, to our knowledge, been previously linked with BMI in the literature. Seven of these 18 metabolites were highly correlated (r ≥ 0.5) with the known BMI-associated metabolites of valine, tyrosine, phenylalanine, leucine, or isoleucine (Fig. 2). These seven metabolites were 3-4-hydroxyphenyl-lactate, gamma-glutamyltyrosine, 3-methyl-2-oxobutyrate, gamma-glutamylisoleucine, 3-methyl-2-oxovalerate, 4-methyl-2-oxopentanoate, and 3-hydroxyisobutyrate. Adjusting for the known BMI-related metabolites in sensitivity analyses eliminated the associations for these seven metabolites, suggesting that they provide potentially redundant information.

Fig. 2

Pairwise correlations in the PLCO dataset for the 37 metabolites associated with BMI. A white square represents a pairwise correlation of less than 0.50, a grey square represents a correlation 0.50–0.69, and a black square indicates a correlation of 0.70 or greater. The red dashed line separates metabolites that had been previously reported to be associated with BMI (bottom and/or right) vs. metabolites that had never before been reported to be associated with BMI (Color figure online)

The 11 remaining metabolites with novel BMI associations were not strongly correlated with known BMI-related metabolites and thus likely provide new information (Fig. 2). Three of these were intercorrelated lipids (1-oleoylglycerophosphocholine, 1-eicosadienoylglycerophosphocholine, 2-linoleoylglycerophosphocholine). The remaining 8 metabolites (butyrylcarnitine, 2-hydroxybutyrate, 7-alpha-hydroxy-3-oxo-4-cholestenoate, alpha-hydroxyisovalerate, benzoate, N-acetylglycine, palmitoyl sphingomyelin, and histidine) were not strongly correlated with one another.

In addition to the novel associations, there were 19 metabolite-BMI associations that had been previously reported, including branched chain amino acids like valine, aromatic amino acids like tyrosine, markers of glucose control like mannose and 1-5-anhydroglucitol, and markers of lipid metabolism like glycerol and lathosterol.

Over the entire set of 317 metabolites, no associations were heterogeneous by study after Bonferroni correction (the lowest P-values were P = 0.0004 for serotonin 5-HT, P = 0.0017 for dimethylarginine-SDMA-ADMA, and P = 0.0034 for adrenate-22-4n6). There were also no associations with statistically significant heterogeneity by age or gender after Bonferroni correction, though some results for gender were of borderline significance. We included gender-specific results in Tables II and III of the online data supplement. The lowest P-values for heterogeneity by gender were P = 0.0002 for threitol, P = 0.0005 for histidine, and P = 0.0005 for benzoate.

In the stepwise regression model, 19 metabolites were associated with BMI at the P < 0.05 level (Table IV of the online data supplement). In this model, the associations are mutually adjusted and thus results are statistically independent from one another.

In analyses of heterogeneity by fasting status, there were no interactions after Bonferroni correction. Forty-five metabolites met the nominal level of statistical significance (P < 0.05) but differences in effect size were modest, generally differing by 0.05–0.07 (or −0.05 to −0.07) for fasted vs. non-fasted samples. Regarding the specific interactions observed, none of the 37 metabolite-BMI associations were weaker in fasting-only samples; six were slightly stronger (for gamma-glutamyltyrosine, lathosterol, andro steroid monosulfate 2, isovalerylcarnitine, leucine, and carnitine). For metabolites not associated with BMI, only a few could potentially have reached statistical significance in fasting-only samples based on the interactions observed (kynurenate, gamma-glutamylvaline, gamma-glutamylleucine, urate, 1-arachidonoylglycerophosphoinositol, dihomo-linoleate, and serine). In addition, fasting status was not associated with BMI in either the Navy study (Pearson r = −0.09, P = 0.15) or the Shanghai study (Pearson r = 0.06, P = 0.40), arguing against major confounding by fasting status.

In analyses that excluded participants who later developed colorectal cancer (PLCO study) or had a prevalent adenoma (Navy study), results remained materially the same as in the main analysis (see Table V of the online data supplement). For the 37 BMI-associated metabolites, changes in effect size were almost uniformly less than 10 %. Also, we examined heterogeneity by colorectal adenoma status (Navy study), and future colorectal cancer status (PLCO study), but found no heterogeneity at the Bonferroni corrected level.

4 Discussion

Through meta-analysis of metabolomics data from three study populations, we identified 37 metabolites associated with BMI at the Bonferroni corrected level of statistical significance, including 19 amino acids, 12 lipids, and 6 other metabolites. The associations were strong, had highly significant P-values, and largely replicated across the studies at the P < 0.05 level. Of the 37 metabolites, 18, to our knowledge, had never before been associated with BMI. Of these 18 metabolites, 11 had no strong correlations with known BMI-related metabolites, and thus are considered to be “new” BMI-associated metabolites. We also replicated previously observed associations for 19 metabolites. Overall, the strongest associations were observed for amino acids.

Six of the 11 “new” BMI-associated metabolites were lipids. Prior research suggests that fatty acid oxidation is higher among those with insulin resistance (Newgard 2012), and our results suggest a similar pattern for high BMI. Butyrylcarnitine and 2-hydroxybutyrate (an amino acid), for example, are both known markers of excessive fatty acid oxidation (van Maldegem et al. 2006; Gall et al. 2010) and both were positively associated with BMI. The other five lipids (7-HOCA, 1-oleoylglycerophosphocholine, 1-eicosadienoylglycerophosphocholine, 2-linoleoylglycerophosphocholine, and palmitoyl sphingomyelin) have not been well characterized with respect to health and could be interesting targets for future research.

Lipids also figure prominently among the replicated associations, comprising 6 of the 19 metabolites with a previously observed BMI association. Of particular interest were associations for glycerol and lathosterol because prior studies had been small and replication was needed (Puhakainen et al. 1992; Paramsothy et al. 2011). Glycerol is a three-carbon molecule that forms the backbone of all triglycerides and blood levels of glycerol reflect the rate of fatty acid breakdown (Puhakainen et al. 1992; Venables et al. 2008). Lathosterol is a cholesterol precursor and blood levels reflect the rate of cholesterol synthesis (Pihlajamaki et al. 2004; Miettinen et al. 1990). Interestingly, lathosterol was much more strongly associated with BMI than was cholesterol, possibly indicating that lathosterol reflects an aspect of cholesterol metabolism that is especially sensitive to an elevated BMI.

Four of the 11 “new” BMI-associated metabolites were amino acids: 2-hydroxybutyrate (discussed above), alpha-hydroxyisovalerate, n-acetylglycine, and histidine. Alpha-hydroxyisovalerate has been implicated in several metabolic disorders, including phenylketonuria. N-acetylglycine is the acetylation product of glycine, but the significance of this metabolite for health is otherwise unknown. Histidine is a proteinogenic amino acid and a precursor of the neurotransmitter histamine. Our study is the first to identify a statistically significant association between histidine and BMI although three prior studies did examine this association (Cheng et al. 2012; Newgard et al. 2009; Felig et al. 1969).

Our analysis also replicated associations already established between amino acids and BMI, including associations for valine, isoleucine, leucine, tyrosine, phenylalanine, glutamate, kynurenine, and glycine (Gaudet et al. 2012; Newgard et al. 2009; Felig et al. 1969; Cheng et al. 2012). Although a link between amino acids and BMI was first noted four decades ago in a clinical study by Felig et al. (1969), this link is being reexamined due to its recent replication in metabolic profiling studies (Gaudet et al. 2012; Newgard et al. 2009; Cheng et al. 2012; Floegel et al. 2012), and the additional finding that these amino acids are associated with future diabetes risk, even after controlling for BMI (Wang et al. 2011; Floegel et al. 2012). Possibly, these amino acids are biologically important for diabetes risk, above and beyond their association with BMI.

Why amino acid levels are high among those with high BMIs is not well understood. One theory is that insulin resistance (which tends to co-occur with a high BMI) causes excess protein breakdown in skeletal muscle and amino acid breakdown products are released into the blood (Felig et al. 1969; Forlani et al. 1984; Tremblay et al. 2007). Alternatively, the high fat-free mass among those with high BMIs may be related to higher protein turnover, which may, in turn, cause amino acid breakdown products to be released into the blood. Finally, high amino acid levels may themselves affect eating behavior through their neuroactive properties. The amino acids associated with BMI include two neurotransmitters (glutamate and glycine), three neurotransmitter precursors (histidine, tyrosine, and phenylalanine), and kynurenine, which has neuroactive properties. Several of these have been implicated in appetite regulation (Haas et al. 2008; Wu et al. 2012; Johnson and Kenny 2010). A limitation to this theory is that, due to interference by the blood–brain barrier, it is not certain that amino acid blood levels will correlate with their level of utilization in the brain. For one neurotransmitter precursor, tyrosine, there is at least preliminary evidence indicating a link between blood levels and brain activity (Montgomery et al. 2003).

The final “new” BMI-associated metabolite was benzoate, a common preservative added to food and personal care products. In exploratory analyses, benzoate levels were modestly higher among women and highly correlated (r > 0.5) with levels of metabolites found in personal care products, including 2-ethylhexanoate, heptanoate, and pelargonate. Possibly, the benzoate-BMI association is driven by behavioral patterns related to personal care product use. Alternatively, there may be greater sequestration of these metabolites among people with a higher BMI.

For carbohydrates, we replicated known associations between BMI and mannose, lactate, and 1-5-anhydroglucitol. Mannose is a C-2 epimer of glucose and blood mannose levels correlate highly with those of glucose (Sone et al. 2003; Pitkanen et al. 1999), but are less sensitive to food intake and are more tightly linked with diabetes (Sone et al. 2003). Lactate is a product of glycolysis when oxygen supply is limited, a state that may occur more frequently among those with high BMIs due to poorer circulation (Trayhurn et al. 2008). 1-5-anhydroglucitol is an established biomarker for short-term glucose control (McGill et al. 2004).

A major strength of our study was the large number of metabolites assayed, which allowed us to discover new associations, and to confirm associations from small studies in the literature that might otherwise have been neglected. Additional strengths were its comparatively large sample size, and the replication of associations across three ethnically diverse study populations. This means that our findings are unlikely to be due to chance, and that they should have a reasonably high level of generalizability in different studies and ethnic groups.

There were several important limitations. Our study is cross-sectional, which complicates efforts to determine the causal nature of associations. BMI does not discriminate between fat and lean mass, and does not measure body fat distribution. However, BMI, body fat percent, and trunk fat are typically highly correlated and, in at least one study (Sun et al. 2010), they have been similarly associated with obesity-related biomarkers and metabolic syndrome. BMI also was determined using self-reported weight and height in our largest two studies (PLCO and Navy) but correlations between measured and self-reported weight and height typically exceed 0.9 (Willett 2012) and so errors were likely modest. The metabolite levels in our study were based on blood samples from a single point in time and, if metabolite levels vary over time, the associations could be attenuated relative to the true associations (Sampson et al. 2013). Our blood samples also differed between studies in type (serum for PLCO and Navy, plasma for Shanghai) and fasting status (non-fasting for PLCO and Navy, primarily fasting for Shanghai) which would add heterogeneity and could result in some false negatives. On the other hand, the findings that we did report are likely to be robust across a variety of sample collection designs typical for epidemiologic studies. Finally, our metabolomics analysis was limited to metabolites that could be detected by the platforms used and our data were measured in terms of relative concentrations rather than actual concentrations.

In summary, we used metabolomics to identify a large and diverse set of metabolites associated with BMI. Our results provide a critical baseline for the establishment of the “metabolome” of BMI. They also suggest a large number of candidate markers that can be explored by future prospective studies as potential mediators of the associations between obesity and disease risk.