Introduction

Metabolomics is the agnostic or comprehensive assessment of low molecular-weight endogenous and exogenous metabolites in a biological sample. Knowledge of these compounds can be used to elucidate the molecular-level responses to a genetic alteration, or the effect of disease. These compounds can also be used to monitor responses to various stimuli such as exposure to environmental and lifestyle factors or administration of a drug [1,2,3]. Metabolite abundance is regulated by various host processes such as protein-enzymatic reactions; thus, metabolomics sits at the end of the “omics cascade” within the central dogma of biology [3]. In addition to host processing, metabolites can be co-metabolized by the microbiome; one pool of these microbial products consists of secondary metabolites, most of which are currently uncharacterized. As metabolites are the biochemical intermediates that carry out biological functions within the body, the metabolome reflects the phenotype or physiological state more closely than the genome, transcriptome, or proteome [3]. Recent developments in mass spectrometry have allowed for high-resolution measurement of tens of thousands of features in small volumes of biological samples, which enables an untargeted strategy to simultaneously analyze exogenous chemicals, their metabolites, and associated perturbations to the endogenous metabolome, allowing for an exposome-level evaluation [4].

Air pollutants have been associated with various health outcomes, such as cardiovascular diseases, respiratory diseases, cancer, central nervous system disorders, and adverse pregnancy outcomes [5,6,7,8,9]. However, the mechanisms that underlie these observed associations are not conclusive [10, 11]. Numerous studies have extensively investigated selected biomarkers of oxidative stress and inflammation, with a traditional, hypothesis-driven approach [12]. This approach, often focusing on a relatively small number of metabolites, cannot capture the full picture of biological responses to air pollution [11, 13]. To address this challenge, untargeted metabolomics has been incorporated into many recent mechanistic studies of air pollution. A recent book chapter reviewed the application of metabolic profiling in human, animal, and in vitro studies of air pollution, and discussed metabolites or pathways related to oxidative stress responses [13]; however, this review was based on a limited number of human studies (n = 8), and focused on oxidative responses only.

The objective of this review is to comprehensively assess the emerging literature on air pollution and untargeted metabolomics (agnostic/global detection and relative quantitation of all small molecules in a sample), including the study designs, the metabolites and pathways identified, and the methodological challenges, and to provide recommendations for future studies.

Study Populations and Study Designs

We conducted a search on human studies in PubMed using a combination of MeSH terms and keywords: (metabolomics[MeSH Terms] OR metabolome[MeSH Terms] OR metabonomics) AND (air pollution[MeSH Terms] OR particulate matter[MeSH Terms] OR nitrogen oxides[MeSH Terms] OR air pollution OR air pollutants OR polycyclic aromatic hydrocarbons). The last search was conducted on June 9, 2020. After screening the abstract and full text of 315 articles, we included a total of 23 studies on air pollution and untargeted metabolomics published between 2012 and 2019 (Fig. 1, Table 1) [10, 11, 14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. More details on these studies are in Table S1. Among these, 19 studies focused on adults aged 18 or older [10, 11, 14,15,16,17,18,19,20,21,22, 25,26,27,28, 31,32,33,34], one study on pregnant women [23], two studies on both elderly adults and children [29, 30], and one study on children and adolescents [24].

Fig. 1 Flow diagram of literature search and selection

Table 1 Study design, air pollution exposures, and metabolomics analysis in the 23 reviewed studies

These 23 studies include 11 crossover designs [10, 11, 14,15,16,17,18,19,20,21,22], 10 cross-sectional studies [23,24,25,26,27,28,29,30,31,32], one case-control study, and one study using a combination of crossover and cross-sectional designs [34].

All participants in the crossover studies went through two to five exposure scenarios with varying expected air pollution levels. In these scenarios, participants were requested to conduct activities (e.g., walking, driving), or to use real or sham air purifiers. The order of purposefully designed exposure sessions was typically randomized, and any two exposure sessions were at least 7 days apart to avoid overlapping impacts of the exposures. In the cross-sectional or case-control studies, the selected participants were considered to capture the expected variation in air pollutant exposures of the study population. For example, residents living in neighborhoods near (< 500 m) or far (> 1000 m) from a highway were recruited in the Community Assessment of Freeway Exposure and Health (CAFEH) study in Boston, USA [25].

All 11 crossover studies and the one study combining crossover and cross-sectional designs investigated short-term effects of air pollution exposures (i.e., hours or days) on metabolomic perturbations. As participants served as their own comparisons in crossover designs, the changes in metabolite levels between exposure scenarios were investigated. All studies measured metabolites at the end of exposure scenarios. Five studies also measured metabolites before each scenario [10, 11, 14, 15, 18]; that is, these five studies investigated the changes in metabolites between exposure scenarios, as well as the changes within each scenario.
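The within-participant comparison described above can be sketched in a few lines. This is a minimal illustration only, not the analysis used by any reviewed study: the participant IDs, the metabolite values, and the function name `paired_differences` are hypothetical.

```python
from statistics import mean


def paired_differences(high, low):
    """Per-participant difference in a metabolite level (high minus low exposure).

    Both arguments map a participant ID to the level measured at the end of
    one exposure scenario. Only participants present in both scenarios are kept.
    """
    shared = sorted(set(high) & set(low))
    return {pid: high[pid] - low[pid] for pid in shared}


# Hypothetical, made-up log-intensities for one metabolite.
high_exposure = {"P1": 5.2, "P2": 4.8, "P3": 6.1}
low_exposure = {"P1": 4.9, "P2": 4.9, "P3": 5.5}

diffs = paired_differences(high_exposure, low_exposure)
mean_change = mean(list(diffs.values()))
print(diffs, round(mean_change, 3))
```

Because each participant serves as their own comparison, stable between-person characteristics cancel out of the difference; the reviewed studies typically estimated such contrasts with regression models (e.g., mixed effect models) rather than a simple mean.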

Among the 10 cross-sectional studies and the one case-control study, 10 investigated sub-chronic (i.e., multiple months) or long-term effects (i.e., ≥ 1 year). Nine collected one biological sample for each participant, and thus investigated the between-participant variations and related variations in metabolic profiles. The other two studies on sub-chronic or long-term effects had repeated measures of metabolites to also account for within-participant changes in metabolites over time [27, 32]. In the 10 sub-chronic or long-term studies, the timing of biological sample collection with respect to the exposure windows varied: five studies collected one or multiple samples during the exposure windows [25, 27, 31,32,33], one study collected one sample after the exposure window [23], and four studies involving exposure biomarkers collected the samples for measuring metabolites and internal exposures at the same time [21, 26, 29].

Among the 23 studies reviewed in this paper, eight also investigated metabolites related to health conditions such as adult-onset asthma or ischemic heart disease (IHD) [10, 28, 33], or measured markers of early health effects such as lung function, blood pressure, and biomarkers of oxidative stress or inflammation [18, 21, 22, 27, 29] (Table 2). These studies used case-control or cross-sectional study designs. All but one of the eight studies collected biological samples after or at the same time as the assessment of health outcomes. The remaining study selected cases and controls from two cohorts: (1) in Italy, biological samples were collected before the identification of incident cardio-cerebrovascular disease (CCVD); and (2) in Switzerland, biological samples were collected 7–10 years after identifying cases of adult-onset asthma (AOA) [33].

Table 2 Summary of studies investigating the metabolites or metabolic pathways related to both air pollution exposures and health outcomes

To find the metabolic pathways linking air pollution to health outcomes, these studies first investigated the metabolites or pathways associated with air pollution, and those associated with health outcomes, separately. Then, they investigated whether there was overlap or correlation in the metabolites or pathways related to air pollution and those related to health outcomes. Three of the eight studies explicitly referred to this method of finding overlapping metabolites as a “meet-in-the-middle” approach [21, 29, 33].
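The overlap step of this approach reduces to a set intersection. The sketch below assumes that each separate analysis has already produced a list of significant metabolites; the lists and the function name `meet_in_the_middle` are hypothetical and serve only to show the logic.

```python
def meet_in_the_middle(exposure_hits, outcome_hits):
    """Return metabolites associated with both exposure and outcome, sorted by name."""
    return sorted(set(exposure_hits) & set(outcome_hits))


# Hypothetical, made-up result lists from two separate association analyses.
exposure_hits = ["histidine", "arginine", "uric acid", "cotinine"]
outcome_hits = ["arginine", "uric acid", "vitamin E"]

overlap = meet_in_the_middle(exposure_hits, outcome_hits)
print(overlap)  # ['arginine', 'uric acid']
```

An overlap found this way shows only that the same metabolites appear in both analyses; it does not by itself establish that they mediate the effect of exposure on the outcome.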

Air Pollution Exposure Assessment

Among the 13 studies on short-term effects, 12 studies used portable or stationary monitors [10, 11, 14,15,16,17,18,19, 21, 22, 28, 34], and one study used a human exposure chamber to challenge study participants with ozone (O3) or filtered air [20]. Among the 10 studies on sub-chronic or long-term effects, five studies used dispersion, land use regression (LUR), or kriging models to estimate air pollutant exposure levels [23, 25, 27, 29, 33]; four studies measured exposure biomarkers of ambient polycyclic aromatic hydrocarbons (PAHs) or vanadium (V) [24, 26, 29, 30]; and one study used portable or stationary monitors [31].

Air pollution mixtures from different sources differ in the chemical composition and size distribution of particulate matter (PM), which could influence the decision on which pollutants or components to measure in these studies. All but one of the 14 studies on ambient air pollution or traffic-related air pollution (TRAP) measured PM with diameter ≤ 2.5 μm (PM2.5) mass concentration [10, 11, 14, 15, 18, 19, 21,22,23, 25, 27, 28, 32, 33]. Five of the eight TRAP studies also measured PM components related to combustion, such as elemental carbon (EC), organic carbon (OC), and particle-bound polycyclic aromatic hydrocarbons (pb-PAH) [10, 11, 15, 18, 32]. Among the four studies on industrial emissions from petrochemical or coking plants, external or internal exposures to PAHs and/or heavy metals were investigated among residents [21, 26, 29, 30]. Pollutants measured in occupational settings varied widely depending on the anticipated exposures [16, 17, 31, 34].

Studies in China (Beijing and Shanghai) [14, 19, 21, 22, 28] tended to show higher PM2.5 mass concentrations than the studies performed in Europe (UK, Switzerland, and Italy) [27, 33] or the USA [32]. For the four studies on industrial emissions, the urinary 1-hydroxypyrene (1-OHP) level in the elderly residing near a coking plant in Shanxi, China (1.42 μg/g creatinine for non-smokers and 3.13 μg/g creatinine for smokers), was higher than the levels in elderly adults residing near the largest petrochemical complex in Taiwan (0.42 μg/g creatinine) [21, 26, 29, 30]. In addition, the 1-OHP levels in the exposed elderly were higher than the levels in exposed children (0.25 μg/g creatinine) in Taiwan [29].

Untargeted Metabolomics Analysis

From the 23 studies we examined, 15 studies analyzed the serum or plasma metabolome [10, 11, 14,15,16,17,18,19,20,21, 23, 25,26,27, 33], six analyzed the urinary metabolome [21, 22, 28,29,30,31], one analyzed both serum and saliva [32], and one analyzed exhaled breath condensate [34].

For analytical platforms, 15 studies used liquid chromatography-mass spectrometry (LC-MS) to measure metabolites [10, 11, 15, 16, 18, 21,22,23,24,25, 28, 30, 32, 33, 35], four studies used both gas chromatography (GC)-MS and LC-MS [14, 19, 20, 27], three studies used nuclear magnetic resonance (NMR) spectroscopy [26, 31, 34], and one study used two-dimensional GC × GC-time-of-flight (ToF)-MS [29]. The combination of LC and GC has been shown to achieve complementary coverage [36]. LC-MS analyses can be heterogeneous between research labs due to the many different configurations of columns (e.g., hydrophilic interaction liquid chromatography (HILIC), reversed-phase liquid chromatography (RPLC)), the type of ionization (e.g., electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI)), the mode of ionization (i.e., positive and/or negative), and the type of mass analyzer (e.g., time-of-flight (ToF), quadrupole, ion trap).

All studies preprocessed their metabolomics data to align, filter, and deconvolute the data using a wide range of software. After obtaining the deconvoluted peak areas or heights for each metabolic feature, studies narrowed down the features for further analysis by selecting those detected in at least 10–80% of samples. Such a wide range of cut-off points for feature selection could lead to variation between data reported from different studies; a study that keeps features detected in > 10% of samples retains more features, and might therefore report more significant features, than a study with stricter inclusion criteria.
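The effect of the detection cut-off can be illustrated with a small sketch. It assumes a feature table in which a missing value (`None` or NaN) or a non-positive intensity means "not detected"; the feature names, intensities, and function names are hypothetical, and real preprocessing software applies this step in its own way.

```python
import math


def detection_rate(intensities):
    """Fraction of samples in which a feature was detected (non-missing and > 0)."""
    detected = [
        x for x in intensities
        if x is not None and not math.isnan(x) and x > 0
    ]
    return len(detected) / len(intensities)


def filter_features(table, min_rate):
    """Keep features whose detection rate is at least min_rate (between 0 and 1)."""
    return {
        name: values
        for name, values in table.items()
        if detection_rate(values) >= min_rate
    }


# Hypothetical peak intensities for three features across five samples.
table = {
    "feature_A": [1200.0, 980.0, 1500.0, 1100.0, 1010.0],  # detected in 5/5
    "feature_B": [300.0, None, None, 250.0, None],          # detected in 2/5
    "feature_C": [None, None, None, None, 75.0],            # detected in 1/5
}

lenient = filter_features(table, 0.10)  # 10% cut-off keeps all three features
strict = filter_features(table, 0.80)   # 80% cut-off keeps only feature_A
print(sorted(lenient), sorted(strict))
```

With the same raw data, the two cut-offs pass a different number of features to the statistical analysis, which is one reason results are hard to compare across studies.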

As most of the studies (20 out of 23) used MS, in this review we discuss only the common methods for annotating features in MS studies. All of the studies searched the mass-to-charge ratio (m/z) of each feature against metabolite and metabolic pathway libraries (e.g., Human Metabolome Database (HMDB) [37], Kyoto Encyclopedia of Genes and Genomes (KEGG) [38], METLIN [39]). Five studies conducted additional validation using LC coupled to tandem mass spectrometry (LC-MS/MS), to fragment the metabolic features and confirm the compound identity by comparing MS/MS spectra to those available in the public libraries [18, 22, 28, 29, 31]. In 12 studies, the spectra were matched against an authentic chemical standard analyzed on the same analytical platform. To do this, the m/z of the parent ion and MS/MS fragments, as well as the retention time, were compared to achieve the highest level of confidence in metabolite identification [10, 11, 14, 16, 20, 23, 25, 27, 30, 32, 33, 35].
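A library search by m/z amounts to comparing each observed value with theoretical values within a mass tolerance, commonly expressed in parts per million (ppm). The sketch below is an assumption-laden illustration: the three-entry `LIBRARY`, the 10 ppm tolerance, and the restriction to the protonated [M+H]+ adduct are choices made here for brevity, not settings reported by the reviewed studies.

```python
PROTON_MASS = 1.007276  # Da; mass added for an [M+H]+ ion

# Tiny illustrative library of monoisotopic neutral masses (Da).
# Real searches query databases such as HMDB, KEGG, or METLIN.
LIBRARY = {
    "histidine": 155.069477,
    "uric acid": 168.028340,
    "cotinine": 176.094963,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(observed_mz, tolerance_ppm=10.0, library=LIBRARY):
    """Return (name, ppm error) candidates, assuming an [M+H]+ adduct."""
    matches = []
    for name, neutral_mass in library.items():
        theoretical_mz = neutral_mass + PROTON_MASS
        err = ppm_error(observed_mz, theoretical_mz)
        if abs(err) <= tolerance_ppm:
            matches.append((name, round(err, 2)))
    return matches


print(annotate(156.0770))  # a candidate match to histidine
print(annotate(200.0000))  # no candidate in this small library
```

A match on m/z alone is only a putative annotation, since several compounds can share nearly the same mass; this is why the studies above added MS/MS spectra, retention time, and authentic standards to raise the confidence level.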

For statistical analysis, supervised partial least squares discriminant analysis (PLS-DA) or orthogonal partial least squares discriminant analysis (OPLS-DA) was used in 11 studies to determine which metabolites drive the separation of high- or low-exposure groups [19, 21, 23,24,25,26, 28,29,30, 34, 35]. The key features that drive the separation of the exposure groups were identified by variable importance in projection (VIP) scores (cut-offs of > 1, > 1.5, or > 2). Network analysis and principal component analysis were also used to identify groups of metabolites that were correlated with each other [14, 17]. These methods are dimension reduction tools for high-dimensional data. To obtain effect direction, fold changes, t tests, analysis of variance (ANOVA), analysis of covariance (ANCOVA), or regression models (e.g., mixed effect models) were used in these studies. As a large number of metabolites were investigated in these analyses, correction methods such as Benjamini-Hochberg false discovery rate adjustment or Bonferroni correction were used to account for multiple comparisons.
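The two correction methods named above differ in how strongly they penalize each test. The following sketch implements the standard Benjamini-Hochberg step-up adjustment in plain Python and compares it with Bonferroni; the five p-values are made up for illustration, and in practice an established statistics package would normally be used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for five metabolic features.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
bh = benjamini_hochberg(raw)
bonf = bonferroni(raw)
print([round(q, 5) for q in bh])
print([round(q, 5) for q in bonf])
```

Bonferroni multiplies every p-value by the number of tests, whereas Benjamini-Hochberg scales each p-value by the number of tests divided by its rank, so it is less conservative when thousands of features are tested.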

Metabolomics Methods Assessment

We developed an assessment tool to compare methods between studies, based on recommendations from the Metabolomics Standards Initiative, and other related studies and quality assessment tools [40,41,42,43,44,45,46]. We constructed five metrics for scoring the studies (Fig. 2). Higher scores indicate either more complete reporting on the experimental design and methods for metabolomics analysis, or more rigorous exposure assessment. The scores are not final judgments on the scientific validity of a study, but rather an attempt to understand where reporting could be more transparent, so that findings can be robustly compared across heterogeneous studies. This is a novel tool designed for evaluation of metabolomics analysis of air pollution, but it can be adapted to any metabolomics study with human subjects. Most of the studies we reviewed had moderate to high scores (Table 3). Fifteen of the 23 studies did not report how missing values were treated [11, 17, 19, 21, 22, 24,25,26,27,28,29,30,31,32, 34]. Ten studies conducted more rigorous exposure assessment by accounting for time-activity patterns or requesting participants to perform standardized activities, and/or took temporality into account by having the exposure windows precede the metabolite measurements [10, 11, 14,15,16, 18,19,20,21,22,23, 25, 28, 31, 32, 34]. Most of the studies reviewed were found to have high confidence levels for chemical annotation (level 1 or 2).

Fig. 2 Scoring process for metabolomics methods assessment (¹clear temporality means that the air pollution exposures precede the metabolite measurements)

Table 3 Metabolomics methods assessment for reviewed studies

Perturbations of Metabolites or Pathways Related to Air Pollutant Exposure

While untargeted metabolomic analysis of biological samples can detect thousands of metabolic features, only a small number of these features (ranging from three to 121 across the reviewed studies) correlated with air pollution exposure and could be positively identified (MS/MS spectra matched to authentic chemical standards or spectra in the libraries, i.e., level 1/2, Fig. 2). The detected metabolites were mostly endogenous metabolites (i.e., lipids, amino acids, carbohydrates, nucleotides, steroids, cofactors, and vitamins). Few studies identified xenobiotics, such as PAH metabolites (i.e., catechol, 3-(2-hydroxyphenyl)propanoate, naphthylamine), nicotine metabolites (i.e., cotinine), and benzoate [11, 25, 27]. These xenobiotics were identified through an untargeted process. Pathway analyses identified 1–50 enriched metabolic pathways.

All 23 studies detected perturbations in lipid levels or pathways related to lipid metabolism, primarily as responses to oxidative stress and/or inflammation. Air pollutant exposures were associated with higher levels of reactive oxygen species (ROS) or free radicals that cause cellular membranes to break into free fatty acids. Polyunsaturated fatty acids (PUFA), such as linoleic acid and arachidonic acid, can be oxidized, leading to increased pro-inflammatory metabolites, such as leukotrienes and prostaglandins [11, 14, 15, 21, 23, 31, 32]. A lipid peroxidation biomarker, 4-hydroxynonenal (4-HNE), was positively associated with ultrafine particles (UFP) in the Boston CAFEH study [25]. Linolenic acid, a downstream metabolite of PUFA with anti-inflammatory effects (inhibits the biosynthesis of leukotriene B4), showed a negative association with TRAP in Boston, Atlanta, and California, in the USA [25, 32].

Perturbations to amino acids, metabolites in purine metabolism, and acylcarnitines that are involved in energy metabolism were also commonly reported (Table 4). Histidine, with anti-inflammatory effects, showed consistent inverse associations with air pollution exposures, including TRAP, PM2.5, and emissions from a petrochemical plant [11, 23, 25, 26, 32]. Uric acid, a powerful antioxidant and end product of purine metabolism, which is measured in urine samples, was negatively associated with air pollution exposures, including PM2.5 and coking plant emissions [21, 28, 30]. The effect directions of other metabolites, including oxidants or antioxidants, were variable between studies (Table 4).

Table 4 Summary of commonly detected metabolites (detected in at least three studies) and effect directions

Three studies reported perturbations in metabolites in steroid metabolic pathways (e.g., glucocorticoid metabolism) [16, 19, 21]. A randomized, double-blind crossover trial in Shanghai, China, implemented functional or sham air purifiers in dorms for 9 days with a 12-day washout period, and observed that stress hormones measured in blood serum (i.e., glucocorticoids (cortisone and cortisol), catecholamines (epinephrine and norepinephrine), and melatonin) were associated with higher PM2.5 mass concentration exposures, suggesting a response from the central nervous system by activation of the hypothalamus-pituitary-adrenal axis [19]. In a randomized crossover study, 24 participants were exposed to O3 and filtered air in human chambers during two clinical visits that were at least 2 weeks apart. O3 was significantly associated with elevated cortisol and corticosterone, consistent with studies in rodent models [20]. However, a study on boilermakers at an apprentice welding school in MA, USA, showed that glucocorticoids (cortisol, cortisone, corticosterone) were significantly lower after a 5-h shift in the welding workshop (higher exposure to PM2.5 and PM enriched with heavy metals), compared to before their shift commenced [16].

Current evidence is insufficient to conclude which air pollutant or chemical component of PM has the strongest impact on metabolic perturbations. Some PM components exhibited stronger associations with metabolites within studies, which might suggest exposure contributions from different emission sources, specific to each study area. In the Atlanta Commuters Exposure (ACE) study on TRAP, EC and V showed the largest number of associations with metabolite features (i.e., unconfirmed metabolites), among various PM2.5 components [11]. Specifically, EC and V had 802 and 762 significantly associated features, respectively, while PM2.5 was associated with just 215 significant features [11]. On the other hand, compared to other PM2.5 components, sulfur (S) showed the strongest associations with 5-phosphoribosylamine, and S and V were related to the largest decline in 4-pyridoxic acid, among the elderly residents in Beijing, where coal combustion for heating and industries is a major pollution source [28]. The little overlap between the metabolites associated with PM (i.e., PM2.5, PM10, PM2.5–10, UFP, and black carbon (BC)) and those associated with NOx indicated differential effects of PM and gas pollutants [10].

The perturbations of metabolites or pathways related to air pollution exposure differed by disease status, age, and sex, which potentially provides a mechanistic basis for the susceptibility of subpopulations. The ACE study observed that most of the TRAP-related metabolites were differently expressed by asthmatic status of the participants [11]. Specifically, decreased arginine and histidine were significantly associated with exposures to EC or PM2.5-V only among asthmatic participants. In addition, in these participants, increased methionine (an essential amino acid promoting ROS) was significantly associated with PM2.5-Cobalt. The pathways significant only in asthmatic participants tended to be those related to acute inflammatory responses. A randomized crossover study, where 39 healthy volunteers commuted in the Beijing subway for 4 h with or without masks (3M respirator), found strong associations between 8-hydroxy-deoxyguanosine (8-OHdG), a biomarker of DNA damage, and size-fractionated PM and cardiovascular indicators in men only. This indicates that men might be more prone to air pollution-related cardiovascular effects than women [22]. In the randomized crossover trial using a real and a sham air purifier (each for 9 days) in university dormitories in Shanghai, the analysis of serum samples showed significant interactions of PM2.5 with sex for five metabolites, among which the effect sizes for hydroxylamine, arginine, tryptophan, and phytosphingosine in men were larger than those in women [19]. The studies on residents near a large petrochemical complex in Taiwan found that in blood and urine samples, different metabolic pathways were significantly dysregulated and the observed changes were age-specific [24, 26, 29].

Eight studies investigated whether air pollution-related metabolites or pathways were also perturbed by health outcomes, including diseases (e.g., chronic obstructive pulmonary disease, COPD; IHD) as well as health markers (e.g., measures of lung function, blood pressure, biomarkers of oxidative stress, or inflammation). Seven of the eight studies found overlapping metabolites or pathways related to both air pollution exposures and health outcomes [10], and one found a correlation between PM2.5- and COPD-related metabolites among elderly COPD patients and their healthy spouses in Beijing [28]. Table 2 summarizes the associations of air pollution or health outcomes with the following metabolites or pathways: 8-OHdG, prolyl-arginine, vitamin E, metabolites related to purine metabolism, citrate cycle, fatty acid metabolism, glutathione metabolism, linoleate metabolism, glycosphingolipid metabolism, carnitine shuttle, tyrosine metabolism, urea cycle/amino group metabolism, N-Glycan degradation, tryptophan metabolism, phenylalanine metabolism, glycine, serine, and threonine metabolism, alanine, aspartate, and glutamate metabolism.

Conclusions

Over the past decade, the application of untargeted metabolomics to air pollution epidemiology has gained popularity. However, the methodologies used in these studies vary widely for both air pollution exposure assessment and untargeted metabolomics profiling. Most studies investigated ambient PM, measured as mass concentration for various size fractions, number concentration, and chemical composition. Gas pollutants (NO, NOx, CO, O3) were also measured by these studies. A wide range of exposure assessment methods was used, including portable monitors, stationary monitors, and spatial models. For analysis of the metabolome, most studies used LC-MS as their primary analytical platform. All the studies reported the detection of thousands of features, but only a few metabolites could be confirmed with a high confidence level. The processes of annotation, statistical analysis, and pathway enrichment analysis also varied widely.

A wide range of metabolites was associated with air pollutant exposures, most of which were endogenous, and a few of which were xenobiotics. Most detected metabolites or pathways were related to oxidative stress or inflammation responses, and perturbations in stress hormones were reported in three studies [16, 19, 21]. Pro-inflammatory metabolites (e.g., leukotrienes) or related metabolic pathways were upregulated [11, 14, 15, 21, 23, 31, 32], and metabolites with anti-inflammatory effects (e.g., histidine, linolenic acid) tended to be downregulated under elevated air pollution exposures [11, 23, 25, 26, 32]. Although air pollutants were consistently reported to disrupt antioxidant-oxidant balance, mixed effect directions were reported for numerous oxidants or antioxidants detected in the metabolomics analyses.

Existing evidence is insufficient to conclude which air pollutants or chemical components are most responsible for adverse health effects. The physicochemical characteristics of air pollutants differed across study locations and likely contributed to the heterogeneity in findings. In one study reporting two cross-sectional studies in London and Barcelona with similar metabolomics analysis, the authors found no overlap in either the confirmed metabolites or pathways related to air pollution between the two cities [10]. The metabolic perturbations by air pollution could differ by disease status, age, and sex, which could provide evidence on mechanisms for susceptible subpopulations, but they did not seem to differ by racial and ethnic groups [23].

A limited number of studies involving health outcomes (including diseases or health markers) reported overlapping or correlated metabolites or pathways between air pollution exposures and health outcomes. However, all but one of these studies collected biological samples after or at the same time as the health outcome assessment. It is therefore challenging to draw conclusions on whether these overlapping metabolites or pathways mediate the impact of air pollution on health outcomes, or whether air pollution exacerbates health outcomes through these shared metabolic pathways. Future studies prospectively evaluating exposures to airborne contaminants, metabolic changes, and health outcomes are needed.

Recommendations

The studies documented in this review conducted extensive exposure assessment, and most extensively documented the details of metabolomic analyses. However, additional studies or further improvement can be considered in the following aspects:

  • Identifying air pollutants or PM chemical components that have the largest effect on the metabolome;

  • Characterizing the joint effects of complex air pollutant mixtures;

  • Investigating various potential effect modifiers, such as sex, age, or disease status, to provide a mechanistic basis for susceptibility to a wide range of health effects;

  • Expanding sample sizes by leveraging new technologies in personal exposure assessment [47];

  • Adopting a more standardized reporting methodology for sample collection and data preprocessing, facilitating comparisons across labs and between studies; and,

  • Using longitudinal designs that prospectively evaluate air pollution exposures, metabolic changes, and health outcomes.