1 Introduction

In 2008, more than 10 years ago, we reviewed epidemiological studies of gene–environment interactions to detect the adverse health effects of environmental chemicals on the next generation [1]. As is well documented, numerous chemical compounds such as polycyclic aromatic hydrocarbons (PAHs) in tobacco smoke are activated and detoxified by xenobiotic-metabolizing enzymes [1]. Xenobiotic-metabolizing genes appear to be influenced functionally by maternal smoking during pregnancy, which may be a significant risk factor for low birth weight (LBW) and/or intrauterine growth restriction (IUGR) [1,2,3,4]. The Hokkaido Study, a pioneering work in this field, examined the effects of environmental factors together with a genetic predisposition on the health and development of about 20,000 children across the Hokkaido prefecture, the northern part of Japan, from the prenatal period onward [5, 6].

The last decade has witnessed major innovations in research design and methods, accompanied by recent advances in technologies for statistical and biological analyses and measurement [7,8,9,10]. In particular, genome-wide association studies (GWASs) and epigenome-wide association studies (EWASs) have become mainstream in genome cohort studies using advanced genomics and epigenomics techniques, which has made it possible to better understand the genetic basis of diseases [7]. Furthermore, Mendelian randomization is a fast-growing area that involves the analysis of genetic variants to assess the causal relationships between exposure and outcome [8]. The field of exposure science termed the exposome has also emerged; it involves the study of mechanisms by which “non-genetic” exogenous and endogenous exposures influence the risk of disease [9, 10], and linking this field with genomics is expected to enable elucidation of the origins of multiple complex diseases [9]. The methods for evaluating gene–environment interactions have advanced rapidly because of the accumulation of large-scale data, termed big data. Using GWASs, we have obtained data for hundreds of complex traits across a wide range of domains, including common diseases and quantitative traits that are risk factors for diseases, enabling us to better define the relative role of genes and the environment in disease risk [7]. Examination of the impact of early life exposure and maternal physical and mental conditions during pregnancy is a topic of interest for investigation by exposomics, using environmental factors and biological samples from cohorts [9, 10]. Thus, it is recognized that numerous human diseases arise from the complex interplay between environmental exposure and host susceptibilities.

In the present work, we introduce recent progress in evaluating gene–environment interactions in large-scale genome epidemiological studies as well as in small-scale single studies targeting candidate genes.

2 Infant Birth Size, Including Low Birth Weight (LBW), Preterm Birth (PB), Small-for-Gestational-Age (SGA), Gestational Age, and Intrauterine Growth Restriction (IUGR) in Relation to Maternal Smoking

Maternal smoking during pregnancy is known to adversely affect birth outcomes related to infant size, including LBW, PB, SGA, gestational age, and IUGR. In recent years, it has been found that this association is modified by genetic factors. In research studies published up to 2018, this association is reported to be modified by maternal genotypes of genes encoding the xenobiotic receptor (aromatic hydrocarbon receptor [AHR]), enzymes (cytochrome P450 [CYP] 1A1, CYP2A6, CYP2E1, glutathione S-transferase [GST] mu 1 [GSTM1], GST theta 1 [GSTT1], GST theta 2 [GSTT2], epoxide hydrolase 1 [EPHX1], 5,10-methylenetetrahydrofolate reductase [MTHFR], and NAD(P)H quinone dehydrogenase 1 [NQO1]), DNA repair proteins (X-ray repair cross-complementing gene 1 [XRCC1], XRCC3, 8-oxoguanine glycosylase [OGG1]), oncogenes (MDM4), and tumor suppressor genes (TP53) [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33] (Table 19.1). Moreover, this association has been observed to be modified by fetal genotypes of genes encoding xenobiotic-metabolizing enzymes (GSTT1 and NQO1) and cell-division-related genes (adenosine deaminase 1 [ADA1]) [14, 16, 29]. Most of these studies assessed the smoking status of pregnant women using a questionnaire; only a few evaluated it using objective indicators.

Table 19.1 Smoking and birth outcomes: gene–environment interactions (only relevant)

Although studies that have evaluated pregnant women by smoking status using objective indicators are limited, three previous reports have examined the effects of gene–environment interactions of maternal smoking on infant birth size using biomarkers [21, 24, 27]. Only one publication has examined the dose-dependent association of gene–environment interactions and maternal smoking with infant birth size, in a study of 3263 Japanese pregnant women enrolled in the prospective birth cohort of the Hokkaido Study on Environment and Children’s Health. Without consideration of genotype, maternal passive smoker levels (plasma cotinine levels: 0.22–11.48 ng/mL) were associated with a mean reduction of 55–57 g in birth weight, and maternal active smoker levels (plasma cotinine levels: ≥11.49 ng/mL) were associated with a mean reduction of 93–171 g in birth weight, compared with non-passive smoker levels (plasma cotinine levels: ≤0.21 ng/mL). When maternal AHR (rs2066853) genotypes were considered, active smoker levels were associated with a mean reduction in birth weight of up to 102 g compared with non-passive smoker levels among mothers with the AA genotype. However, active smoker levels were associated with a mean reduction of 182–217 g compared with non-passive smoker levels among mothers with the GG genotype. Differences have also been observed among maternal XRCC1 (rs1799782) genotypes when examining the dose-dependent association between plasma cotinine levels and birth weight reduction (Fig. 19.1) [24]. Further, it has been shown that this association is not modified by maternal or fetal genotypes of genes encoding xenobiotic-metabolizing enzymes (CYP1A2, CYP1B1, cystathionine beta-synthase [CBS], GST theta pseudogene 1 [GSTTP1], methylenetetrahydrofolate dehydrogenase 1 [MTHFD1], 5-methyltetrahydrofolate-homocysteine methyltransferase [MTR], 5-methyltetrahydrofolate-homocysteine methyltransferase reductase [MTRR], N-acetyltransferase 2 [NAT2], and serine hydroxymethyltransferase 1 [SHMT1]) or hormone-related factors and receptors (inhibin α [INHA], luteinizing hormone/chorionic gonadotropin receptor [LHCGR], and transforming growth factor-β receptor type 1 [TGFBR1]) [21, 24, 29, 30, 32].

Fig. 19.1

Association of maternal AHR (G>A, Arg554Lys; dbSNP ID: rs2066853) and XRCC1 (C>T, Arg194Trp; dbSNP ID: rs1799782 and G>A, Arg399Gln; dbSNP ID: rs25487) genotype with maternal cotinine levels in relation to infant birth weight (n = 3263) [24]. Maternal plasma cotinine levels: level 1, 0.12–0.21 ng/mL; level 2, 0.22–0.55 ng/mL; level 3, 0.56–11.48 ng/mL; level 4, 11.49–101.66 ng/mL; level 5, 101.67–635.25 ng/mL. Multiple linear regression models are adjusted for maternal age, height, weight before pregnancy, parity, alcohol intake during the first trimester of pregnancy, education level, annual household income, infant gender, and gestational age. β represents the change in infant birth weight (g) in comparison with level 1 as the reference. Dots represent β values with 95% confidence intervals (CI); ∗P < 0.05; ∗∗P < 0.01; ∗∗∗P < 0.001
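
The exposure categories in Fig. 19.1 can be made concrete with a short sketch. The Python code below is illustrative only: it maps a plasma cotinine value to the five levels given in the caption and computes crude (unadjusted) differences in mean birth weight relative to level 1 within each genotype stratum. The function names and the toy records are hypothetical, values that fall between two published ranges are assigned to the lower level by assumption, and the covariate-adjusted multiple linear regression used in [24] is not reproduced here.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Lower bounds (ng/mL) of levels 2-5, taken from the Fig. 19.1 caption.
LEVEL_LOWER_BOUNDS = [0.22, 0.56, 11.49, 101.67]


def cotinine_level(ng_per_ml):
    """Return the cotinine level (1-5) for a plasma concentration in ng/mL."""
    return bisect_right(LEVEL_LOWER_BOUNDS, ng_per_ml) + 1


def smoking_category(ng_per_ml):
    """Collapse the five levels into the three categories used in the text."""
    level = cotinine_level(ng_per_ml)
    if level == 1:
        return "non-passive"
    return "passive" if level <= 3 else "active"


def crude_differences(records):
    """Crude mean birth-weight difference (g) versus level 1, per genotype.

    records: iterable of (genotype, cotinine_ng_per_ml, birth_weight_g).
    Returns {(genotype, level): difference}; no covariate adjustment is made.
    """
    groups = defaultdict(list)
    for genotype, cotinine, weight in records:
        groups[(genotype, cotinine_level(cotinine))].append(weight)

    differences = {}
    for (genotype, level), weights in groups.items():
        reference = groups.get((genotype, 1))
        if level == 1 or not reference:
            continue  # level 1 is the reference; skip strata without one
        differences[(genotype, level)] = mean(weights) - mean(reference)
    return differences


# Hypothetical toy records, not study data:
# (maternal genotype, plasma cotinine in ng/mL, birth weight in g).
toy_records = [
    ("GG", 0.15, 3100), ("GG", 0.18, 3000),
    ("GG", 20.0, 2900), ("GG", 50.0, 2800),
    ("AA", 0.20, 3050), ("AA", 30.0, 3000),
]
toy_differences = crude_differences(toy_records)
```

Comparing the level-specific differences across genotype strata is the informal counterpart of the stratified β values plotted in the figure; a formal analysis would fit the adjusted regression models described in the caption.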

The following problems should be recognized when evaluating gene–environment interactions in the association between maternal smoking and infant birth size: (1) differences in time of evaluation (e.g., first trimester of pregnancy or the 8th month of pregnancy), (2) differences in the questions used for assessment (e.g., cigarettes/day or a choice between yes and no), (3) differences in evaluation methods (e.g., the use of objective biomarkers or subjective questionnaires), (4) differences in study design (e.g., case–control study or prospective birth cohort study), and (5) differences in the location of gene polymorphisms (e.g., rs4646903 or rs1048943 polymorphism of the CYP1A1 gene). Furthermore, maternal exposure levels, such as those resulting from active smoking or passive smoking (second-hand smoke; SHS), must be considered.

Finally, it is necessary to focus on the components of tobacco smoke, which contains about 4000 substances. Receptors bind the chemical substances contained in tobacco smoke and initiate intracellular reactions. Based on findings from animal studies and cell experiments, it is important to examine gene–environment interactions focusing on single-nucleotide polymorphisms (SNPs) of genes involved in the biological mechanisms by which tobacco smoke induces toxicity. Therefore, studies should aim to elucidate the molecular epidemiology underlying the association between maternal smoking and birth outcome to identify the receptors in either the mother or infant (or both) that modify the association between components of tobacco smoke and adverse health effects on infants and children.

3 Effects of Exposure to Other Chemicals on Infant Birth Size

In studies performed up to 2018, many substances that pregnant women are exposed to have been found to be involved in gene–environment interactions affecting fetal growth. These include environmental pollutants (benzo(a)pyrene), disinfection by-products of drinking water (trihalomethanes, chloroform, and haloacetic acids), lipids and fatty acids (cholesterol, triglycerides, and docosahexaenoic acid [DHA]), vitamins (carotenes, vitamin C, vitamin D, and vitamin E), beverage-derived substances (alcohol and caffeine), air pollutants (particulate matter [PM] of aerodynamic diameter <10 μm [PM10] and <2.5 μm [PM2.5], and nitrogen oxides [NOx]), metals (lead, mercury, and iron), short half-life chemicals (alkyl phosphates and phthalates), pesticides including those used in floriculture (organochlorines), and persistent organic pollutants (POPs; perfluoroalkyl substances [PFASs] and dioxins) [34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56] (Table 19.2). Reduced birth weight has been associated with interactions between PM10 levels and maternal CYP1A1 genotype [34]; caffeine levels and maternal CYP1A2 genotype [37]; benzo(a)pyrene levels and maternal GSTP1 genotype [36]; vitamin D levels and maternal genotypes of the gene encoding group-specific component (vitamin D binding protein) locus (GC) [38]; cholesterol levels and maternal apolipoprotein E (APOE) and apolipoprotein C3 (APOC3) genotypes [39]; DHA levels and maternal fatty acid desaturase (FADS) genotype [40]; lead levels and maternal hemochromatosis (HFE) and transferrin (TF) genotypes [41]; mercury levels and maternal GSTM1 and GSTT1 genotypes [42]; iron levels and maternal GSTM1 genotypes [43]; organochlorine pesticide levels and maternal GSTM1 and CYP17A1 genotypes [44, 45]; perfluoroalkyl substance levels and maternal GSTM1 genotypes [46]; dioxin levels and maternal AHR and GSTM1 genotypes [47, 57]; PM2.5 levels and fetal CYP2D6 and GSTP1 genotypes [35]; benzo(a)pyrene levels and fetal GSTP1 genotypes [36]; and alcohol consumption and fetal ADH2 genotype [50]. Increased risk of LBW has been associated with interactions between floriculture chemicals and maternal paraoxonase 1 (PON1) genotype [52], between disinfection by-product levels in drinking water and maternal GSTM1 genotype [54], and between phthalate levels and fetal paraoxonase 2 (PON2) genotypes [52]. Increased risk of SGA has been associated with interactions between disinfection by-product levels in drinking water and maternal CYP2E1 and GSTT1 genotypes [55], and between disinfection by-product levels in drinking water and fetal CYP17A1 genotypes [58]. By contrast, the risk of SGA was not affected by interactions between caffeine levels and maternal and infant CYP1A2 genotypes or between caffeine levels and maternal and infant CYP2E1 genotypes [59]. Reduced gestational age has been associated with interactions between NOx levels and maternal interleukin (IL)-17A genotype [48], and between alkyl phosphate levels and maternal PON1 genotypes [49]. Increased risk of IUGR has been associated with interactions between alcohol consumption and maternal CYP17 genotypes [55], and between disinfection by-product levels in drinking water and fetal CYP2E1 genotype [56].

Table 19.2 Other chemicals and birth outcomes: gene–environment interactions (only relevant)

Many maternal and infant genes related to chemical substances have never been examined in relation to those involved in xenobiotic metabolism and hormone biosynthesis. However, at present, epidemiological evidence of disease susceptibility genes is limited with regard to gene–environment interactions for maternal chemical exposure and infant growth.

Additional studies are required to examine not only the genotypes of exposure susceptibility genes encoding metabolizing enzymes and receptors related to extraneous substances, but also genetic polymorphisms of disease susceptibility genes such as growth- and obesity-related genes. The results from epidemiological studies on the effects of gene–environment interactions on infant growth may also be of value for planning environmental policies involving genetically high-risk groups and public health programs, as well as for preventive medicine.

4 Recent Advances in Genome Birth Cohort Studies and Evaluation of Genetic Associations with Birth Weight

In large-scale epidemiological studies such as genome birth cohort studies, evaluation of gene–environment interactions is different from that in small-scale epidemiological studies (described in Sects. 19.2 and 19.3). The main reason for this difference is that, unlike genome and epigenome data, which number in the hundreds of thousands, environmental data are difficult to obtain for each subject, except for cigarette smoking or alcohol intake status. The development of strategies to address this difference in data availability is one of the most important challenges in large-scale epidemiological studies. Therefore, various methods for analyzing genetic susceptibility, such as GWAS, Mendelian randomization, and exposome linked with genomics, have been implemented.

4.1 GWAS and Meta-Analyses

A GWAS implements a population-based experimental design to detect associations between genetic variants and diseases or traits in biological samples from various human genome cohorts [7]. Meta-analyses of GWAS have identified numerous genetic variants associated with birth weight [60,61,62,63,64,65,66,67,68,69,70,71,72] (Table 19.3). Two SNPs, rs900400 near CCNL1 and rs9883204 in ADCY5, were identified to be robustly associated with birth weight [60]. Another related SNP in ADCY5 is considered to be implicated in the regulation of glucose levels and susceptibility to type 2 diabetes based on findings of an adult GWAS [73]. This may be the first evidence that associations between lower birth weight and subsequent non-communicable diseases (NCDs) such as type 2 diabetes have a genetic component; this is known as the developmental origins of health and disease (DOHaD) concept. Furthermore, an expanded GWAS meta-analysis and follow-up study involving 69,308 individuals of European descent from 43 studies revealed seven loci (CCNL1, ADCY5, HMGA2, CDKAL1, LCORL, ADRB1, and 5q11.2) associated with birth weight with genome-wide significance [66]. Among them, five loci are known to be associated with other adult phenotypes: ADCY5 and CDKAL1 with type 2 diabetes, ADRB1 with blood pressure, and HMGA2 and LCORL with height. These findings highlight multiple genetic links between birth weight and postnatal growth and metabolism, especially later in life, which are important in accordance with the DOHaD concept from a genetic point of view. Moreover, a multi-ancestry GWAS meta-analysis of birth weight in 153,781 individuals from the EGG Consortium and the UK Biobank identified 60 loci where fetal genotype was associated with birth weight with genome-wide significance [70]. 
This study revealed strong inverse genetic correlations between birth weight and adult cardiometabolic diseases and traits such as type 2 diabetes and coronary artery disease, blood pressure, cholesterol levels, and triglyceride levels (Fig. 19.2). Thus, a series of GWAS meta-analyses have confirmed genetic involvement in life-course associations between early growth phenotypes and adult cardiometabolic diseases and traits. GWASs of birth weight have thus far focused on fetal genetics, whereas relatively little is known about the role of maternal genetic variation. Recently, similar GWAS meta-analyses in up to 86,577 women of European descent from the same population revealed maternal loci associated with offspring’s birth weight, such as MTNR1B, HMGA2, SH2B3, KCNAB1, L3MBTL3, GCK, EBF1, TCF7L2, ACTL9, and CYP3A7, at GWAS significance [71]. Interestingly, maternal genetic factors associated with glucose metabolism, blood pressure, immune function, cytochrome P450 activity, and gestational duration affect the offspring’s birth weight. In a recent study, expanded GWASs of the same population examining maternal (n = 321,223) and offspring birth weight (n = 230,069 mothers) revealed 190 independent loci associated with both the mother and offspring’s birth weight [72]. In this study, structural equation modeling was used to evaluate the contribution of direct fetal and indirect maternal genetic effects, and Mendelian randomization was applied to reveal causal pathways. Surprisingly, maternal birth weight-lowering genotypes as proxy for an adverse intrauterine environment were found not to affect offspring blood pressure at all [72]. This finding indicates that some exceptions cannot be explained by the DOHaD concept.
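
To illustrate how a GWAS meta-analysis combines per-study results for a single variant, the following Python sketch pools effect estimates with inverse-variance (fixed-effect) weights and compares the resulting two-sided P value with the conventional genome-wide significance threshold of 5 × 10⁻⁸. This is a generic textbook approach shown under stated assumptions; it is not claimed to be the exact pipeline of the cited studies, and the function names and input values are hypothetical.

```python
import math

# Conventional genome-wide significance threshold.
GENOME_WIDE_ALPHA = 5e-8


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one variant.

    betas: per-study effect estimates (e.g., change in birth weight per allele).
    standard_errors: matching per-study standard errors.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("betas and standard_errors must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    # Two-sided P value from the standard normal distribution.
    two_sided_p = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, two_sided_p


def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """True if the P value passes the genome-wide threshold."""
    return p_value < alpha
```

With two hypothetical studies that each report an effect of 0.1 with a standard error of 0.02, neither study alone passes the threshold, whereas the pooled estimate does; this is the basic reason that consortium-scale meta-analyses detect loci that single cohorts miss.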

Table 19.3 Genome-wide association studies (GWASs) and meta-analyses
Fig. 19.2

Hierarchical clustering of birth weight (BW) loci based on the similarity of overlap with adult diseases, and metabolic and anthropometric traits [70]. “For the lead SNP at each BW locus (x axis), Z scores (aligned to BW-raising allele) were obtained from publicly available GWAS for various traits (y axis). A positive Z score (red) indicates a positive association between the BW-raising allele and the outcome trait, whereas a negative Z score (blue) indicates an inverse association. BW loci and traits were clustered according to the Euclidean distance among Z scores. Squares are outlined with a solid black line if the BW locus is significantly (P < 5 × 10⁻⁸) associated with the trait in publicly available GWAS, or with a dashed line if reported significant elsewhere” [70].

There are few GWAS meta-analyses that have examined the influence of smoking on birth weight; therefore, it is necessary to systematically evaluate gene–environment interactions in relation to smoking, especially genome-wide gene-smoking interactions. Notably, genome-wide gene-smoking interaction studies have become more common in adult GWAS meta-analyses to identify new loci associated with adult traits such as obesity, blood pressure, and serum lipids [74,75,76].

4.2 Mendelian Randomization

An alternative method, Mendelian randomization, entails the use of genetic variants as proxies for the environmental exposure under investigation [77]. Mendelian randomization is a GWAS-based theoretical method for environmental risk assessment that uses genetic factors associated with environmental factors to assess the causal effect on internal biomarkers such as body mass index (BMI), systolic blood pressure, and fasting glucose levels [8, 78]. Birth weight was used as both the outcome of maternal internal biomarkers and the internal marker of fetal intrauterine environment [78,79,80,81,82,83,84,85,86] (Table 19.4). Until recently, genetic risk scores were preferred over SNPs in Mendelian randomization studies [68, 78, 82, 83]. Novel approaches to obtaining genetic risk scores include assessments of the genetic contribution of certain intermediate traits or risk factors to cardiometabolic disease, risk prediction in high-risk populations, studies of gene–environment interactions, and Mendelian randomization [87].
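
As a minimal sketch of what a genetic risk score is, the Python function below sums risk-allele counts across variants, optionally weighting each variant by a per-allele effect size. The variant labels, weights, and function name are hypothetical placeholders rather than values from the cited studies.

```python
def genetic_risk_score(allele_counts, weights=None):
    """Compute an unweighted or weighted genetic risk score.

    allele_counts: dict mapping variant ID -> number of risk alleles (0, 1, or 2).
    weights: optional dict mapping variant ID -> per-allele effect size;
             if omitted, every variant has weight 1 (simple allele count).
    """
    score = 0.0
    for variant, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {variant}: {count}")
        weight = 1.0 if weights is None else weights[variant]
        score += weight * count
    return score


# Hypothetical example: three placeholder variants for one individual.
example_counts = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
example_weights = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": 1.0}
unweighted_score = genetic_risk_score(example_counts)
weighted_score = genetic_risk_score(example_counts, example_weights)
```

Combining many variants into one score typically yields a stronger proxy for the exposure than any single SNP, which is one reason such scores are used as instruments in Mendelian randomization.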

Table 19.4 Mendelian randomization

Mendelian randomization is a recently developed field that involves the use of genetic variation to assess the causal relationship between exposure and outcome, where genetic variants within or near coding loci related to protein concentrations enable assessment of their causal role in disease. However, the more complex relationship between genetic variation and exposure makes the findings from Mendelian randomization more difficult to interpret [87]. Recently, the use of maternal birth weight-lowering genotypes to proxy for an adverse intrauterine environment in Mendelian randomization analyses yielded no evidence that such genotypes causally raise offspring blood pressure [72]. Mendelian randomization is considered an established method for strengthening causal inference and estimating causal effects; however, the use of genetic instruments, which lack direct links to measurement in most cases, may be one of the limitations of Mendelian randomization.
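
The core calculation behind a single-instrument Mendelian randomization analysis can be sketched with the Wald ratio: the gene–outcome association divided by the gene–exposure association. The Python code below is a simplified illustration with hypothetical names and inputs; the standard error uses a first-order approximation that ignores uncertainty in the gene–exposure estimate, and the sketch does not check the instrumental variable assumptions on which a causal interpretation depends.

```python
def wald_ratio(beta_gene_exposure, beta_gene_outcome, se_gene_outcome):
    """Wald ratio estimate of the effect of an exposure on an outcome.

    beta_gene_exposure: effect of the genetic instrument on the exposure.
    beta_gene_outcome: effect of the same instrument on the outcome.
    se_gene_outcome: standard error of beta_gene_outcome.
    Returns (estimate, approximate_standard_error).
    """
    if beta_gene_exposure == 0:
        raise ValueError("the instrument must be associated with the exposure")
    estimate = beta_gene_outcome / beta_gene_exposure
    # First-order approximation: treats beta_gene_exposure as known exactly.
    approximate_se = se_gene_outcome / abs(beta_gene_exposure)
    return estimate, approximate_se


# Hypothetical inputs: the instrument raises the exposure by 0.5 units and
# lowers the outcome by 0.1 units (standard error 0.02).
example_estimate, example_se = wald_ratio(0.5, -0.1, 0.02)
```

Because the estimate is a ratio, a weak gene–exposure association in the denominator inflates both the estimate and its uncertainty, which is one practical reason the interpretation of such analyses becomes difficult when the link between genotype and exposure is complex.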

4.3 Exposome Linked with Genomics

Although GWASs have revealed genetic associations and networks that improve the understanding of diseases, these findings only elucidate a small part of overall disease risk [88]. As disease causation is largely non-genetic, the need for improved tools to quantify environmental contributions seems obvious [89]. The “exposome” was originally defined as representing all kinds of environmental exposure, including those from diet, lifestyle, and endogenous sources, during the entire lifespan from the prenatal period onward, as a quantity of critical interest to disease etiology [90]. Three overlapping domains within the exposome have been described as follows: (1) a general external environment to include factors such as the urban environment, climate factors, social capital, and stress; (2) a specific external environment with specific contaminants, diet, physical activity, tobacco, and infections; and (3) an internal environment to include internal biological factors such as metabolic factors, gut microflora, inflammation, and oxidative stress [91]. Although it is difficult to elucidate the exposome entirely at present, the inherent value of exposomic data in cohort studies is that they can provide a greater understanding of relationships between a wide range of exposure and health conditions, and ultimately lead to more effective and efficient disease prevention and control [92]. For example, the NIH Child Health Environmental Analysis Resource (CHEAR) is a major step toward providing the infrastructure needed to study the exposome in relation to child health [9]. Furthermore, an EU-funded project, EXPOsOMICS, aims to develop a novel approach for assessing exposure to high-priority environmental pollutants, such as air and water contaminants, during critical periods of life, by characterizing the external and internal components of the exposome [93, 94]. 
Exposome research in the context of developing interventions is targeted at the population level to improve public health, whereas the application of genomics lies in preventive and therapeutic interventions targeted at individuals [94].

4.4 Perspectives

Analysis of genetic factors and environmental exposure offers evidence-based explanations for the associations between LBW, early growth, and increased propensity to develop NCDs in later life [60, 66, 70,71,72, 95]. Among epidemiological studies, GWASs have revealed the relative role of genes and the environment in disease risk, assisting in risk prediction such as in preemptive and precision medicine [96, 97]. Although Mendelian randomization is a powerful tool that utilizes genetic information to explain the likely causal relevance of an exposure to an outcome, increasingly complex gene-to-exposure and exposure-to-outcome relationships make it more difficult to conduct and interpret reliably [8]. Tremendous strides are being made toward developing an exposomics approach, together with infrastructural progress toward the identification of new methods and consortia that can address big-picture questions of how the environment impacts health and development [9, 93]. Future studies should aim to develop new methods and analytical approaches for exposure assessment and data harmonization [9].

In Japan, the concept of preemptive medicine, a novel medical paradigm that advocates pre-symptomatic diagnosis or preventive intervention at an early stage to prevent disease onset, has been proposed, and strategic policy proposals based on this approach have been made [97, 98]. Recently, collaborations among birth cohort studies have been supported by the project for babies and infants in research of health and development to adolescent and young adult (BIRTHDAY) of the Japan Agency for Medical Research and Development (AMED) on the basis of the current health policy “Overcoming health issues according to life stage” [99]. Large-scale epidemiological studies for elucidating gene–environment interactions are expected to progress rapidly in Japan in the near future.

5 Conclusions

Analysis of gene–environment interactions between exposures such as tobacco smoke and environmental chemicals and candidate genes, such as those encoding metabolic enzymes including CYPs and GSTs, has been performed via small-scale epidemiological studies. In the last 10 years, major innovations in research designs and methods, accompanied by rapid advances in analytical and measurement technologies, have occurred. In particular, GWASs and EWASs have become mainstream in genome cohort studies using advanced genomics and epigenomics, which has made it possible to better understand the genetic basis of diseases. As disease causation is largely non-genetic, the concept of the exposome was proposed as an improved tool to quantify total environmental contributions. Linking the exposome with genomics, in combination with preemptive and precision medicine, is expected to reveal novel insights into the origins of multiple complex diseases.