Introduction

Coffee is among the most widely consumed beverages in the world.1 North American coffee drinkers typically consume ~2 cups per day while the norm is at least 4 cups in many European countries.1 In prospective cohort studies, coffee consumption is consistently associated with lower risk of Parkinson’s disease, liver disease and type 2 diabetes.2 However, the effects of coffee on cancer development, cardiovascular and birth outcomes and other health conditions remain controversial.2 For most populations, coffee is the primary source of caffeine, a stimulant also present in other beverages, foods and medications.1,3 The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders does not include a diagnosis of caffeine dependence or abuse due to a paucity of evidence but lists caffeine intoxication and withdrawal as disorders.4 Knowledge of factors contributing to coffee’s consumption and physiological effects may greatly advance the design and interpretation of population and clinical research on coffee and caffeine.5 Genetic factors could be especially valuable as they offer ways to study the potential health effects of coffee via instrumental variables or gene–environment interactions.5 Heritability estimates for coffee and caffeine use range between 36 and 58%.6 Genome-wide association studies (GWAS) of habitual caffeine and coffee intake have identified variants near CYP1A2 and aryl hydrocarbon receptor (AHR).7, 8, 9 Cytochrome P450 (CYP)1A2 is responsible for ~95% of caffeine metabolism in humans and AHR has a regulatory role in basal and substrate-induced expression of target genes, including CYP1A1 and CYP1A2.10,11

To identify additional loci, we conducted a staged genome-wide (GW) meta-analysis of coffee consumption including over 120 000 coffee consumers sourced from population-based studies of European and African-American ancestry.

Materials and methods

Study design and populations

Supplementary Figure S1 depicts an overview of the current study. We performed a meta-analysis of GWAS summary statistics from 28 population-based studies of European ancestry to detect single-nucleotide polymorphisms (SNPs) that are associated with coffee consumption. Top loci were followed up in studies of European (13 studies) and African-American (7 studies) ancestry and confirmed loci were explored in a single Pakistani population. Detailed information on study design, participant characteristics, genotyping and imputation for all contributing studies is provided in the Supplementary Information and Supplementary Tables S1–S6.

Phenotype

All phenotype data were previously collected via interviewer- or self-administered questionnaires (Supplementary Table S1). Our primary phenotype (‘phenotype 1’) was cups of predominately regular-type coffee consumed per day among coffee consumers. Coffee data collected categorically (for example, 2–3 cups per day) were converted to cups per day by taking the median value of each category (for example, 2.5 cups per day). A secondary analysis was performed comparing high with infrequent/non-coffee consumers (‘phenotype 2’). A subset of stage 1 studies collected information on decaffeinated coffee consumption, which was examined in follow-up analysis of the confirmed loci.
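As a minimal sketch of the category-to-quantity conversion described above, the Python snippet below maps a closed questionnaire category to cups per day by taking the midpoint of its range. The helper name `category_to_cups` and the 'low-high' label format are hypothetical; open-ended categories (for example, '6+') would need a study-specific rule that is not specified here.

```python
def category_to_cups(category: str) -> float:
    """Convert a closed range such as '2-3' to its midpoint in cups per day.

    Hypothetical helper: the 'low-high' label format is an assumption.
    """
    # Accept either an en dash or a plain hyphen as the range separator.
    low, high = (float(x) for x in category.replace("–", "-").split("-"))
    return (low + high) / 2.0


# Toy questionnaire responses, for illustration only.
responses = ["2-3", "4-5", "0-1"]
cups_per_day = [category_to_cups(r) for r in responses]
print(cups_per_day)  # [2.5, 4.5, 0.5]
```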

Statistical analysis

Each stage 1 (discovery) study performed GWA testing for each phenotype across ~2.5 million genotyped or imputed autosomal SNPs (HapMap II, Centre d’Etude du Polymorphisme Humain (CEU) reference), based on linear (cups per day, phenotype 1) or logistic (high vs none/low, phenotype 2) regression under an additive genetic model. Analyses were adjusted for age, smoking status and, when applicable, sex, case–control status, study site, family structure and/or study-specific principal components of population substructure (Supplementary Table S7). SNPs with minor allele frequency <0.02 or with low imputation quality scores were removed before meta-analysis (Supplementary Table S5). The GWAtoolbox (see Supplementary Information for URLs) was used for initial quality control. Minor allele frequencies and a plot comparing (1/median standard error of effect size) vs (square root of sample size) for each study were also reviewed for outliers and these were addressed before the final meta-analysis.
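The per-SNP test for phenotype 1 can be illustrated with a deliberately simplified sketch: an ordinary least-squares regression of cups per day on allele dosage (0, 1 or 2 copies), which is what an additive genetic model amounts to in the unadjusted case. The covariate adjustment used in the actual analyses is omitted, the function names are hypothetical and the data are toy values.

```python
from statistics import mean


def minor_allele_frequency(dosages):
    """Frequency of the rarer allele, given per-person dosages of one allele (0-2)."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


def additive_effect(dosages, phenotype):
    """OLS slope of phenotype on dosage: change in cups per day per allele copy."""
    g_bar, y_bar = mean(dosages), mean(phenotype)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(dosages, phenotype))
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    return sxy / sxx


# Toy data, for illustration only (not from any contributing study).
dosages = [0, 0, 1, 1, 2, 2]
cups = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0]

maf = minor_allele_frequency(dosages)
if maf >= 0.02:  # mirrors the minor allele frequency filter described above
    print(additive_effect(dosages, cups))  # 1.0 cup per day per allele
```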

For both phenotypes, GW meta-analysis was conducted using a fixed-effects model and inverse-variance weighting with a single genomic control correction as implemented in METAL12 and GWAMA13 (r>0.99 for correlation between METAL and GWAMA results). The phenotypic variance explained by additive SNP effects was estimated in the Women’s Genome Health Study (WGHS, n=15 987 with identity-by-state <0.025) using GCTA.14 Stage 1 summary statistics were also subjected to pathway analysis using MAGENTA15 (Supplementary Information).
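For readers unfamiliar with the approach, a fixed-effects inverse-variance-weighted meta-analysis of one SNP can be sketched as follows. This is an illustration of the general technique, not the METAL or GWAMA implementation; the function names are hypothetical, and applying genomic control by inflating the standard error by the square root of λ is an assumption about how the correction is applied.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se


def genomic_control_se(se, lam):
    """Inflate a standard error by sqrt(lambda); no change when lambda <= 1."""
    return se * math.sqrt(max(lam, 1.0))


# Toy per-study estimates (cups per day per allele), for illustration only.
pooled_beta, pooled_se = fixed_effects_meta([0.10, 0.20], [0.1, 0.1])
print(pooled_beta, pooled_se)
```

With equal standard errors the pooled effect is the simple average of the study effects; studies with smaller standard errors would otherwise receive proportionally more weight.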

For regions achieving association P-values <5 × 10−8 (7p21, 7q11.23, 11p13 and 15q24), we performed conditional analysis using the summary statistics from the meta-analysis to test for the association of each SNP while conditioning on the top SNPs, with correlations between SNPs due to linkage disequilibrium (LD) estimated from the imputed genotype data from the Atherosclerosis Risk in Communities cohort,16 a large and representative cohort of men and women of European ancestry.

Our approach to select SNPs for replication (stage 2) is described in Supplementary Information. Stage 2 meta-analyses were performed separately for European and African-American populations, using the same statistical models and methods as described for stage 1, but without genomic control (Supplementary Information).

Studies from all stages were included in an overall meta-analysis using MANTRA (Meta-ANalysis of TRans-ethnic Association studies),17 which adopts a Bayesian framework to combine results from different ethnic groups by taking advantage of the expected similarity in allelic effects between the most closely related populations. MANTRA was limited to SNPs selected for replication; thus, no genomic control was applied. A random-effects analysis using GWAMA was performed in parallel to obtain effect estimates, which are not generated by MANTRA. The GW-significance threshold of log10 BF >5.64 approximates a traditional GW P-value threshold of 5 × 10−8 under general assumptions.18,19 Subgroup analysis and meta-regression were performed to investigate possible sources of between-study heterogeneity (Supplementary Information).

Fine-mapping

To assess the improvement in fine-mapping resolution due to trans-ethnic meta-analysis, we applied the methods of Franceschini et al.17 to stage 1 and stage 2 (African Americans only) GW-summary level data (Supplementary Information).

Potential SNP function and biological and clinical inferences

Details pertaining to follow-up of confirmed loci are provided in the Supplementary Information. Briefly, all confirmed index SNPs and their correlated proxies were examined for putative function using publicly available resources. Bioinformatics and computational tools were used to systematically mine available knowledge and experimental databases to inform biological hypotheses underlying the link between loci and coffee consumption as well as connections between loci. For these analyses all genes mapping to the confirmed regions were considered as potential candidates. Finally, we searched the National Human Genome Research Institute GWAS catalog20 and Metabolomics GWAS server21 for all GW-significant associations with our confirmed coffee SNPs. Complete GWAS summary data for coffee-implicated diseases or traits were additionally queried.

Results

SNPs associated with coffee consumption

Discovery stage

Results from the discovery stage are summarized in Supplementary Figures S2–S5. Little evidence for genomic inflation (λ<1.07) was observed for either phenotype. The two analyses yielded similarly ranked loci and significant enrichment of ‘xenobiotic’ genes (MAGENTA’s FDR<0.006), suggesting no major difference in the genetic influence on coffee drinking initiation compared with the level of coffee consumption among coffee consumers at these loci. Overall, ~7.1% (standard error: 2%) of the variance in coffee cups consumed per day (phenotype 1) could be explained by additive and common SNP effects in the WGHS.
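The genomic inflation factor λ reported above is conventionally the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455). The sketch below computes it from a list of P-values using only the Python standard library; the function name is hypothetical and the input values are toy data.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution: the squared 75th percentile of N(0, 1).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # approximately 0.4549


def genomic_inflation(p_values):
    """Median observed 1-df chi-square divided by its expected null median."""
    norm = NormalDist()
    # Two-sided P-value -> squared z-score (a 1-df chi-square statistic).
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spread toy P-values behave like the null, so lambda is 1.
lam = genomic_inflation([0.1, 0.3, 0.5, 0.7, 0.9])
print(round(lam, 6))
```

Values close to 1, such as the λ<1.07 observed here, indicate little systematic inflation of the test statistics.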

Conditioning on the index SNPs of each region achieving association P-values <5 × 10−8 (7p21, 7q11.23, 11p13 and 15q24) in the discovery stage provided little evidence for multiple independent variants (Supplementary Figure S6). Only four of the SNPs on chromosome 7 were potentially independent and carried forward with other promising SNPs.

Replication and trans-ethnic meta-analysis

Forty-four SNPs spanning thirty-three genomic regions met significance criteria for candidate associations and were followed up in stage 2 (Supplementary Tables S8–S13). Eight loci, including six novel loci, met our criteria for GW significance (log10 BF>5.64) in a trans-ethnic meta-analysis of all discovery and replication studies (Table 1; Supplementary Tables S14–S16; Supplementary Figures S7 and S8). Confirmed loci have effect sizes of 0.03–0.14 cups per day per allele and together explain ~1.3% of the phenotypic variance of coffee intake. We were underpowered to replicate these associations in a Pakistani population (Supplementary Information).

Table 1 SNPs associated with cups of coffee consumed per day among coffee consumers

Functional and biological inferences

Enhancer (H3K4me1) and promoter (H3K4me3) histone marks densely populate many of these regions and several non-synonymous and potential regulatory SNPs are highly correlated (r2>0.8) with the lead SNP and thus strong candidates for being a causal variant (Table 2; Supplementary Information; Supplementary Tables S17–S19). Candidate genes form a highly connected network of interactions, featuring discernible clusters of genes around brain-derived neurotrophic factor (BDNF) and AHR (Figure 1; Supplementary Information; Supplementary Tables S20 and S21). At least one gene in each of the eight regions (i) is highly expressed in brain, liver and/or taste buds, (ii) results in phenotype abnormalities relevant to coffee consumption behavior when modified in mice and (iii) is differentially expressed in human hepatocytes when treated with high (7500 μM) but not low (1500 μM) doses of caffeine (Table 2; Supplementary Tables S22–S24).

Table 2 Potential function of loci associated with coffee consumption
Figure 1

Network describing direct interactions between candidate genes of confirmed loci. Relationships were retrieved from databases of transcription regulation and protein–protein interaction experiments (Supplementary Table S21). Genes are represented as nodes that are colored according to locus. Candidate genes for loci identified in the current study were supplemented with known candidate genes related to caffeine pharmacology (gray nodes). Edges indicate known interactions.

Additional genomic characterization of the top loci allows further biological inference as follows:

(i) Previously identified loci near AHR (7p21) and CYP1A2 (15q24)

Consistent with previous reports in smaller samples,7, 8, 9 the intergenic 7p21 and 15q24 loci near AHR and CYP1A1/CYP1A2 respectively remained the most prominent and highly heterogeneous loci associated with coffee consumption. The same index SNPs were identified in European and African Americans, suggesting that they are robust HapMap proxies for causal variants in these two populations. Cohort-wide mean coffee consumption explained part of the heterogeneity in study results for both loci (Supplementary Table S25; Supplementary Information). The rs2472297 T and rs4410790 C alleles associated with increased coffee consumption have recently been associated with lower plasma caffeine levels21 and shown to increase CYP1A2-mediated metabolism of olanzapine.22 The C allele of rs4410790 is also positively correlated with cerebellum AHR methylation, suggesting a novel role of Ahr in motor or learning pathways that may trigger coffee consumption. The most significant variants at 15q24 reside in the CYP1A1-CYP1A2 bidirectional promoter where AHR response elements have been identified and shown to be important for transcriptional activation of both CYP1A1 and CYP1A2.23 The rs2472297 T variant putatively weakens the binding of SP1, a co-activator in the Ahr–Arnt complex regulating CYP1 locus transcription24 and is also implicated in the expression of several neighboring genes. The latter observation, together with this region’s high LD and long range chromatin interactions (Supplementary Figure S9), suggests a regulatory network among these genes.

(ii) Novel loci at 7q11.23 (POR) and 4q22 (ABCG2) likely function in caffeine metabolism

Variants at 7q11.23 (rs17685) and 4q22 (rs1481012) map to novel yet biologically plausible candidate genes involved in xenobiotic metabolism. rs17685 maps to the 3’UTR of POR, encoding P450 oxidoreductase which transfers electrons to all microsomal CYP450 enzymes.25 The rs17685 A variant associated with higher coffee consumption is linked to increased POR expression and potentially weakens the DNA binding of several transcriptional regulatory proteins including BHLHE40, which inhibits POR expression.26 The same SNP is in LD (CEU: r2=0.93) with POR*28 (rs1057868, Ala503Val), which is associated with differential CYP activity depending on the CYP isoform, substrate and experimental model used.27 rs1481012 at 4q22 maps to ABCG2, encoding a xenobiotic efflux transporter. rs1481012 is in LD (CEU: r2=0.92) with rs2231142 (Gln141Lys), a functional variant at an evolutionarily constrained residue.28 However, fine-mapping of this region on the basis of reduced LD in the African-American sample limited an initial 189 102-kb region to a credible span of 6249 kb (Supplementary Table S16) that excluded rs2231142.

(iii) Novel loci at 11p13 (BDNF) and 17q11.2 (‘SLC6A4’) likely mediate the positive reinforcing properties of coffee constituents

The index SNP at 11p13 is the widely investigated missense mutation (rs6265, Val66Met) in BDNF (Supplementary Table S26). BDNF modulates the activity of serotonin, dopamine and glutamate, neurotransmitters involved in mood-related circuits, and has a key role in memory and learning.29 The Met66 allele impairs neuronal activity-dependent BDNF secretion30 and thus may attenuate the rewarding effects of coffee and, in turn, motivation to consume coffee. The increasingly recognized roles of BDNF in the chemosensory system and conditioned taste preferences may also be relevant.31 The index SNP (rs9902453) at 17q11.2 maps to the EFCAB5 gene and is in LD (CEU: r2>0.8) with SNPs that alter regulatory motifs for AhR32 in the neighboring gene NSRP1, but neither gene is an obvious candidate for coffee consumption. Upstream of rs9902453 lies a possibly stronger candidate: SLC6A4 encoding the serotonin transporter. Serotonergic neurotransmission affects a wide range of behaviors including sensory processing and food intake.33

(iv) Novel loci at 2p24 (GCKR) and 7q11.23 (MLXIPL)

Variants at 2p24 (rs1260326) and 7q11.23 (rs7800944) map to GCKR and MLXIPL, respectively. The former has been associated with plasma glucose and multiple metabolic traits and the latter with plasma triglycerides (Table 3; Supplementary Table S27). Adjustment of regression models for plasma lipids in the WGHS (n~17 000) and plasma glucose in TwinGene (n~8800) did not significantly change the relationship between SNPs at these two loci and coffee consumption (P>0.48, Supplementary Tables S28 and S29). The rs1260326 T allele encodes a non-synonymous change in the encoded glucokinase regulatory protein, leading to increased hepatic glucokinase activity.34 Glucokinase regulatory protein and glucokinase may also cooperatively function in the glucose-sensing process of the brain35 that may, in turn, influence central pathways responding to coffee constituents. A direct link between MLXIPL and coffee consumption remains unclear, except for the interactions with other candidate genes (Figure 1). Experimental evidence and results from formal prioritization analyses also warrant consideration of other candidates in these regions (Figure 1; Table 2; Supplementary Table S23). For example, in the frontal cortex, the rs1260326 allele positively associated with coffee consumption correlates with lower methylation of PPM1G, a putative regulatory target for AhR and binding target for PPP1R1B, which mediates psychostimulant effects of caffeine.36

Table 3 Associations between coffee consumption loci and other traits

Pleiotropy and clinical inferences

None of the eight loci was significantly associated with caffeine taste intensity (P>0.02) or caffeine-induced insomnia (P>0.08), according to previously published GWAS of these traits.37, 38, 39 SNPs near AHR associated with higher coffee consumption were also significantly associated with higher decaffeinated coffee consumption (~0.05 cups per day, P<0.0004, n=24 426), perhaps as a result of Pavlovian conditioning among individuals moderating their intake of regular coffee or the small amounts of caffeine in decaffeinated coffee.1

Across phenotypes in the GWAS catalog,20 the alleles leading to higher coffee consumption at 2p24, 4q22, 7q11.23, 11p13 and 15q24 have been associated with one or more of the following: smoking initiation, higher adiposity and fasting insulin and glucose but lower blood pressure and favorable lipid, inflammatory and liver enzyme profiles (P<5 × 10−8, Table 3; Supplementary Table S27). Focusing on metabolic, neurologic and psychiatric traits in which coffee has been implicated (Table 3; Supplementary Table S32), we found additional sub-GW significant associations in published GWAS. Variants associated with higher coffee consumption increased adiposity (rs1481012, P=4.85 × 10−3), birth weight (rs7800944, P=2.10 × 10−3), plasma high-density lipoprotein (HDL, rs7800944, P=2.24 × 10−3), risk of Parkinson’s disease (rs1481012, P=7.11 × 10−3), reduced blood pressure (rs6265, P=6.58 × 10−4; rs2472297, P<6.80 × 10−5 and rs9902453, P=6.05 × 10−3), HDL (rs6968554, P=1.18 × 10−3), risk of major depressive disorder (rs17685, P=6.98 × 10−3) and bipolar disorder (rs1260326, P=2.31 × 10−3). Associations with adiposity, birth weight, blood pressure, HDL and bipolar disorder remain significant after correcting for the number of SNPs tested.
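The multiple-testing step can be made concrete with a short sketch. It assumes a Bonferroni correction over the eight confirmed SNPs at α=0.05 (threshold 0.05/8=0.00625); the text states only that results were corrected for the number of SNPs tested, so both the method and the count are assumptions. The P-values are those quoted above, with the rs2472297 value entered at its reported upper bound.

```python
ALPHA = 0.05
N_SNPS = 8  # assumption: one test per confirmed locus

# (trait, SNP) -> P-value as quoted in the text above.
p_values = {
    ("adiposity", "rs1481012"): 4.85e-3,
    ("birth weight", "rs7800944"): 2.10e-3,
    ("HDL", "rs7800944"): 2.24e-3,
    ("Parkinson's disease", "rs1481012"): 7.11e-3,
    ("blood pressure", "rs6265"): 6.58e-4,
    ("blood pressure", "rs2472297"): 6.80e-5,  # reported as an upper bound
    ("blood pressure", "rs9902453"): 6.05e-3,
    ("HDL", "rs6968554"): 1.18e-3,
    ("major depressive disorder", "rs17685"): 6.98e-3,
    ("bipolar disorder", "rs1260326"): 2.31e-3,
}

threshold = ALPHA / N_SNPS  # Bonferroni-corrected significance level
significant_traits = {trait for (trait, _), p in p_values.items() if p < threshold}
print(sorted(significant_traits))
```

Under these assumptions the snippet flags adiposity, birth weight, blood pressure, HDL and bipolar disorder, the same five traits named in the text, while Parkinson’s disease and major depressive disorder fall just above the threshold.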

Discussion

Coffee’s widespread popularity and availability have fostered public health concerns about the potential health consequences of regular coffee consumption. Findings from epidemiological studies of coffee consumption and certain health conditions remain controversial.2 Knowledge of genetic factors contributing to coffee’s consumption and physiological effects may inform the design and interpretation of population and clinical research on coffee.5 In the current report, we present results of the largest GWAS of coffee intake to date and the first to include populations of African-American ancestry. In addition to confirming associations with AHR and CYP1A2, we have identified six new loci not previously implicated in coffee drinking behavior.

Our findings highlight an important role of the pharmacokinetic and pharmacodynamic properties of the caffeine component of coffee underlying a genetic propensity to consume the beverage. Loci near BDNF and SLC6A4 potentially impact consumption behavior by modulating the acute behavioral and reinforcing properties of caffeine. Others near AHR, CYP1A2, POR and ABCG2 act indirectly by altering the metabolism of caffeine and thus the physiological levels of this stimulant. The strength of these four associations with coffee intake, along with results from pathway analysis showing significant enrichment for ‘xenobiotic’ genes, emphasizes an especially pronounced role of caffeine metabolism in coffee drinking behavior. The current study is the first to link GCKR and MLXIPL variation to a behavioral trait. The non-synonymous rs1260326 SNP in GCKR has been a GW signal for various metabolic traits, particularly those reflecting glucose homeostasis (Table 3). GCKR variation may impact the glucose-sensing process of the brain35 that may, in turn, influence central pathways responding to coffee constituents. Methylation quantitative trait loci and binding motif analysis suggest that PPM1G may be another candidate underlying the association between rs1260326 and coffee consumption. Variants near MLXIPL have also topped the list of variants associated with plasma triglycerides (Table 3), but their link to coffee consumption remains unclear. Future studies on the potential pleiotropic effects of these two loci are clearly warranted. Interestingly, several candidate genes implicated in coffee consumption behavior, but not confirmed in our GWAS, interact with one or more of the eight confirmed loci (Figure 1). While these findings are encouraging for ongoing efforts, they also emphasize the need to study sets or pathways of genes in the future.

Specific SNPs associated with higher coffee consumption have previously been associated with smoking initiation, higher adiposity and fasting insulin and glucose but lower blood pressure and favorable lipid, inflammatory and liver enzyme profiles. Whether these relationships reflect pleiotropy, confounding or offer insight into the potential causal role coffee plays in these traits merits further investigation. Future research, particularly Mendelian Randomization and gene–coffee interaction studies, will need to consider the direct and indirect roles that each SNP has in altering coffee drinking behavior as well as the potential for interactions between loci (Figure 1). The heterogeneous effects specific to AHR- and CYP1A2-coffee associations point to SNP-specific interactions with the environment or population characteristics that might also warrant consideration (Supplementary Information).

The strong cultural influences on norms of coffee drinking may have reduced our power for loci discovery. This might, in part, underlie our lack of replication in a Pakistani population, wherein coffee consumption is extremely rare. Methodological limitations specific to our approach may also have reduced our power for loci discovery or precision in estimating effect sizes (Supplementary Information). For example, some studies collected coffee data in categories of cups per day (for example, 2–3 cups per day), rendering a less precise record of intake as well as a non-Gaussian distributed trait for analysis. The precise chemical composition of different coffee preparations is also not captured by standard food frequency questionnaires and is likely to vary within and between populations. Nevertheless, the eight loci together explain ~1.3% of the phenotypic variance, a value at least as great as that reported for smoking behavior and alcohol consumption, which are subject to similar limitations in GWAS.40,41

The additive genetic variance (or narrow-sense heritability) of coffee intake as estimated by GCTA in WGHS (7%) is considerably lower than estimates based on pedigrees (36–57%).6 The marked discrepancies between the GCTA and pedigree estimates of heritability may be due to one or more of the following: the potential contribution of rare variants to heritability (not captured by GCTA’s ‘chip-based heritability’), biases in pedigree analysis resulting in overestimates of heritability, differences in phenotype ascertainment or definition and cultural differences in the populations studied.42

In conclusion, our results support the hypothesis that metabolic and neurological mechanisms of caffeine contribute to coffee consumption habits. Individuals adapt their coffee consumption habits to balance perceived negative and reinforcing symptoms that are affected by genetic variation. Genetic control of this potential ‘titrating’ behavior would incidentally govern exposure to other potentially ‘bioactive’ constituents of coffee that may be related to the health effects of coffee or other sources of caffeine. Thus, our findings may point to molecular mechanisms underlying inter-individual variability in pharmacological and health effects of coffee and caffeine.