Introduction

Nonalcoholic fatty liver disease (NAFLD) is defined as an abnormal accumulation of fat in the liver in the absence of significant alcohol intake. It may progress through a wide spectrum of liver damage ranging from steatosis and nonalcoholic steatohepatitis (NASH) to fibrosis and cirrhosis. Treatment options for NAFLD remain limited, and no pharmacotherapy has been approved by the FDA to date. Physical activity and dietary interventions are still considered effective strategies to reduce liver fat at the initial, reversible stages of NAFLD, such as hepatic steatosis and NASH [1,2,3].

Coffee is one of the most popular and widely consumed beverages in the world. It has been estimated that approximately 3.5 billion cups of coffee are consumed around the world each day [4,5,6]. Multiple epidemiological studies have demonstrated a protective association between coffee intake and the development of NAFLD, and the potential beneficial effects of coffee have also been examined in animal models [7,8,9,10]. However, there is thus far insufficient evidence from human studies to establish a causal association between coffee intake and the risk of NAFLD. One strategy to address this question is to perform a randomized controlled trial (RCT) to examine whether coffee intake directly reduces the development of NAFLD. However, given the chronic development of NAFLD, this would require a large sample size followed over a long period of time. Over the past several years, Mendelian randomization (MR) has increasingly been used to estimate the causal relationship between a modifiable environmental exposure of interest and a medically relevant trait or disease [11]. MR is based on the Mendelian rule of inheritance, under which parental alleles (i.e., risk or non-risk alleles) are randomly distributed to the offspring during meiosis, a process considered analogous to randomization in an RCT. This strategy is considered convenient, low cost and less likely to be confounded by covariables [12]. MR uses genetic variants as instrumental variables (IVs) for the exposure (i.e., coffee intake) to examine whether the genetically instrumented exposure is causally associated with a clinical outcome (i.e., NAFLD).

Genome-wide association studies (GWAS) have identified multiple loci that are strongly associated with coffee consumption [13, 14]. The identification of these genetic variants (e.g., SNPs) provides an opportunity to apply MR analysis to test the causal relationship between coffee intake and NAFLD risk. If confirmed, such a causal relationship would provide strong evidence to support the use of coffee intake to prevent the development of NAFLD. In this study, we applied a two-sample MR framework using SNPs associated with coffee consumption in a published GWAS [13] to test the causal relationship between coffee intake and NAFLD risk, with the NAFLD associations taken from summary-level data from our recent GWAS in the UK Biobank [15].

Materials and methods

GWAS summary data for habitual coffee consumption

The most recent genome-wide meta-analysis for self-reported consumption of coffee included a significant proportion of UK Biobank samples (UK Biobank samples/total samples = 335,909/375,833 = 89.4%) [14], which may largely overlap with the cohort for the NAFLD GWAS (1122 cases and 399,900 healthy controls from UK Biobank, see more details on NAFLD GWAS in the following section). To minimize the bias induced by the participant overlap in two-sample MR [16], we obtained the summary statistics of the genetic associations with habitual coffee consumption from the largest UK Biobank-independent genome-wide meta-analysis [13]. The summary data of individuals of European descent (discovery stage: n = 91,462, validation stage: n = 30,062) were used for the MR analysis. Details on the study design, data analysis, and ethical approval were described in the original publication [13]. The original study performed a trans-ethnic meta-analysis for coffee consumption including individuals of European ancestry and African Americans. As we focused on the causal relationship between coffee intake and NAFLD among individuals of European ancestry, we performed a meta-analysis combining the summary-level data of the European individuals in the discovery (n = 91,462) and validation (n = 30,062) stages. The combined effects were analyzed through the “metafor” R package [17] assuming a fixed-effect model. Due to limited data availability, the meta-analyses were performed on the top ten significant (p < 1e-5) SNPs associated with coffee intake identified from the discovery stage. The results of the meta-analysis are shown in supplemental Table 1.
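The fixed-effect pooling of the discovery and validation stages can be illustrated with a minimal sketch. It assumes the standard inverse-variance-weighted fixed-effect estimator; the function name `fixed_effect_meta` and the per-stage effect sizes are hypothetical and are not values from the study, which used the “metafor” R package.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-stage estimates with inverse-variance weights (fixed-effect model)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se

# Hypothetical per-allele effects on coffee intake for one SNP (illustration only).
discovery_beta, discovery_se = 0.10, 0.02
validation_beta, validation_se = 0.06, 0.04

pooled_beta, pooled_se = fixed_effect_meta(
    [discovery_beta, validation_beta],
    [discovery_se, validation_se],
)
print(f"pooled beta = {pooled_beta:.4f}, pooled SE = {pooled_se:.4f}")
```

Because the weights are the reciprocals of the squared standard errors, the more precisely estimated stage dominates the pooled estimate.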

GWAS summary data for NAFLD

The summary-level association data for NAFLD were obtained from our previous GWAS of NAFLD in the UK Biobank [15]. Individuals with ICD codes [ICD-9 571.8 “Other chronic nonalcoholic liver disease” and ICD-10 K76.0 “Fatty (change of) liver, not elsewhere classified”] but without hepatitis B or C infection or other liver diseases were classified as NAFLD cases. In total, 1122 cases and 399,900 healthy controls were analyzed for genome-wide associations with NAFLD. Basic demographic and clinical information for cases and controls is summarized in Table 1. We performed the association analysis using SAIGE [18], adjusting for sex, birth year, and the first four genetic principal components (PCs) as covariates.

Table 1 Characteristics of the UKBB cohort for NAFLD GWAS

Construction of the genetic predictors for coffee intake

We constructed two genetic instruments for coffee intake based on the association statistics obtained from the discovery stage and the meta-analysis. Specifically, the first instrument included four independent (LD \(R^{2}\) < 0.01 based on the phase 3 data of the 1000 Genomes (1KG) European individuals) and genome-wide significant (p < 5e-08) SNPs identified in the discovery stage (n = 91,462) (Table 2). The second instrument consisted of six independent (LD \(R^{2}\) < 0.01) and genome-wide significant (p < 5e-08) SNPs identified through the meta-analysis (Table 3).

Table 2 Characteristics of the first genetic instrument (4 SNPs)
Table 3 Characteristics of the second genetic instrument (6 SNPs)

To assess the possibility of an insufficiently powered IV, we considered a third IV based on SNP–coffee associations at a liberal significance level (ranging from p = 5e-8 to p = 1e-4 in the discovery GWAS), which consisted of up to 77 SNPs.
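The selection of independent SNPs at a given significance threshold can be sketched as a greedy pruning step. This is only an illustration under stated assumptions: the text does not specify which tool performed the LD pruning, and the function `select_instruments`, the SNP identifiers, the p values, and the LD lookup table are all hypothetical.

```python
def select_instruments(snps, ld_r2, p_threshold=5e-8, r2_threshold=0.01):
    """Greedily keep the most significant SNPs that are not in LD with any kept SNP."""
    candidates = sorted(
        (s for s in snps if s["p"] < p_threshold),
        key=lambda s: s["p"],
    )
    kept = []
    for snp in candidates:
        # Pairs missing from the lookup are assumed to be unlinked (r2 = 0).
        independent = all(
            ld_r2.get(frozenset((snp["id"], k["id"])), 0.0) < r2_threshold
            for k in kept
        )
        if independent:
            kept.append(snp)
    return [s["id"] for s in kept]

# Hypothetical SNPs and pairwise LD values (illustration only).
snps = [
    {"id": "rsA", "p": 1e-20},
    {"id": "rsB", "p": 3e-12},
    {"id": "rsC", "p": 2e-9},
    {"id": "rsD", "p": 4e-6},
]
ld_r2 = {frozenset(("rsA", "rsB")): 0.60}

strict = select_instruments(snps, ld_r2)                     # genome-wide threshold
liberal = select_instruments(snps, ld_r2, p_threshold=1e-4)  # liberal threshold
print(strict, liberal)
```

Relaxing `p_threshold` admits more SNPs into the instrument, which is the trade-off explored by the third, liberal IV.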

We evaluated the strengths of the two genetic instruments using the F statistic, \(F = \left(\frac{n - k - 1}{k}\right)\left(\frac{R^{2}}{1 - R^{2}}\right)\), where \(n\) is the sample size, \(k\) is the number of genetic variants, and \(R^{2}\) is the variance in coffee intake explained by the genetic instrument. The F statistics of the first (4 SNPs) and the second (6 SNPs) genetic instruments used for MR analysis were 124 and 119, respectively. Both F statistics were larger than the empirical strength threshold of 10 [19].
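The F statistic formula can be checked with a short sketch. The variance explained by each instrument is not reported in this section, so the example back-calculates the \(R^{2}\) implied by the reported F of 124 for the 4-SNP instrument, assuming the discovery-stage sample size (n = 91,462) was the n used; that pairing is an assumption, and the function names are hypothetical.

```python
def f_statistic(n, k, r2):
    """F = ((n - k - 1) / k) * (R^2 / (1 - R^2))."""
    return ((n - k - 1) / k) * (r2 / (1.0 - r2))

def implied_r2(n, k, f):
    """Invert the F formula: R^2 = F*k / (n - k - 1 + F*k)."""
    return f * k / (n - k - 1 + f * k)

# Back-calculated, not reported: R^2 implied by F = 124 with k = 4 and
# n = 91,462 (assumed to be the sample size behind the first instrument).
r2_first = implied_r2(n=91462, k=4, f=124)
print(f"implied R^2 = {r2_first:.4%}")
print(f"round-trip F = {f_statistic(91462, 4, r2_first):.1f}")
```

Under these assumptions the implied variance explained is about half a percent, which illustrates how a very large sample can yield a strong F statistic even when the instrument explains little of the variance in the exposure.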

MR analysis

The inverse variance weighted (IVW) [20] method was used to estimate the causal effect of coffee intake on NAFLD risk. Since the IVW method requires that all instrumental variables meet the MR assumptions, we used two complementary methods (the weighted median estimator [21] and MR-Egger [22]) to perform additional sensitivity analyses. The weighted median estimator provides a consistent causal estimate as long as more than half of the weight comes from valid instrumental variables. The MR-Egger estimate is unbiased provided that the strength of the genetic instruments is independent of their pleiotropic effects. The intercept of the MR-Egger regression is an indicator of the existence of pleiotropic effects. We considered pleiotropic effects to be absent if the intercept was not significantly different from 0 (p > 0.05). Moreover, we used the MR-PRESSO global test [23] to evaluate pleiotropy and identify outlier variants. The causal relationship was considered significant if (1) the p value of the IVW method was less than 0.05, (2) the directions of the estimates from the IVW, weighted median, and MR-Egger methods were the same and (3) neither the MR-Egger intercept test nor the MR-PRESSO global test was significant (p > 0.05). The IVW, weighted median, and MR-Egger methods were performed using the “MendelianRandomization” package [24], and the MR-PRESSO global test was performed using the “MRPRESSO” package [23]. All data were analyzed and visualized using R v.3.5.0 (https://www.r-project.org/).
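The IVW estimate and the three-part significance rule can be sketched as follows. This is a minimal illustration, assuming the fixed-effect form of IVW with first-order weights (the R package used in the study may apply different defaults); the function names and all SNP-level inputs are hypothetical and are not data from the study.

```python
import math
from statistics import NormalDist

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW: weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome^2."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    estimate = num / den
    se = math.sqrt(1.0 / den)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(estimate / se)))
    return estimate, se, p

def is_causal(p_ivw, estimates, p_egger_intercept, p_presso_global, alpha=0.05):
    """Apply the three criteria: IVW significance, concordant directions,
    and no evidence of pleiotropy from either test."""
    same_direction = all(e > 0 for e in estimates) or all(e < 0 for e in estimates)
    no_pleiotropy = p_egger_intercept > alpha and p_presso_global > alpha
    return p_ivw < alpha and same_direction and no_pleiotropy

# Hypothetical SNP-level summary statistics (illustration only).
beta_exposure = [0.10, 0.20, 0.05]    # SNP effects on coffee intake
beta_outcome = [-0.02, -0.05, -0.01]  # SNP effects on log-odds of NAFLD
se_outcome = [0.05, 0.05, 0.05]

estimate, se, p = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
# Hypothetical weighted median and MR-Egger estimates and test p values.
verdict = is_causal(p, [estimate, -0.20, -0.30], p_egger_intercept=0.6, p_presso_global=0.9)
print(f"log-OR = {estimate:.3f}, SE = {se:.3f}, p = {p:.3f}, causal = {verdict}")
```

In this toy input the three estimates agree in direction and neither pleiotropy test is significant, yet the rule still returns no causal call because the IVW p value exceeds 0.05.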

Results

No statistically significant causal effect of coffee consumption on NAFLD risk was observed in the analysis using either the 4-SNP score (OR: 0.76; 95% CI 0.51, 1.14, p = 0.19, Table 4, Fig. 1a) or the 6-SNP score (OR: 0.77; 95% CI 0.48, 1.25, p = 0.29, Table 4, Fig. 1b).
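As a rough consistency check, the reported odds ratios and confidence intervals can be converted back to a log-odds estimate, a standard error, and a two-sided p value. The sketch assumes a symmetric normal-theory interval on the log-odds scale; the function name is hypothetical, and because the reported values are rounded, only approximate agreement with the reported p values should be expected.

```python
import math
from statistics import NormalDist

def p_from_or_ci(or_point, ci_low, ci_high, level=0.95):
    """Recover log-OR, SE, and two-sided p from an OR and its confidence interval,
    assuming the interval is symmetric on the log scale."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    log_or = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(log_or / se)))
    return log_or, se, p

# Reported values for the 4-SNP and 6-SNP scores.
log_or4, se4, p4 = p_from_or_ci(0.76, 0.51, 1.14)
log_or6, se6, p6 = p_from_or_ci(0.77, 0.48, 1.25)
print(f"4-SNP: log-OR = {log_or4:.3f}, SE = {se4:.3f}, p = {p4:.2f}")
print(f"6-SNP: log-OR = {log_or6:.3f}, SE = {se6:.3f}, p = {p6:.2f}")
```

The recovered p values come out at roughly 0.18 and 0.28, close to the reported 0.19 and 0.29, with the small gaps plausibly due to rounding of the published OR and CI.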

Table 4 Causal effect of coffee intake on NAFLD risk
Fig. 1 Causal relationship between coffee intake and NAFLD risk. a Causal estimate by the 4 SNPs; b causal estimate by the 6 SNPs

Results from the sensitivity analyses indicated that the causal estimates were unlikely to be biased by pleiotropic effects (MR-Egger intercept test p = 0.69 and 0.54 for the 4-SNP and 6-SNP scores, respectively; MR-PRESSO global test p = 0.97 and 0.84 for the 4-SNP and 6-SNP scores, respectively). To assess the possibility of an insufficiently powered IV, we considered a third IV based on SNP–coffee associations at a liberal significance level (ranging from p = 5e-8 to p = 1e-4). Again, no statistically significant causal relationship was observed (Table 5).

Table 5 Exploratory MR analysis based on GWAS identified SNPs at different significance levels

Discussion

We performed the first two-sample MR analysis of coffee intake and the risk of NAFLD based on the summary-level data of large GWASs of coffee intake (exposure) and NAFLD (outcome). We observed no evidence in support of a causal relationship between coffee intake and NAFLD risk.

Coffee contains more than 1500 chemical components, including caffeine, phenolic polymers, polysaccharides, chlorogenic acids and organic acids [25, 26]. Studies have demonstrated that compounds in coffee exhibit antioxidant and anti-inflammatory properties [27, 28]. It has been suggested that coffee intake or components of coffee may have beneficial effects on metabolic disorders, e.g., obesity and diabetes. Multiple studies have indicated that caffeine intake leads to weight loss by enhancing thermogenesis and increasing energy production among overweight patients with type 2 diabetes [29]. Habitual coffee consumption may also attenuate the genetic risk for increased BMI and obesity [30]. In addition to promoting weight loss, coffee and its components have been shown to enhance insulin secretion and sensitivity. Loopstra-Masters et al. [31] showed that caffeinated coffee intake can increase insulin sensitivity, while decaffeinated coffee was positively correlated with beta cell function, in a population-based study that included 954 multi-ethnic non-diabetic adults from the Insulin Resistance Atherosclerosis Study. Cafestol, another biological component of coffee, was shown to significantly increase insulin secretion in INS-1E rat insulinoma cells [32]. Another study suggested that coffee may also upregulate the function of skeletal muscle [33].

With regard to NAFLD, epidemiological studies have indicated that coffee intake is significantly associated with a reduced risk of NAFLD [10, 34]. Animal studies have also indicated that caffeine or other nutritional components in coffee may exert certain health benefits, such as reducing angiogenesis and the production of reactive oxygen species [35, 36], improving insulin sensitivity, decreasing body weight and liver triglycerides [7,8,9,10, 37, 38], and reducing the pro-fibrotic activity of hepatic stellate cells and the pro-inflammatory activity of Kupffer cells [39, 40]. Coffee may also alter the diversity of the gut microbiota, thus modulating the gut–liver axis for energy uptake and metabolism [8, 41]. Based on these observations, there is increasing interest in using coffee as a supplement for NAFLD prevention. However, despite these lines of evidence, the causal association between coffee intake and reduced NAFLD risk in humans remains unclear. The findings of the current study do not support a significant causal relationship and echo the work of Nordestgaard et al. [42], in which genetically derived high coffee intake was not causally associated with obesity, metabolic syndrome or type 2 diabetes. In addition, Hosseinabadi et al. [43] found that green coffee extract supplementation had no effect on liver steatosis grade or on serum levels of ALT, AST, LDL-C, total cholesterol and adiponectin in NAFLD clinical trials, although BMI and serum HDL-C showed significant changes compared with the control group.

The discrepancies between the epidemiological observations and the non-significant causal relationship between coffee consumption and NAFLD in our study could be due to multiple reasons. As we have summarized previously [44], the genetic alleles identified in GWAS of coffee drinking may be associated with caffeine metabolism, reward response and potentially taste, and thus may not be strong and specific genetic markers of coffee drinking per se. Pleiotropy is of particular concern. Indeed, seven of 14 SNPs reaching genome-wide significance are also strongly associated with other traits [44]. To address this potential bias, we used up to 77 different genetic variants to explore the causal association between coffee intake and NAFLD but obtained similar results. Finally, despite the very large sample size, we may still have been underpowered to rule in or rule out a causal relationship. These limitations may be intrinsic to coffee genetics, since a number of MR analyses studying the causal relationship between coffee intake and many clinical outcomes have been published to date, but the majority of these studies did not show evidence for such a causal link [45,46,47,48]. It should also be noted that the NAFLD phenotype in our study is based on ICD codes, which may not reflect the true disease spectrum observed in clinically characterized NAFLD patients. Unfortunately, although a few GWAS of clinically validated NAFLD have been published, the full summary data of these studies are not publicly available to support an MR analysis. However, as demonstrated in our previous study [15], the GWAS based on ICD code-defined NAFLD produced a genome-wide signature of genetic variants that is highly similar to the well-established genetic alleles and underlying genes identified in previous GWAS of NAFLD (e.g., PNPLA3, TM6SF2, etc.), suggesting that our GWAS data are reliable for MR analyses.

Conclusion

Our findings provide no statistically compelling evidence to support a causal relationship between coffee intake and NAFLD risk. However, our study may be limited by the choice of instrumental variables, which are not necessarily specific to coffee consumption. The study may also be underpowered. More studies with better-defined phenotypes and well-characterized populations, or clinical studies, are needed to further clarify the true impact of coffee intake on NAFLD risk.