Introduction

Non-alcoholic fatty liver disease (NAFLD) is defined as a degree of steatosis of the liver in the absence of excessive alcohol consumption and other known causes [1], and has now become one of the most common liver diseases in the world. As a hepatic manifestation of the metabolic syndrome, NAFLD has seen its prevalence rise with the global economic enhancement [2, 3]. Previous studies have shown an association between the development of NAFLD and factors such as obesity, diet, exercise, and genetic variation [4]. As the research on NAFLD gradually intensified, researchers found that environmental factors also contribute significantly to the development of NAFLD, such as air pollution [5]. However, due to the diversity of environmental factors, it is a common doubt and interest to be exposed to which harmful environmental substances might play a contributing role in the development of NAFLD.

Although the risk factors for the development of the vast majority of diseases are still unknown, residual components of pesticides in the environment has been implicated in many human health outcomes [6, 7]. Glyphosate herbicides (GBHs) were marketed in 1974 and are among the most commonly used herbicides in the world, accounting for nearly 72% of global pesticide use [8]. Previous studies have shown that glyphosate (GLY) is widely present in ecosystems, such as soil, water and indoor dust [9, 10]. Increased levels of GLY can also be detected in the food chain, which may be related to the overuse of GLY in crops [11]. In conclusion, the widespread use of GLY has increased the risk to animals and humans of exposure to residual GLY in the environment. Based on toxicity tests conducted by the patent holder of GLY (i.e., Monsanto) in the 1970 and 1980 s, GLY was described as “virtually nontoxic” to animals and humans [12]. However, additional studies have found that GLY could be toxic to multiple organs, including but not limited to nephrotoxicity [13], hepatotoxicity [14], gastrointestinal [15], cardiovascular [16], respiratory [17], and reproductive systems [18]. The liver, the second most susceptible organ to GLY, has also been demonstrated in several studies. mills PJ demonstrated a significant increase in GLY excretion in patients with steatohepatitis [19]. Mesnage R demonstrated that chronic ultra-low dose GLY exposure can lead to liver dysfunction and that the resulting proteomic and metabolomic abnormalities overlap with NAFLD [20]. However, researches on the association of GLY with NAFLD were still limited so far.

Based on previous studies, we speculated that there is an association between GLY and the development of NAFLD which we proposed to validate with samples from the NHANES program. However, since liver biopsies are difficult to obtain in the NHANES program, but fatty liver index (FLI) has been validated as a valid tool for assessing NAFLD [21, 22]. Therefore, we aimed to evaluate the association between uGLY levels and FLI with data from NHANES 2013–2016 adult participants.

Methods

Data source

NHANES was established by the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC), and contains information on questionnaires, exams, and laboratory tests for selected study participants. The data contained in NHANES is open to the public, which allowed us to be exempt from ethical review of the study [23].

Study participants

A total of 20,146 participants were included in the NHANES 2013–2016 survey. Relying on the determination of adult age in previous NHANES-based cross-sectional studies [24,25,26], we first excluded participants younger than 20 years of age (n = 8658). Participants without clear uGLY information were then excluded (n = 8420), followed by those missing laboratory indicators used to calculate FLI, which were glutamyl transpeptidase (GGT) (n = 94), triglycerides (TG) (n = 2), body mass index (BMI) (n = 18), and waist circumference (WC) (n = 104). Since there is a close association between steatosis of the liver and viral hepatitis [27, 28]. Therefore, participants who were positive for hepatitis B virus (HBV) (n = 15) and hepatitis C virus (HCV) (n = 28) were also not included in this study. As urine is one of the important routes for GLY excretion out of the body [29], we excluded patients with renal weakness or renal failure (n = 78). Previous studies have confirmed that the side effects of glucocorticoids, tamoxifen and methotrexate lead to disturbances in fat metabolism of the liver [30,31,32]. Therefore, we excluded participants (n = 17) who had apparently used medications that interfere with fat metabolism ( in the past month). Specific drugs included meprednisolone (n = 3), prednisolone (n = 11), tamoxifen (n = 1) and methotrexate (n = 2). World Health Organization (WHO) recommends that if a urine sample is too dilute (creatinine concentration < 30 mg/dL) or too concentrated (creatinine concentration > 300 mg/dL), urine should be recollected and analyzed for creatinine and target chemicals [33]. Therefore, samples with urine that was too dilute (n = 78) or too concentrated (n = 72) were excluded from this study. In addition, participants were excluded for excessive alcohol consumption, defined as drinking more than 30 g of alcohol/day for male (n = 179) and more than 20 g/day for female (n = 145) [34]. Ultimately, a total of 2238 participants were included in this study. More details of the participants being screened were shown in Fig. 1.

Fig. 1
figure 1

Flow chart for participants

Independent and dependent variables

uGLY was defined as the independent variable. Urine samples from approximately one-third of participants aged 6 years and older were used to measure GLY concentrations (ng/ml) during the two survey cycles of NHANES 2013–2016 in total. Urine was collected at a mobile examination center (MEC) and subsequently aliquoted within a few hours and frozen in time for transport to the CDC’s National Center for Environmental Health (NCEH) for testing. A 200 µl urine sample was utilized, based on two-dimensional on-line ion chromatography with tandem mass spectrometry (IC-MS/MS) and isotope dilution quantification [35]. For analytes with analytical results below the lower limit of detection (LLOD), the analytic results were filled and placed using an estimated value, which was the LLOD divided by a square root of 2. The LLOD for uGLY was 0.2 ng/ml.

FLI was defined as the dependent variable, which had been validated as a valid tool for assessing NAFLD [21, 22]and was obtained by the following formula [36]:

Fatty Liver Index (FLI) = (e0.953×Ln (TG) + 0.139×BMI + 0.718×Ln(GGT) + 0.053×WC − 15.745) ÷ (1 + e0.953×Ln (TG) + 0.139×BMI + 0.718×Ln(GGT) + 0.053×WC − 15.745) × 100. TG, GGT, BMI and WC in the calculation formula represent triglycerides, glutamyl transpeptidase, body mass index and waist circumference, respectively. TG (mg/dl) and GGT (U/L) were derived from laboratory test information, and BMI (kg/m2) and WC (cm) were obtained from physical examination information.

Covariates

In order to better demonstrate accurate independent effects between independent and dependent variables, we included covariates that could potentially affect uGLY and FLI score in the model for adjustment. Questionnaire information: age (years), gender, race/ethnicity, education level, ratio of family income to poverty (PIR), hypertension, diabetes, physical activity intensity, smoking, herbicide use, and fasting time (hours). PIR represents the ratio of family income to poverty, which is calculated using the Department of Health and Human Services (HHS) poverty guidelines as a measure. We also inducted information on dietary intake of some nutrients: energy (Kcal), protein (gm), sugar (gm), fat (gm), cholesterol (mg), alcohol (gm), and moisture (gm). All participants in NHANES 2013–2016 underwent two 24-hour dietary recalls, and we had the average of the two recall information for the final analysis [37, 38]. Laboratory test information was included for alanine aminotransferase (ALT) (U/L), aspartate aminotransferase (AST) (U/L), creatine phosphokinase (CPK) (U/L), albumin (ALB) (g/dl), globulin (GLB) (g/dl), total bilirubin (TBIL) (mg/dl), blood urea nitrogen (BUN) (mg/dl), uric acid (UA) (mg/dl), creatinine (Cr) (mg/dl), total cholesterol (TC) (mg/dl), and serum iron (ug/dl). Considering that inflammation is a major factor affecting FLI, but c-reactive protein (CRP) information was not included in NHANES 2013–2016, previous studies demonstrated a close association between systemic immune inflammatory index (SII) and liver disease [39, 40]. Therefore, we also calculated SII as a covariate with the information of lymphocyte count (LYM) (1000 cells/UL), neutrophil count (NEU) (1000 cells/UL) and platelet count (PLT) (1000 cells/UL), calculated as [41]: SII = NEU/LYM × PLT.

Statistical analysis

We analyzed all data in this study using R (http://www.R-project.org) and EmpowerStats (http://www.empowerstats.com), and p < 0.05 was determined to be statistically significant. NHANES employs complex multi-stage probability sampling, and the sampling weights provided should be used appropriately in the statistical analysis to make the obtained sample representative. According to the official NHANES recommendations, the uGLY sample was tested only in 1/3 of the sample and unique subweights (WTSSCH2Y) were provided to obtain the final weights included in this study from the sum of the weights from NHANES 2013–2014 and NHANES 2015–2016 cycles divided by 2 [42]. Continuous variables were presented in the form of median and quartiles, and continuous variables with no more than 10% missing values could be filled with the mean, otherwise they were converted to dichotomous variables using the median as a cut-off, and the missing values were set as a separate group. Categorical variables with more missing values were set as a separate group for inclusion in the analysis. The normality test revealed that the available uGLY data were skewed. Therefore, we transformed the log function of uGLY with the constant “e” as the base [Loge(uGLY)] and used it for all the analyses in this study. Participants were categorized into two groups (< -1.011 ng/ml and ≥ -1.011 ng/ml) based on the median of Loge(uGLY) level. For the comparison of all variables between the two groups, the data set was parsed and weights were obtained by using the survey design R package as recommended on the NHANES website, and p-values were calculated with weighted linear regression (continuous variables) and chi-square tests (categorical variables). We screened the selected covariates before building the final model [43]. First, a stepwise screening based on the variance inflation factor (VIF) removed covariates with too high covariance (VIF > 5) (Supplementary Table 1). Subsequently, the final included covariates (Supplementary Table 4) were identified based on the principle that the introduction of a covariate in the basic model or its exclusion from the full model would have an effect of > 10% on the regression coefficient of Loge(uGLY) (Supplementary Table 2), or that the introduced covariate would have a statistical p-value < 0.1 on the regression coefficient of FLI (Supplementary Table 3). Multiple linear regression analysis was used to assess the association between Loge(uGLY) and FLI, and by adjusting for different covariates we generated three models. Model 1: without adjusting for any covariates, Model 2: age, gender, race/ethnicity, education level, and PIR were adjusted, and Model 3: all covariates within Supplementary Table 4 were adjusted. To further demonstrate the association between different Loge(uGLY) levels and FLI, we also built a model based on inverse treatment probability weighting (IPTW). Briefly, we constructed a logistic regression model with all screened covariates that could be used to determine the predicted probability of each independent participant being categorized into different Loge(uGLY) levels (< -1.011 ng/ml and ≥ -1.011 ng/ml). The inverse of the difference between the actual probability and the predicted probability of each participant being categorized into different groups was used as weights in subsequent model. Based on the use of weights, the differences in covariates between the different Loge(uGLY) groups could be balanced thereby more correctly assessing the association between different levels of Loge(uGLY) and FLI. It is worth noting that the independent variables in the IPTW-based model were the groups (< -1.011 ng/ml and ≥ -1.011 ng/ml) after grouping at the median of Loge(uGLY). Smooth curve fitting (penalized spline method) was used to evaluate whether there was a nonlinear association between Loge(uGLY) and FLI, and a generalized additive model was further used to assess the threshold effect between the two. Threshold effect analyses aims to find a particular threshold of the independent variable and test for inconsistency in the association between the independent variable and the response variable before and after this threshold [44]. A likelihood ratio (LLR) of less than 0.05 is generally considered a criterion for the existence of a nonlinear association. Finally, we performed subgroup analysis to identify susceptible individuals for the association between Loge(uGLY) and FLI and performed an interaction test.

Results

Baseline characteristics of participants

A total of 2238 participants were eventually enrolled in the study. Participants with Loge(uGLY) level ≥ -1.011 ng/ml had significantly higher FLI than those with Loge(uGLY) level < -1.011 ng/ml, with a statistically significant difference between the two groups (P < 0.001). The baseline characteristics of the participants were shown in Table 1.

Table 1 Characteristics of participants

The association between Loge(uGLY) level and FLI

In all models (models 1–3), there was a positive association between Loge(uGLY) level and FLI. The fully adjusted model (model 3) results suggested that each 1-unit increase in Loge(uGLY) would be accompanied by a 2.16-unit increase in FLI (Beta coefficient = 2.16, 95% CI: 0.71, 3.61). Subsequently, we displayed Loge(uGLY) in tertile 1 (-1.959- -1.190), Tertile 2 (-1.191- -1.030) and Tertile 3 (-1.031- 0.104). Compared to Tertile 1, the trend of positive association between independent and dependent variables was more significant as Loge(uGLY) increased (P for trend < 0.001), and was most pronounced in Tertile 3 (Beta coefficient = 4.19, 95% CI: 1.49, 6.88). All results were presented in Table 2. In addition, we further validated the positive association between Loge(uGLY) and FLI by employing IPTW (Beta coefficient = 2.07, 95% CI: 0.18, 3.96) (Supplementary Table 6). Information on baseline characteristics of participants after IPTW was displayed in Supplementary Table 5.

Table 2 Association of Loge(uGLY) level with FLI

Subgroup analysis of the association between Loge(uGLY) level and FLI

To verify the stability of the positive association between Loge(uGLY) level and FLI in the cohort with characteristics, we performed a subgroup analysis. In the fully adjusted model, the results of the subgroup analysis showed that this positive association were more significant in participants who were female (Beta coefficient = 2.80, 95% CI: 0.74, 4.85), 40–60 years old (Beta coefficient = 3. 80, 95% CI: 0.71, 5.44), other races/ethnicities (Beta coefficient = 5.19, 95% CI: 2.29, 8.08), without hypertension (Beta coefficient = 1.84, 95% CI: -0.52, 4.20) and borderline diabetes (Beta coefficient = 2.21, 95% CI: -8.17, 12.59). In addition, we performed interaction tests for gender, age, race/ethnicity, hypertension, and diabetes to further commit the stability of the results in the subgroup analysis (P > 0.05 for interaction test). The results of the subgroup analysis were presented in Table 3.

Table 3 Results of subgroup analysis

Positive linear association between Loge(uGLY) level and FLI

Finally, we evaluated whether there was a nonlinear association between Loge(uGLY) level and FLI with smooth curve fitting and threshold effect analysis. Depending on Fig. 2A; Table 4, a linear association (LLR = 0.364) was observed between independent and dependent variables. In addition, we also designed to verify whether this positive linear association was stable in each subgroup mentioned above, for which we performed smoothed curve fitting and threshold effect analysis for each subgroup respectively. The results suggested that the positive linear association between independent and dependent variables was durable for gender (Fig. 2B and Supplementary Table 7), age (Fig. 2C and Supplementary Table 8), and hypertension (Fig. 2E and Supplementary Table 9). However, when diabetes was the subgroup, a significant increase in FLI (Beta coefficient = 5.12, 95% CI: 0.69, 9.55) would occur when the Loge(uGLY) of participants with diabetes exceeded − 1.55 (LLR = 0.007) (Table 5; Fig. 2F). In addition, the black race/ethnicity participants had an effect value of (Beta coefficient = 17.00, 95% CI: 2.74, 31.25) between Loge(uGLY) and FLI when Ln(uGLY) exceeded − 0.12 (LLR = 0.026). Participants of other races/ethnicities showed a positive association with FLI only when Loge(uGLY) was less than − 0.46 (Beta coefficient = 8.95, 95% CI: 4.74, 13.16) (LLR = 0.015), although when Loge(uGLY) was more than − 0.46 it showed a negative association with FLI which was not statistically different (Beta coefficient=-5.26, 95% CI: -14.25, 3.72) (Table 6; Fig. 2D).

Fig. 2
figure 2

Smoothed curve fit of the association between Loge(uGLY) level and FLI. (A) The association of Loge(uGLY) level with FLI for all participants. The solid red line represents a smooth curve fit of the association between the independent variable and dependent variable, and the gray band represents the 95% confidence interval of the fit. Each blue point represents a sample. (B-F) The association of Loge(uGLY) level with FLI stratified by gender, age, race/ethnicity, hypertension and diabetes. Specific colors in each figure represents the subgroup characteristics of the participants. The lines represent a smooth curve fit between the independent variable and dependent variable. Each point represents a sample. *All the covariates were adjusted

Table 4 Threshold effect analysis of the association between Loge(uGLY) level and FLI
Table 5 Threshold effect analysis of Loge(uGLY) level and FLI stratified by diabetes
Table 6 Threshold effect analysis of Loge(uGLY) level and FLI stratified by race

Supplementary results

We excluded participants who did not have information about uGLY (n = 15,408) and FLI (n = 1286) explicitly, while retaining participants who were younger than 20 years of age, consumed large amounts of alcohol, were taking medications that interfere with fat metabolism, had viral hepatitis, had substandard urine samples, and were in renal weakness/failure to perform sensitivity analyses for exclusion criteria on the association between Loge(uGLY) and FLI (Supplementary Fig. 1). The results demonstrated that the true association between Loge(uGLY) and FLI would be severely affected when participants in Fig. 1 were not excluded (Beta coefficient = 0.01, 95% CI: -0.04, 0.05). The results of the sensitivity analysis to the exclusion criteria were displayed in Supplementary Table 10. In the validation results of the positive linear association between Loge(uGLY) level and FLI, through Fig. 2A we found that there were outliers in Loge(uGLY), for which we examined the data distribution of Loge(uGLY) (Supplementary Table 11). Through the results we could observe that the sample size of Loge(uGLY) over 1 is 26, which accounted for 1.162% of the total sample size, such a result belonged to small sample events (< 5%). For this reason, we removed these outliers and re-examined the association between Loge(uGLY) level and FLI. Based on the results, we found that after removing the outliers, the supplemental results were absolutely consistent with our results above. The supplemental results were shown in Supplementary Tables 1220, Fig. 3A-F.

Fig. 3
figure 3

The association of Loge(uGLY) level with FLI after excluding outliers. (A) The association of Loge(uGLY) level with FLI for all participants. The solid red line represents a smooth curve fit of the association between the independent variable and dependent variable, and the gray band represents the 95% confidence interval of the fit. Each blue point represents a sample. (B-F) The association of Loge(uGLY) level with FLI stratified by gender, age, race/ethnicity, hypertension and diabetes. Specific colors in each figure represents the subgroup characteristics of the participants. The lines represent a smooth curve fit between the independent variable and dependent variable. Each point represents a sample. *All the covariates were adjusted

Discussion

The association between environmental exposure to chemical factors and the incidence of NAFLD is a public health problem of global concern. Our study investigated the association between Loge(uGLY) level and FLI in US adults using nationally representative data. After adjusting for relevant covariates such as sociodemographic variables, diet, lifestyle, and physical activity information, we found a positive linear association between increased level of GLY in urine and the FLI, providing important new evidence for the epidemiology of NAFLD.

GLY has become a best seller worldwide since its release for sale as the most prominent ingredient of herbicides, and GLY is more likely to enter humans through the dermal, oral, and pulmonary routes [12]. Previous studies have demonstrated that even within the low dose range considered safe, impairment of liver function was observed in rats chronically exposed (2 years) to GLY [45]. The results of transcriptome sequencing indicate that gene expression profiles were dominated by lipid deposition and mitochondrial membrane dysfunction [45]. On the basis of the “two-hit” theory, mitochondrial membrane dysfunction has been proposed to be critical in the development of NAFLD, such as impaired fatty acid oxidation and oxidative phosphorylation [46]. The current theory suggested that an equally useful conceptual framework was that the liver’s ability to process major metabolic energy substrates (carbohydrates and fatty acids) was overwhelmed, leading to the accumulation of toxic lipids and thus activating the pathway of NAFLD development [47]. In another study, the investigators found that in the livers of GLY-administered rats, a trend towards accumulation of most fatty acids, particularly acylcarnitine, and a statistically significant increase in cholesterol levels were observed, and proteomics profiling further confirmed the significant changes in the metabolic processes of lipid detoxification [20].

As a new epidemiological report, we not only found that uGLY level was associated with an increase in FLI, but in order to get an adapted cohort, we further did a subgroup analysis. The most important result suggested that the positive association between uGLY and FLI was more significant in female participants compared to males. This was an interesting finding, as in a clinical observational study, the investigators found significantly higher GLY residues in the urine of women than men [19]. In addition, Mesnage R demonstrated the effects of GLY exposure on the liver by using female rats in their study [20], and in their another study, Mesnage R’s explanation for why male rats were dropped as subjects was because male animals suffered more severe liver and kidney damage than females, leading to increased premature mortality [45]. However, this would not clarify the conclusions we have obtained so far. Giommi C conducted a transcriptomic analysis using GLY-exposed zebrafish livers and showed that transcript levels of heat shock protein 70.2 were elevated in female zebrafish and decreased in male zebrafish [48]. In contrast, Antunes AM showed a sex-dependent increase in hepatocyte vascular area, with higher values in males compared to GLY-exposed female peacock fish [49]. In the epidemiology of NAFLD, previous studies have shown a higher prevalence and severity of NAFLD in males of reproductive age than in females. However, after menopause, the prevalence of NAFLD in women would exceed that in men, suggesting a protective effect of estrogen on the development of NAFLD [50]. In a cross-sectional study using the NHANES program, Geier DA observed a significant negative association between the concentration of GLY and total estradiol, as well as a negative trend between the concentration of GLY and total testosterone [42]. In conclusion, there was still no valid evidence for the effect of GLY on hepatic steatosis by gender, and our study provides a new and important evidence. In addition, we found another valid subgroup group to be participants aged 40–60 years. Based on the NHANES III survey, 16.1% of NAFLD patients were 30 to 40 years old, followed by 22.3% of those 41 to 50 years old, 29.3% of those 51 to 60 years old, and 27.6% of those over 60 years old [51]. Current studies generally supported a negative association between GLY levels and age [52,53,54,55]. In contrast, another NHANES survey found a positive association between GLY levels and age when age exceeded 20 years [42]. Thus, previous studies provided evidence for our current results. According to the most recent NHANES report, non-Hispanic whites had higher GLY exposure than other populations [55], but our results suggested that the positive association between GLY and FLI was more significant in other other races/ethnicities. The potential association was unknown to us because of the overly broad range of participant other races/ethnicities included in this subgroup. However, based on previous NHANES findings, it was similarly concluded that whites had the highest prevalence of NAFLD, while blacks had the lowest, and other other races/ethnicities were in the middle of the prevalence range [56]. For the current findings, we speculated the existence of other potential influences, such as the finding in an NHANES survey that populations classified as other other races/ethnicities would have higher dietary fiber intake compared to non-Hispanic whites [57], where grains can provide sufficient dietary fiber to be acceptable. Furthermore, we also hope that future multicenter prospective cohort studies with large samples would validate the current findings.

As the first cross-sectional study to investigate the association between GLY level and FLI, our study contains the following strengths. NHANES follows a rigorous and well-designed study protocol with a wealth of high-quality data, including measurements of many environmental contaminants of potential public health importance.The NHANES program takes into account sample weighting issues so that the results of the study obtained are broadly applicable to the general US population. Also, the sample size of our study was large enough to allow for relevant subgroup analyses that were utilized to validate the robustness of the results. However, there were still some limitations to our study. First, the cross-sectional study was unable to identify causality. Therefore, the existence of reverse causality between uGLY and FLI has yet to be verified. Second, the potential factors that might influence either FLI or uGLY level are diverse and variable, although we collected as many possible confounders as possible in NHANES and adjusted for them in the model. However, there continued to be no guarantee that other confounding factors existed that could have biased the results. Finally, the risk of population exposure to GLY and the ability of the body to metabolise it are movable, and it is still worth exploring whether the long-term effects of GLY on the human body can be captured by only one measurement. However, given the limited evidence from current studies on GLY and liver disease, we believe that the current results would still be of good value.

Conclusions

We contributed a new evidence for the management of NAFLD by analysing data from NHANES 2013–2016. Overall, there was a positive association between Loge(uGLY) level and FLI, and this positive association was linear. US adults who are female, other other races/ethnicities, and aged 40–60 years should be more cautious about the higher risk of FLI associated with exposure to GLY.