Introduction

Nonalcoholic fatty liver disease (NAFLD) is now recognized as an important health concern (Angulo 2002; Farrell 2003). NAFLD encompasses a broad spectrum of conditions, including simple steatosis, nonalcoholic steatohepatitis (NASH), fibrosis/cirrhosis, and hepatocellular carcinoma. Excess fat accumulation in the liver is observed in 20–30 % of the population in American and European countries, whereas NASH affects approximately 1–3 % of the population (Ludwig et al. 1980). NAFLD is now considered to be a component of the metabolic syndrome (Marchesini et al. 2001; Stefan et al. 2008). Both genetic and environmental factors are important in the development of NAFLD (Wilfred de Alwis and Day 2008).

Single-nucleotide polymorphisms (SNPs) are useful tools for identifying genetic factors and have been intensively investigated for various common diseases. We previously reported that variants in the peroxisome proliferator-activated receptor γ coactivator 1α (PPARGC1A), angiotensin II type 1 receptor (AGTR1), and nitric oxide synthase 2 (inducible) (NOS2) genes are associated with NAFLD in Japanese individuals (Yoneda et al. 2008, 2009a, b).

Genome-wide association studies (GWASs) have revealed that SNPs in the patatin-like phospholipase domain containing 3 (PNPLA3) gene and other genes influence NAFLD and plasma liver enzyme levels (Romeo et al. 2008; Chalasani et al. 2010; Speliotes et al. 2011; Kawaguchi et al. 2012). We previously reported that the risk allele (G-allele) of PNPLA3 rs738409 is strongly associated with NAFLD, as well as with increased aspartate transaminase (AST), alanine transaminase (ALT), and ferritin levels and with higher fibrosis stage in patients with NAFLD in the Japanese population (Hotta et al. 2010).

To elucidate the genetic background of NAFLD in the Japanese population in more detail, we performed a genome-wide association study of NAFLD.

Materials and methods

Subjects

For the GWAS, 392 Japanese patients with NAFLD (NAFLD-1; 345 with NASH and 47 with simple steatosis) were enrolled. Genome-wide scan data for 934 general Japanese control subjects (control-1), available in the JSNP database (IMS-JST: Institute of Medical Science-Japan Science and Technology Agency Japanese SNP database, http://snp.ims.u-tokyo.ac.jp/), were used as GWAS controls. For the replication study, 172 patients with NAFLD (NAFLD-2; 97 with NASH, 4 with simple steatosis, and 71 with NAFLD diagnosed by imaging) and 1012 control subjects (control-2) were analyzed. Control-2 subjects included Japanese volunteers who had undergone medical examination for common disease screening. All NAFLD-1 patients and 101 NAFLD-2 patients underwent liver biopsy. Computed tomography (CT) or magnetic resonance imaging (MRI) was performed on the remaining 71 NAFLD-2 patients. Patients with the following diseases were excluded from the study: viral hepatitis (hepatitis B and C, Epstein–Barr virus infection), autoimmune hepatitis, primary biliary cirrhosis, sclerosing cholangitis, hemochromatosis, α1-antitrypsin deficiency, Wilson’s disease, drug-induced hepatitis, and alcoholic hepatitis (present or past alcohol consumption of more than 20 g per day). None of the patients showed clinical evidence of hepatic decompensation, such as hepatic encephalopathy, ascites, variceal bleeding, or a serum bilirubin level more than twice the upper limit of normal.

Liver biopsy tissues were stained with hematoxylin and eosin, reticulin, and Masson’s trichrome stains. The histological criterion for the diagnosis of NAFLD was macrovesicular fatty change in hepatocytes with displacement of the nucleus toward the cell edge (Sanyal 2002). When more than 5 % of hepatocytes were affected by macrovesicular steatosis, patients were diagnosed as having either simple steatosis or NASH. The minimal criteria for the diagnosis of NASH include the presence of >5 % macrovesicular steatosis, inflammation, and liver cell ballooning, typically with a predominantly centrilobular (acinar zone 3) distribution (Matteoni et al. 1999; Teli et al. 1995). The degree of steatosis was graded according to the percentage of hepatocytes containing macrovesicular fat droplets as follows: grade 0, no steatosis; grade 1, <33 %; grade 2, 33–66 %; and grade 3, >66 % (Brunt 2001). Necroinflammatory activity was also assessed using the composite NAFLD activity score (NAS) described by Kleiner et al. (2005). NAS is the unweighted sum of the scores for steatosis, lobular inflammation, and hepatocellular ballooning, and ranges from 0 to 8. Fibrosis was staged according to the method of Brunt (2001) on a scale from 0 to 4, as follows: 0, none; 1, perivenular and/or perisinusoidal fibrosis in zone 3; 2, combined pericellular and portal fibrosis; 3, septal/bridging fibrosis; 4, cirrhosis.
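The grading and scoring rules above can be expressed directly in code. The following is a minimal sketch, not the authors' software: the function names are hypothetical, the steatosis cut-offs are those listed above, and the NAS component ranges (steatosis 0–3, lobular inflammation 0–3, ballooning 0–2) are those of Kleiner et al. (2005), which yield the stated total range of 0 to 8. The NAS function takes component scores that have already been assigned by the pathologist.

```python
def steatosis_grade(percent_fatty_hepatocytes: float) -> int:
    """Brunt (2001) steatosis grade from the % of hepatocytes with macrovesicular fat."""
    if not 0 <= percent_fatty_hepatocytes <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_fatty_hepatocytes == 0:
        return 0  # no steatosis
    if percent_fatty_hepatocytes < 33:
        return 1
    if percent_fatty_hepatocytes <= 66:
        return 2
    return 3


def nafld_activity_score(steatosis: int, lobular_inflammation: int, ballooning: int) -> int:
    """Unweighted NAS (0-8): steatosis 0-3 + lobular inflammation 0-3 + ballooning 0-2."""
    limits = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    # Reject component scores outside the ranges defined by Kleiner et al. (2005).
    for name, (score, upper) in limits.items():
        if not 0 <= score <= upper:
            raise ValueError(f"{name} score {score} is outside 0-{upper}")
    return steatosis + lobular_inflammation + ballooning


print(steatosis_grade(45), nafld_activity_score(2, 1, 1))
```

Running the example prints `2 4`.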

The entire study was conducted in accordance with the guidelines of the Declaration of Helsinki. Written informed consent was obtained from each subject, and the protocol was approved by the ethics committees of Kyoto University, Yokohama City University, Hiroshima University, and Kurume University.

Clinical and laboratory evaluation

The weight and height of patients were measured using a calibrated scale after shoes and any heavy clothing had been removed. Venous blood samples were obtained after an overnight (12-h) fast to measure plasma glucose, hemoglobin A1c (HbA1c), total cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, serum AST, ALT, iron, ferritin, hyaluronic acid, and type IV collagen 7S. All biochemical parameters were measured using conventional methods.

DNA preparation, genome-wide genotyping and quality control

Genomic DNA was extracted from blood samples collected from each subject using Genomix (Talent Srl, Trieste, Italy). NAFLD-1 patients were genotyped using the Human660W-Quad BeadChip (n = 104) or the HumanOmniExpress BeadChip (n = 288; Illumina, Inc., San Diego, CA, USA). Control-1 subjects (n = 934) had been genotyped using the Illumina HumanHap550 BeadChip, and data for 515,286 autosomal SNPs were available in the JSNP database. A total of 295,887 autosomal SNPs were common to the three BeadChips. Individual call rates were all >0.99 in both the patient and control groups. SNPs with a minor allele frequency <0.01 (n = 31,177), a call rate <0.95 (n = 901), or significant deviation from Hardy–Weinberg equilibrium (P < 0.001; n = 2,269) were excluded, leaving 261,540 SNPs for case–control association analysis. Using HapMap phase II and III JPT, HCB, and CEU data (http://hapmap.ncbi.nlm.nih.gov/), we confirmed by multidimensional scaling (MDS) analysis that the NAFLD-1 subjects in this study were derived from the Japanese population (Supplementary Fig. 1). Pairwise identity-by-descent (PI_HAT, the estimated proportion of alleles shared identical by descent) was calculated, and PI_HAT values were less than 0.05 for the NAFLD-1 patients.
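The three SNP-level filters described above (minor allele frequency <0.01, call rate <0.95, Hardy–Weinberg P < 0.001) can be sketched as follows. This is a minimal illustration, assuming per-SNP genotype counts are already tabulated for the group being tested (the text does not state in which group Hardy–Weinberg equilibrium was assessed); the data class and function names are hypothetical, and the Hardy–Weinberg test is the simple 1-degree-of-freedom χ² test rather than the implementation of any particular software package.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Per-SNP genotype counts (hypothetical container)."""
    hom_a: int        # genotype AA
    het: int          # genotype AB
    hom_b: int        # genotype BB
    missing: int = 0  # failed calls


def call_rate(c: GenotypeCounts) -> float:
    called = c.hom_a + c.het + c.hom_b
    total = called + c.missing
    return called / total if total else 0.0


def minor_allele_frequency(c: GenotypeCounts) -> float:
    n = c.hom_a + c.het + c.hom_b
    freq_b = (2 * c.hom_b + c.het) / (2 * n)
    return min(freq_b, 1.0 - freq_b)


def hwe_chisq_p(c: GenotypeCounts) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = c.hom_a + c.het + c.hom_b
    p = (2 * c.hom_a + c.het) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: test undefined, treated as no deviation
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (c.hom_a, c.het, c.hom_b)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(c: GenotypeCounts, min_maf=0.01, min_call_rate=0.95, min_hwe_p=0.001) -> bool:
    # Call rate is checked first so SNPs with no calls never reach the MAF division.
    return (call_rate(c) >= min_call_rate
            and minor_allele_frequency(c) >= min_maf
            and hwe_chisq_p(c) >= min_hwe_p)
```

For example, `passes_qc(GenotypeCounts(50, 0, 50))` returns False because the complete absence of heterozygotes is a gross departure from Hardy–Weinberg proportions.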

For the replication study, Invader probes (Third Wave Technologies, Madison, WI, USA) were constructed for the 56 SNPs with GWAS P values less than 5.0 × 10⁻⁵. These SNPs were genotyped in NAFLD-2 and control-2 subjects using Invader assays as previously described (Ohnishi et al. 2001). The success rates of the Invader assays were >99.0 %. To validate the GWAS genotypes, NAFLD-1 patients were also genotyped using the Invader assay, and only SNPs with a concordance rate of more than 99 % between the two genotyping methods were used for further analysis. Thirteen SNPs showed a lower concordance rate (<0.99) and were excluded from further analysis.
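The concordance criterion can be sketched as below, assuming genotypes from both platforms are coded as 0, 1, or 2 with None for a failed call. The function names are hypothetical, and excluding failed calls from the denominator is an assumption, since the text does not say how missing calls were handled.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1, or 2 copies of one allele; None = no call


def concordance_rate(platform_a: Sequence[Genotype], platform_b: Sequence[Genotype]) -> float:
    """Fraction of identical calls among samples called on both platforms."""
    if len(platform_a) != len(platform_b):
        raise ValueError("both genotype vectors must cover the same samples")
    pairs = [(a, b) for a, b in zip(platform_a, platform_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sample was called on both platforms")
    return sum(a == b for a, b in pairs) / len(pairs)


def keep_snp(platform_a, platform_b, threshold: float = 0.99) -> bool:
    # SNPs with concordance below the threshold are excluded from further analysis.
    return concordance_rate(platform_a, platform_b) >= threshold
```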

Statistical analysis

Case–control association analysis was performed using the Cochran–Armitage trend test. Combined P values were obtained using Fisher’s combined probability test. Hardy–Weinberg equilibrium was assessed using the χ² test (Nielsen et al. 1998). PI_HAT and MDS analyses were performed using PLINK 1.07 (http://pngu.mgh.harvard.edu/purcell/plink) (Purcell et al. 2007). The Manhattan plot of the GWAS results and the linkage disequilibrium (LD) plots were drawn using HaploView (Barrett et al. 2005). Genotypes were coded as 0, 1, or 2 according to the number of risk alleles present. Multiple linear regression analyses were performed to test the independent per-allele effect of each SNP on biochemical traits and histological parameters, adjusting for age, gender, and body mass index (BMI). BMI, fasting plasma glucose, triglycerides, ferritin, hyaluronic acid, and type IV collagen 7S values were log-transformed before multiple linear regression analysis. Statistical analyses were performed using R software (http://www.r-project.org/).
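For reference, with genotype scores tᵢ = 0, 1, 2 (an additive model, assumed here), the Cochran–Armitage trend statistic for a 2 × 3 genotype table is χ² = N(NΣtᵢrᵢ − RΣtᵢnᵢ)² / [RS(NΣtᵢ²nᵢ − (Σtᵢnᵢ)²)], where rᵢ and nᵢ are the case count and total count for genotype i, R and S are the total numbers of cases and controls, and N = R + S. The sketch below implements this formula with the Python standard library only; it is an illustration under these assumptions, not the R or PLINK code used in the study, and the genotype counts in the example are invented.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage_trend(cases: Sequence[int], controls: Sequence[int],
                           scores: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Cochran-Armitage trend test on a 2 x 3 genotype table.

    cases/controls: counts of genotypes carrying 0, 1, 2 risk alleles.
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    n = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * k for t, k in zip(scores, n))
    sum_t2n = sum(t * t * k for t, k in zip(scores, n))
    numerator = N * (N * sum_tr - R * sum_tn) ** 2
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


chi2, p = cochran_armitage_trend(cases=[10, 40, 50], controls=[50, 40, 10])
print(round(chi2, 2))  # 53.33
```

When case and control genotype distributions are proportional, the numerator is zero and the test returns χ² = 0 and P = 1.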

Results

Genome-wide case–control association studies

We performed a GWAS using NAFLD-1 (n = 392) and control-1 (n = 934) subjects. The characteristics of the study subjects are presented in Table 1. After quality control of the genotyping results, 261,540 autosomal SNPs were used for case–control association analysis. To assess population stratification, we examined the quantile–quantile plot of P values (Fig. 1a). A slight inflation of the test statistics was observed (genomic inflation factor λGC = 1.09). Because we used the JSNP database as a control, we were unable to evaluate population stratification between NAFLD-1 and control-1 subjects directly. Instead, we confirmed by MDS analysis that all NAFLD-1 subjects were derived from the Japanese population (Supplementary Fig. 1).
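The genomic inflation factor reported above is conventionally computed as the median of the observed 1-df χ² statistics divided by the median expected under the null hypothesis (about 0.455). A minimal standard-library sketch, assuming two-sided trend-test P values strictly between 0 and 1 as input (the function name is hypothetical):

```python
from statistics import NormalDist, median
from typing import Iterable


def genomic_inflation_factor(p_values: Iterable[float]) -> float:
    """lambda_GC = median observed 1-df chi-square / median expected under the null."""
    z = NormalDist()  # standard normal distribution
    # Two-sided P -> 1-df chi-square: chi2 = z^2 with z = Phi^-1(P / 2).
    # Using P / 2 (rather than 1 - P / 2) keeps precision for very small P values.
    chi2 = [z.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = z.inv_cdf(0.75) ** 2  # median of chi-square(1 df), about 0.4549
    return median(chi2) / expected_median
```

Under the null hypothesis, uniformly distributed P values give λGC close to 1, while systematic inflation of the statistics, for example from stratification, pushes it above 1.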

Table 1 Clinical characteristics of the subjects
Fig. 1

Quantile–quantile plot for genome-wide association (a) and regional plots of genome-wide significant loci (b). a The −log10 (P value) of the observed association statistics (y-axis) is plotted against the −log10 (P value) of the association statistics expected under the null hypothesis of no association (x-axis). b SNPs are plotted by their chromosomal position against their association with NAFLD in the GWAS data. The SNPs surrounding the top SNP (rs2896019) are colored according to their LD with the top SNP (pairwise r² values from the GWAS data of NAFLD-1 and control-1 subjects). The positions of genes and the direction of transcription are shown above the plots.

To identify NAFLD susceptibility SNPs, we compared NAFLD-1 and control-1 subjects using the trend test. The Manhattan plot showed one peak on chromosome 22q13 that was significantly associated with NAFLD, as well as several SNPs with marginal association (Fig. 1b; Supplementary Fig. 3). To evaluate both significantly and marginally associated SNPs, we selected the 56 SNPs with P values less than 5.0 × 10⁻⁵ and performed a replication study using NAFLD-2 (n = 172) and control-2 (n = 1012) subjects. After the replication study, 12 SNPs had combined P values less than 1.0 × 10⁻⁵, and 8 SNPs remained significantly associated with NAFLD even after conservative Bonferroni correction (P < 1.0 × 10⁻⁹; Table 2). All eight SNPs were in the same LD block (Fig. 1b; Supplementary Fig. 2) on chromosome 22q13, a region previously reported to confer susceptibility to NAFLD (Romeo et al. 2008; Speliotes et al. 2011; Kawaguchi et al. 2012; Hotta et al. 2010). Because NAFLD patients have a higher BMI than the Japanese general population (Table 1), we performed multiple logistic regression analysis with genotype, age, gender, and BMI as independent variables, using NAFLD-1, NAFLD-2, and control-2 subjects. We also genotyped rs738409, because this SNP has been the most extensively studied. After adjustment for age, gender, and BMI, nine SNPs were strongly associated with NAFLD (P < 1.0 × 10⁻⁹; Supplementary Table 1). Before adjustment, rs738409 was the SNP most strongly associated with NAFLD (P = 2.1 × 10⁻¹⁸). After adjustment, rs738409 remained the most strongly associated SNP (P = 6.8 × 10⁻¹⁴), but the other eight SNPs (rs2896019, rs3810622, rs738491, rs3761472, rs2143571, rs6006473, rs5764455, and rs6006611) also showed small P values (1.8 × 10⁻¹³ to 1.8 × 10⁻¹⁰). The nine SNPs had high odds ratios (OR 1.84–2.05).
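Fisher’s combined probability test, listed under Statistical analysis, sums −2 ln P over k studies and refers the total to a χ² distribution with 2k degrees of freedom. For even degrees of freedom the tail probability has a closed form, so no statistics library is needed. A minimal sketch, in which the input would be, for example, the GWAS and replication P values of a single SNP (the function name is hypothetical):

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln P_i) ~ chi-square with 2k df under H0."""
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one P value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form survival function for even df (2k):
    # P(chi2_2k > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

With a single P value the function returns that P value unchanged, which is a convenient sanity check.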

Table 2 List of the SNPs showing combined P < 1.0 × 10⁻⁵

When the association of each of the eight SNPs was adjusted for rs738409 genotype, using NAFLD-1, NAFLD-2, and control-2 subjects, none of the SNPs remained significantly associated (Supplementary Table 2). The P value of rs738409 was smaller than those of the other SNPs, suggesting that rs738409 is the most important of these variants for the development of NAFLD. We also examined the association with NAFLD of haplotypes composed of three SNPs (rs738409, rs2896019, and rs3810622) in the PNPLA3 gene, four SNPs (rs738491, rs3761472, rs2143571, and rs6006473) in the SAMM50 gene, and two SNPs (rs5764455 and rs6006611) in the PARVB gene, using NAFLD-1, NAFLD-2, and control-2 subjects. Haplotypes GGA in PNPLA3 (P = 1.3 × 10⁻¹³, OR = 2.19), ACAA in SAMM50 (P = 1.3 × 10⁻¹¹, OR = 1.99), and TG in PARVB (P = 5.0 × 10⁻¹², OR = 2.06) were strongly associated with NAFLD (Supplementary Table 3). The haplotype analysis suggested that PNPLA3 is the most important of the three genes in the pathogenesis of NAFLD.

Analysis of various quantitative and histological phenotypes

Next, because NAFLD is considered to be a component of the metabolic syndrome (Marchesini et al. 2001; Stefan et al. 2008), we investigated the associations between the NAFLD susceptibility SNPs and metabolic traits. The nine SNPs were associated with decreased serum triglyceride levels in NAFLD patients, but not in control subjects (Table 3), and the allelic effects on triglyceride levels in NAFLD patients were similar among the nine SNPs. These SNPs were associated with increased AST and ALT levels in both NAFLD and control subjects. The allelic effects of SNPs in the PNPLA3 gene on AST levels in NAFLD patients were larger than those of the other SNPs. Other metabolic traits were not associated with the nine SNPs.
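Per-allele effects of the kind summarized in Table 3 come from multiple linear regression with genotypes coded 0, 1, or 2 and log-transformed traits, as described under Statistical analysis. The sketch below shows the structure of such a model with NumPy ordinary least squares on synthetic, noise-free data; the function name, the natural-log base, and the 1/0 coding of gender are assumptions (the text does not specify the logarithm base, and changing it rescales the coefficients without changing their significance). It returns only the point estimate, not the P value reported in the tables.

```python
import numpy as np


def per_allele_effect(risk_alleles, age, male, bmi, trait):
    """OLS estimate of the per-allele effect on log(trait), adjusted for
    age, gender (1 = male, 0 = female) and log(BMI)."""
    g = np.asarray(risk_alleles, dtype=float)  # genotype coded 0, 1, 2
    X = np.column_stack([
        np.ones_like(g),                       # intercept
        g,                                     # additive genotype term
        np.asarray(age, dtype=float),
        np.asarray(male, dtype=float),
        np.log(np.asarray(bmi, dtype=float)),  # log-transformed BMI
    ])
    y = np.log(np.asarray(trait, dtype=float))  # log-transformed trait
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


# Synthetic, noise-free check: the true per-allele effect on log(trait) is -0.1.
rng = np.random.default_rng(0)
n = 200
g = rng.integers(0, 3, size=n)
age = rng.uniform(20, 80, size=n)
male = rng.integers(0, 2, size=n)
bmi = rng.uniform(18, 40, size=n)
trait = np.exp(1.0 - 0.1 * g + 0.01 * age + 0.05 * male + 0.5 * np.log(bmi))
estimated_effect = per_allele_effect(g, age, male, bmi, trait)
```

On the natural-log scale, exp(β) − 1 is the proportional change in the trait per additional risk allele; here it is about −9.5 %.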

Table 3 Association between significant SNPs and metabolic traits

The nine SNPs were associated with ballooning and NAS, and all except rs3810622 were associated with lobular inflammation (Table 4). Six SNPs (rs738409, rs2896019, rs738491, rs6006473, rs5764455, and rs6006611) were associated with fibrosis. The allelic effects of SNPs in the PNPLA3 and PARVB genes on NAS and fibrosis were stronger than those of SNPs in the SAMM50 gene, although these associations did not remain significant after correction for multiple testing. Eight of the nine SNPs in the chromosome 22q13 region (all except rs5764455) were associated with increased serum ferritin levels. Five SNPs in the SAMM50 and PARVB genes (rs738491, rs3761472, rs2143571, rs6006473, and rs5764455) were associated with hyaluronic acid, levels of which are elevated in NASH. No SNP was associated with type IV collagen 7S. The nine SNPs showed different strengths of association with serum metabolic traits and histological severity, suggesting that the three genes (PNPLA3, SAMM50, and PARVB) may be involved in both the development and the progression of NAFLD (Supplementary Fig. 4).

Table 4 Association between SNPs and histological traits and serum biomarkers in NAFLD-1 and NAFLD-2 subjects

We also tested the nine SNPs for association with NASH versus simple steatosis in patients diagnosed by liver biopsy. Although the number of patients with simple steatosis was small, the nine SNPs were associated with NASH (OR = 1.76–2.79; Supplementary Table 4). Rs5764455 in the PARVB gene was most strongly associated with NASH (P = 3.4 × 10⁻⁶, OR = 2.79). Haplotype analysis revealed that haplotype GGA in PNPLA3 was most strongly associated with steatosis (P = 5.0 × 10⁻⁴), whereas haplotype TG in PARVB was most strongly associated with NAS (P = 9.6 × 10⁻⁷) and fibrosis (P = 4.4 × 10⁻⁴; Supplementary Table 5).
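As a simplified illustration of how an allelic OR can be obtained from genotype counts, the sketch below computes an unadjusted allelic OR with a Woolf (log-scale) 95 % confidence interval. It assumes counts of genotypes carrying 0, 1, or 2 risk alleles in each group and applies no covariate adjustment, so it is not a reproduction of the estimates reported here, which may have been derived differently; the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def allelic_odds_ratio(cases: Sequence[int], controls: Sequence[int],
                       level: float = 0.95) -> Tuple[float, float, float]:
    """Allelic OR with a Woolf (log-scale) confidence interval.

    cases/controls: counts of genotypes with 0, 1, 2 risk alleles.
    Returns (OR, lower bound, upper bound).
    """
    def allele_counts(g):
        risk = g[1] + 2 * g[2]    # risk alleles
        other = 2 * g[0] + g[1]   # non-risk alleles
        return risk, other

    a, b = allele_counts(cases)     # risk / other alleles in cases
    c, d = allele_counts(controls)  # risk / other alleles in controls
    if 0 in (a, b, c, d):
        raise ValueError("zero allele count; the Woolf interval is undefined")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95 % interval
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Counting alleles rather than individuals treats the two chromosomes of each subject as independent observations, which is reasonable only when Hardy–Weinberg proportions hold.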

Discussion

To elucidate the genetic background of NAFLD, we previously identified candidate genes (Yoneda et al. 2008, 2009a, b; Hotta et al. 2010). In this study, we performed a GWAS and found that the PNPLA3-SAMM50-PARVB genetic region was significantly associated with NAFLD in the Japanese population. Consistent with our previous study (Hotta et al. 2010), rs738409 in the PNPLA3 gene showed the strongest association with NAFLD. Many NAFLD patients are overweight or obese, and many have metabolic syndrome (Marchesini et al. 2001; Stefan et al. 2008). Even after adjustment for age, gender, and BMI, three SNPs (rs738409, rs2896019, and rs3810622) in PNPLA3, four SNPs (rs738491, rs3761472, rs2143571, and rs6006473) in SAMM50, and two SNPs (rs5764455 and rs6006611) in PARVB showed significant P values. Our previous study indicated that the P value for the association between rs738409 and NAFLD increased after adjustment for age, gender, and BMI (Hotta et al. 2010). A recently reported GWAS showed the strongest association of rs738409 in PNPLA3 with NAFLD in the Japanese population; however, that study did not adjust for age, gender, or BMI (Kawaguchi et al. 2012). Although numerous reports and a meta-analysis (Sookoian and Pirola 2011) have indicated that rs738409 is associated with NAFLD, and PNPLA3 is thought to be the gene responsible for NAFLD, our results indicate that SAMM50 and PARVB, in addition to PNPLA3, are probably involved in the development of NAFLD.

The NAFLD susceptibility SNPs were also associated with histological severity; however, the effects differed among the nine SNPs. Steatosis grade was affected similarly by all nine SNPs. Associations with histological activity (NAS) and severity (fibrosis stage) were stronger for SNPs in the PNPLA3 and PARVB genes. Among biomarkers, AST and ALT, which are commonly used to evaluate liver function, were strongly associated with the two SNPs in the PNPLA3 gene in both NAFLD and control subjects. Ferritin and hyaluronic acid, the levels of which increase in NASH, were associated with SNPs in the SAMM50 gene. An SNP in the PARVB gene (rs5764455) also showed a strong association with NASH compared with simple steatosis. Haplotype analysis suggested that PNPLA3 is most important for the development of NAFLD and PARVB for its progression. Our data suggest that SNPs in PNPLA3, SAMM50, and PARVB contribute to increased NAFLD activity, resulting in progression from simple steatosis to NASH. It has been proposed that NASH develops in two consecutive steps (the so-called two-hit hypothesis): (i) excess fat accumulation in the liver and (ii) subsequent necroinflammation in the liver (Day and James 1998). Our results indicate that SNPs in PNPLA3 may be involved in the first hit and that SNPs in PNPLA3, SAMM50, and PARVB may be involved in the second hit. Because these associations did not remain significant after correction for multiple testing, further analysis is necessary.

We previously reported that rs738409 is associated with decreased serum triglyceride levels in NAFLD patients (Hotta et al. 2010). In this study, we observed that SNPs, particularly those in the SAMM50 gene, were associated with decreased serum triglyceride levels. The association between SNPs in the PNPLA3 gene and decreased triglyceride levels in NAFLD is controversial (Kollerits et al. 2009; Speliotes et al. 2010, 2011). Recent reports indicate that rs738409 is associated with decreased serum triglycerides in type 2 diabetes (Palmer et al. 2012; Krarup et al. 2012). The controversy may be explained in part by our observation that SNPs in the SAMM50 gene showed a stronger effect on triglyceride levels than the SNP in the PNPLA3 gene. Further investigation is necessary to elucidate the associations between SNPs in the PNPLA3 and SAMM50 genes and serum triglyceride levels.

PNPLA3 rs738409 has been extensively investigated, and a strong association with NAFLD has been confirmed (Day and James 1998). The PNPLA3 gene is thought to be involved in abnormal lipid metabolism in the liver of NAFLD patients. Neither PNPLA3-deficient mice nor transgenic mice overexpressing wild-type PNPLA3 developed fatty liver (Chen et al. 2010; Basantani et al. 2011; Li et al. 2012), whereas overexpression of PNPLA3 I148M in mouse liver resulted in fatty liver, but not NASH (Li et al. 2012). Thus, PNPLA3 plays an important role in the development, but not in the progression, of NAFLD. Our study suggests that the SAMM50 and PARVB genes may also be involved in the progression (necroinflammation and fibrosis) of NAFLD. Sam50, encoded by the SAMM50 gene, is a component of the sorting and assembly machinery for β-barrel proteins in the mitochondrial outer membrane. Sam50 has been reported to be involved in the structural integrity of mitochondrial cristae, the assembly of respiratory complexes, and the maintenance of mitochondrial DNA. Long-term depletion of Sam50 affects the amounts of proteins in all the large respiratory complexes in the mitochondria (Ott et al. 2012). Mitochondrial abnormalities (loss of mitochondrial cristae and paracrystalline inclusions) have been described in liver biopsy specimens from patients with NASH (Sanyal et al. 2001; Caldwell et al. 1999). These reports and our results suggest that the SAMM50 gene may be involved in mitochondrial dysfunction and the subsequent impaired removal of reactive oxygen species (ROS), leading to the progression of NAFLD. The PARVB gene encodes parvin-β, a component of the integrin-linked kinase–PINCH–parvin complex, which transmits signals from integrins to Akt/protein kinase B (PKB) (Kimura et al. 2010). Integrins are a large family of heterodimeric cell surface receptors that act as mechanoreceptors by relaying information between cells and from the extracellular matrix (ECM) to the cell interior.
Since integrin receptors directly bind to ECM components to control remodeling, they are thought to play a crucial role in the evolution and progression of liver fibrosis (Desgrosellier and Cheresh 2010; Patsenker and Stickel 2011). Loss of parvin-β contributes to increased integrin-linked kinase activity and cell–matrix adhesion. Overexpression of parvin-β increases mRNA expression, serine 82 phosphorylation, and activity of peroxisome proliferator-activated receptor γ (PPARγ), leading to a concomitant increase in lipogenic gene expression (Johnstone et al. 2008). Our data and previous reports suggest that the PARVB gene is involved in lipid accumulation and/or fibrosis in the liver, resulting in NAFLD.

In summary, we demonstrated that polymorphisms in the SAMM50 and PARVB genes, as well as those in the PNPLA3 gene, were associated with NAFLD development and progression. SNPs in the PNPLA3 gene may be involved in the first hit, and those in the SAMM50 and PARVB genes (and the PNPLA3 gene) in the second hit, although further studies are necessary to confirm our results.