Introduction

Nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease (CLD) in many developed countries and is a serious public health problem worldwide. NAFLD includes a wide spectrum of liver diseases that range from simple steatosis, which is usually a benign and nonprogressive condition, to nonalcoholic steatohepatitis (NASH), which can progress to liver cirrhosis (LC) and hepatocellular carcinoma (HCC) despite the absence of significant alcohol consumption [1–5]. Liver biopsy remains a reliable tool for the diagnosis of NASH [1, 6], and the most sensitive and specific method of providing prognostic information. Practically speaking, however, it is difficult to perform liver biopsy for every patient with NAFLD to ascertain the presence of NASH [7]. Moreover, biopsy itself has significant limitations such as pain, risk of severe complications, sampling error [8, 9], cost, and patient unwillingness to undergo invasive testing. Therefore, there is an urgent need to develop and validate simple, reproducible, noninvasive tests that accurately distinguish NASH from NAFLD and determine the stage of the disease [7]. Noninvasive approaches for this purpose have included a combination of clinical features and routine laboratory investigations, as well as some readily available serum markers of fibrosis [6, 7, 10, 11]. Most studies of these noninvasive approaches have had small sample sizes and have lacked rigorous external validation.

The purposes of this study were (1) to develop a simple noninvasive scoring system aimed at differentiating NASH from NAFLD patients by using easily available clinical and biochemical variables and (2) to validate the results in a separate cohort of patients.

Methods

Patients

A total of 177 patients with well-characterized and liver-biopsy-confirmed NAFLD were included in this study to establish a simple method to detect NASH. They were consecutively biopsied patients who were seen at the Center for Digestive and Liver Diseases, Nara City Hospital from 2002 to 2008. To validate the model, 442 patients with biopsy-proven NAFLD from 2002 to 2008 were enrolled from eight Hepatology Centers in Japan: Division of Gastroenterology, Yokohama City University Graduate School of Medicine; Department of Medicine and Molecular Science, Graduate School of Biomedical Sciences, Hiroshima University; Department of Gastroenterology and Hepatology, Kochi Medical School; Department of Internal Medicine, Saga Medical School, Saga University; Department of Hepatology, Graduate School of Medicine, Osaka City University; Department of Gastroenterology and Hepatology, Kyoto Prefectural University of Medicine; Division of Gastroenterology and Hematology/Oncology, Department of Medicine, Asahikawa Medical College; and Hepatology Center, Saiseikai Suita Hospital.

The diagnosis of NAFLD was based on the following criteria: (1) liver biopsy showing steatosis in at least 5% of hepatocytes [12] and (2) appropriate exclusion of liver diseases of other etiology including viral hepatitis, autoimmune hepatitis, drug-induced liver disease, primary biliary cirrhosis, biliary obstruction, hemochromatosis, Wilson’s disease, and α-1-antitrypsin-deficiency-associated liver disease. Patients who consumed >20 g alcohol per day and patients with evidence of decompensated LC or HCC were excluded. Diabetic patients treated with exogenous insulin or insulin sensitizers (metformin or pioglitazone) were also excluded. Written informed consent was obtained from all patients at the time of their liver biopsy, and the study was conducted in accordance with the Helsinki Declaration.

Anthropometric and laboratory evaluation

Venous blood samples were taken in the morning after a 12-h overnight fast. The laboratory evaluation in all patients included a blood cell count and the measurement of aspartate aminotransferase (AST), alanine aminotransferase (ALT), γ-glutamyl transpeptidase, cholinesterase (ChE), total cholesterol, triglyceride, albumin, fasting plasma glucose (FPG), immunoreactive insulin (IRI), ferritin, hyaluronic acid (HA), and type IV collagen 7S. These parameters were measured using the standard techniques of clinical chemistry laboratories. Body mass index (BMI) was also calculated. Obesity was defined as BMI >25, according to the criteria of the Japan Society for the Study of Obesity [13]. Patients were assigned a diagnosis of diabetes mellitus (DM) if they had documented use of oral hypoglycemic medication, a random glucose level >200 mg/dl, or FPG >126 mg/dl [14]. Dyslipidemia was diagnosed if the cholesterol level was >220 mg/dl and/or triglyceride level was >160 mg/dl. Hypertension was diagnosed if the patient was taking antihypertensive medication and/or had a resting recumbent blood pressure ≥140/90 mmHg on at least two occasions.
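As an illustration only, the comorbidity definitions above can be expressed as simple predicates. The following Python sketch is not part of the study; all function and parameter names are hypothetical, and it assumes that a blood pressure reading counts as elevated when systolic pressure is ≥140 mmHg or diastolic pressure is ≥90 mmHg.

```python
# Illustrative sketch of the comorbidity definitions given in the text.
# All names are hypothetical; thresholds are copied from the Methods.

def is_obese(bmi):
    """Obesity: BMI > 25 (Japan Society for the Study of Obesity criterion)."""
    return bmi > 25


def has_diabetes(fpg_mg_dl, random_glucose_mg_dl=None, on_oral_hypoglycemic=False):
    """DM: oral hypoglycemic use, random glucose > 200 mg/dl, or FPG > 126 mg/dl."""
    if on_oral_hypoglycemic or fpg_mg_dl > 126:
        return True
    return random_glucose_mg_dl is not None and random_glucose_mg_dl > 200


def has_dyslipidemia(total_cholesterol_mg_dl, triglyceride_mg_dl):
    """Dyslipidemia: cholesterol > 220 mg/dl and/or triglyceride > 160 mg/dl."""
    return total_cholesterol_mg_dl > 220 or triglyceride_mg_dl > 160


def has_hypertension(bp_readings, on_antihypertensive=False):
    """bp_readings: list of (systolic, diastolic) pairs in mmHg.

    Assumption: a reading is elevated if systolic >= 140 or diastolic >= 90;
    hypertension requires elevated readings on at least two occasions.
    """
    elevated = sum(1 for sys_bp, dia_bp in bp_readings
                   if sys_bp >= 140 or dia_bp >= 90)
    return on_antihypertensive or elevated >= 2
```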

The HAIR score [15] was calculated by summation of the scores of hypertension (1 point), ALT >40 IU/l (1 point), and insulin resistance (IR) index >5 (1 point). IR index was calculated using the formula: 1/quantitative insulin sensitivity check index (QUICKI) [16] = log fasting IRI (μU/ml) + log FPG (mg/dl). Palekar’s score [17] was calculated by summing the risk factors of age ≥50 years, female sex, AST ≥45 IU/l, BMI ≥30 kg/m², AST/ALT ratio (AAR) ≥0.80, and HA ≥55 ng/ml. Gholam’s score [18] was calculated by the formula: 2.627 × ln AST + 2.13 × DM (yes = 1, no = 0). The BAAT score [19] was calculated by summing the risk factors of BMI ≥28 kg/m², age ≥50 years, ALT levels measuring twice normal or higher, and triglyceride level ≥1.7 mmol/l (150 mg/dl). The BARD score [20] is a weighted sum of three easily available variables [BMI ≥28 kg/m² (1 point), AAR ≥0.8 (2 points), and DM (1 point)]. Modified scores (with cutoff values of BMI changed to 25 kg/m²) of Palekar’s, BAAT, and BARD were also calculated. The NAFLD fibrosis score (NFS) [21] was calculated according to the following formula: −1.675 + 0.037 × age (years) + 0.094 × BMI (kg/m²) + 1.13 × impaired fasting glycemia (IFG)/DM (yes = 1, no = 0) + 0.99 × AAR − 0.013 × platelet (×10⁹/l) − 0.66 × albumin (g/dl). The N (Nippon) score [22] was calculated as the total number of the following risk factors: female sex, older age (>60 years), type 2 DM (T2DM), and hypertension.
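For readers who wish to reproduce these comparator scores, the formulas above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function and parameter names are hypothetical, and "log" in the IR index is assumed to be the base-10 logarithm (as in the usual QUICKI definition), whereas Gholam’s score uses the natural logarithm as stated.

```python
import math

# Hypothetical helper names; coefficients and cutoffs are those given in the text.

def ir_index(iri_uU_ml, fpg_mg_dl):
    """1/QUICKI = log(fasting IRI) + log(FPG); base-10 logs are an assumption."""
    return math.log10(iri_uU_ml) + math.log10(fpg_mg_dl)


def hair_score(hypertension, alt, iri_uU_ml, fpg_mg_dl):
    """HAIR: hypertension, ALT > 40 IU/l, IR index > 5 (1 point each)."""
    return int(hypertension) + int(alt > 40) + int(ir_index(iri_uU_ml, fpg_mg_dl) > 5)


def gholam_score(ast, diabetes):
    """Gholam's score: 2.627 x ln(AST), plus 2.13 if DM is present."""
    return 2.627 * math.log(ast) + 2.13 * int(diabetes)


def bard_score(bmi, ast, alt, diabetes, bmi_cutoff=28):
    """BARD: BMI >= cutoff (1), AAR >= 0.8 (2), DM (1); use bmi_cutoff=25 for the modified score."""
    return int(bmi >= bmi_cutoff) + 2 * int(ast / alt >= 0.8) + int(diabetes)


def nafld_fibrosis_score(age, bmi, ifg_or_dm, ast, alt, platelets_1e9_per_l, albumin_g_dl):
    """NFS as given in the text."""
    return (-1.675 + 0.037 * age + 0.094 * bmi + 1.13 * int(ifg_or_dm)
            + 0.99 * (ast / alt) - 0.013 * platelets_1e9_per_l - 0.66 * albumin_g_dl)


def n_score(female, age, t2dm, hypertension):
    """N (Nippon) score: number of risk factors present."""
    return int(female) + int(age > 60) + int(t2dm) + int(hypertension)
```

Note that an IR index above 5 on this definition requires the product of fasting IRI and FPG to exceed 100,000, which is rarely reached outside severe insulin resistance.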

Histological evaluation

All patients enrolled in this study underwent a percutaneous liver biopsy under ultrasonic guidance. The liver specimens were embedded in paraffin and stained with hematoxylin and eosin, Masson-trichrome, and reticulin silver stain. Two hepatopathologists (T.O. and Y.S.) who were blinded to the clinical data reviewed the liver biopsy specimens. Adequate liver biopsy samples were defined as >1.5 cm long and/or having more than six portal tracts. NASH was defined as steatosis with lobular inflammation and ballooning degeneration, with or without Mallory-Denk body or fibrosis [2, 3]. Patients whose liver biopsy specimens showed simple steatosis or steatosis with nonspecific inflammation were identified as the nonNASH cohort [2, 3]. The presence or absence of hepatocyte ballooning degeneration is influenced by the variability in pathologists’ interpretation. The NAFLD Activity Score (NAS) proposed by Kleiner et al. [12] was the unweighted sum of the scores for steatosis (0–3), lobular inflammation (0–3), and ballooning degeneration (0–2). If liver histology was too atypical to make a judgment, cases with an NAS of ≥5 were considered to be NASH. The severity of hepatic fibrosis (stage) was defined as follows: stage 1, zone 3 perisinusoidal fibrosis; stage 2, zone 3 perisinusoidal fibrosis with portal fibrosis; stage 3, zone 3 perisinusoidal fibrosis and portal fibrosis with bridging fibrosis; and stage 4, cirrhosis [23].

Statistical analysis

Results are presented as the means and standard deviation (SD) for quantitative data, or as numbers with percentages in parentheses for qualitative data. Statistical differences in quantitative data were determined using the t test. Fisher’s exact probability test or χ² analysis was used for qualitative data. Multivariate analysis was performed by logistic regression analysis to identify variables independently associated with the presence of NASH. Those variables with P < 0.05 by multivariate analysis were used to construct a scoring system to predict NASH. The scoring system was a weighted sum of significant variables on the basis of odds ratio (OR) obtained from logistic regression analysis. To assess the accuracy of the clinical scoring system in differentiating NASH from NAFLD, we calculated the sensitivity (Se) and specificity (Sp) for each value of each test, and then constructed receiver operating characteristic (ROC) curves by plotting the Se against (1 − Sp) at each value. The diagnostic performance of the scoring systems was assessed by analysis of ROC curves. The index of accuracy used was the area under the ROC curve (AUROC), with values close to 1.0 indicating high diagnostic accuracy. To evaluate the overall accuracy of our score and NFS in detecting significant or advanced fibrosis, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. Differences were considered statistically significant at P < 0.05.
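The ROC procedure described above can be sketched as follows. This is an illustrative pure-Python version, not the statistical software used in the study; the function names are hypothetical, a patient is treated as test-positive when the score is at or above the cutoff, and the AUROC is obtained by the trapezoidal rule.

```python
def _ratio(numerator, denominator):
    # Return NaN instead of raising when a cell of the 2 x 2 table is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(scores, labels, cutoff):
    """Se, Sp, PPV, and NPV, with score >= cutoff counted as test-positive."""
    tp = fp = fn = tn = 0
    for score, has_disease in zip(scores, labels):
        if score >= cutoff:
            if has_disease:
                tp += 1
            else:
                fp += 1
        elif has_disease:
            fn += 1
        else:
            tn += 1
    return {"Se": _ratio(tp, tp + fn), "Sp": _ratio(tn, tn + fp),
            "PPV": _ratio(tp, tp + fp), "NPV": _ratio(tn, tn + fn)}


def roc_points(scores, labels):
    """(1 - Sp, Se) at every observed cutoff; assumes both classes are present."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above the maximum score: nobody test-positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With an ordinal score that takes only a few values, the curve has one point per score level, so tied scores contribute half credit to the area.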

Results

Characteristics of the patient population in the estimation group

Table 1 summarizes the clinical, laboratory, and liver biopsy data of the patient population in the estimation group. Eighty-six (49%) patients were female, and 120 (68%) were obese. Of 177 NAFLD patients involved in this estimation group, 98 (55%) were histologically diagnosed with NASH, and 79 (45%) had nonNASH NAFLD. NASH patients were significantly older, predominantly female, heavier, hypertensive, and more likely to have T2DM; had lower hemoglobin (Hb), platelet count and ChE; and had higher levels of AST, ALT, AAR, ferritin, FPG, IRI, HA, and type IV collagen 7S.

Table 1 Clinical characteristics of patients in the estimation and validation groups

Predictors of NASH

Table 2 shows the univariate comparison and the results of the multivariate analysis performed in the 177 patients in the estimation group. Univariate analysis showed that age (≥55 years), female sex, obesity, Hb ≤14.5 g/dl, presence of T2DM, platelet count ≤22 × 10⁴/μl, AST ≥60 IU/l, ALT ≥90 IU/l, AAR ≥0.8, ferritin ≥200 ng/ml (female) or ≥300 ng/ml (male), FPG ≥100 mg/dl, IRI ≥10.0 μU/ml, HA ≥50 ng/ml, and type IV collagen 7S ≥5.0 ng/ml were significant variables. By multivariate analysis, three variables remained significant: ferritin, IRI, and type IV collagen 7S. Thus, these three variables, ferritin ≥200 ng/ml (female) or ≥300 ng/ml (male), IRI ≥10.0 μU/ml, and type IV collagen 7S ≥5.0 ng/ml, were combined to form the NAFIC score. The score was weighted based on OR obtained from logistic regression analysis (Table 2). Ferritin was given 1 point, IRI 1 point, and type IV collagen 7S 2 points. A score ranging from 0 to 4, defined by the presence of laboratory parameters, was calculated. The score for NASH patients (n = 98, 2.36 ± 1.28) was significantly higher than that for nonNASH patients (n = 79, 0.66 ± 0.82, P < 0.0001). The percentage of NASH in NAFLD with an NAFIC score of 0, 1, 2, 3, and 4 was 14% (6/44), 44% (27/61), 74% (14/19), 100% (28/28), and 92% (23/25), respectively (Fig. 1). The score was significantly higher even in NASH patients without significant fibrosis (stage 0 or 1) (n = 47, 1.83 ± 1.15) than in nonNASH patients (P < 0.0001). Figure 2 shows the ROC curve for NAFIC score to differentiate NASH from NAFLD. This scoring system had an AUROC of 0.851. At a cutoff value of NAFIC score 2, the sensitivity, specificity, PPV, and NPV were 66, 91, 90, and 67%, respectively. At a cutoff value of NAFIC score 1, the sensitivity, specificity, PPV, and NPV were 94, 48, 69, and 86%, respectively.
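The NAFIC score defined above can be computed directly from the three laboratory values and the patient’s sex. The following Python sketch is illustrative only; the function and parameter names are hypothetical, while the cutoffs and point weights are those stated in the text.

```python
def nafic_score(ferritin_ng_ml, female, iri_uU_ml, collagen_7s_ng_ml):
    """NAFIC score (0-4): ferritin 1 point, IRI 1 point, type IV collagen 7S 2 points."""
    ferritin_cutoff = 200 if female else 300  # sex-specific ferritin cutoff (ng/ml)
    score = 0
    if ferritin_ng_ml >= ferritin_cutoff:
        score += 1
    if iri_uU_ml >= 10.0:
        score += 1
    if collagen_7s_ng_ml >= 5.0:
        score += 2
    return score
```

For example, a male patient with ferritin 320 ng/ml, fasting IRI 8.0 μU/ml, and type IV collagen 7S 5.2 ng/ml would score 1 + 0 + 2 = 3 under this sketch.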

Table 2 Results of univariate and multivariate analysis: independent predictors of NASH and assigned score values in the estimation group (n = 177)

Fig. 1 Distribution of NAFIC scores in patients with NASH and nonNASH in the estimation group (n = 177)

Fig. 2 ROC curves for the NAFIC score in the estimation (a) and validation (b) groups

Validation results

The diagnostic accuracy of the scoring system in separating patients with and without NASH was validated in 442 patients. Table 1 summarizes the clinical, laboratory, and liver biopsy data of the patient population in the validation group. Two hundred and two patients (46%) were female, and 326 (74%) patients were obese. Patients with NASH were significantly older, predominantly female, heavier, hypertensive, and more likely to have T2DM; had lower platelet count and ChE level; and had higher levels of AST, ALT, AAR, ferritin, IRI, HA, and type IV collagen 7S, than those with nonNASH NAFLD. The NAFIC score of NASH patients (n = 244, 2.03 ± 1.27) was significantly higher than that of nonNASH patients (n = 198, 0.76 ± 0.87, p < 0.0001). The percentage of NASH in NAFLD with an NAFIC score of 0, 1, 2, 3, and 4 was 25% (29/115), 44% (67/153), 76% (55/72), 92% (54/59), and 91% (39/43), respectively (Fig. 3). The score was significantly higher even in NASH patients without significant fibrosis (stage 0 or 1) (n = 143, 1.57 ± 1.17) than nonNASH patients (p < 0.0001). The AUROC remained relatively high in the validation set (0.782, Fig. 2; Table 3). At a cutoff value of NAFIC score 2, the sensitivity, specificity, PPV, and NPV were 60, 87, 85 and 64%, respectively. At a cutoff value of NAFIC score 1, the sensitivity, specificity, PPV, and NPV were 88, 43, 66, and 75%, respectively.
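The validation-group figures above can be cross-checked from the reported score distribution alone. The following Python sketch is an illustration, not the original analysis: the counts are copied from the text, the variable and function names are hypothetical, and the AUROC is computed as the Mann-Whitney probability that a NASH patient scores higher than a nonNASH patient, with ties counted as one half.

```python
# Validation group: patients per NAFIC score (0-4), as reported in the text.
TOTAL = {0: 115, 1: 153, 2: 72, 3: 59, 4: 43}
NASH = {0: 29, 1: 67, 2: 55, 3: 54, 4: 39}
NON_NASH = {s: TOTAL[s] - NASH[s] for s in TOTAL}


def metrics_at(cutoff):
    """Se, Sp, PPV, NPV with NAFIC >= cutoff counted as test-positive."""
    tp = sum(n for s, n in NASH.items() if s >= cutoff)
    fn = sum(n for s, n in NASH.items() if s < cutoff)
    fp = sum(n for s, n in NON_NASH.items() if s >= cutoff)
    tn = sum(n for s, n in NON_NASH.items() if s < cutoff)
    return {"Se": tp / (tp + fn), "Sp": tn / (tn + fp),
            "PPV": tp / (tp + fp), "NPV": tn / (tn + fn)}


def auroc_from_counts():
    """Mann-Whitney estimate of the AUROC for an ordinal score (ties = 0.5)."""
    favourable = 0.0
    for s, n_pos in NASH.items():
        lower = sum(n for t, n in NON_NASH.items() if t < s)
        favourable += n_pos * (lower + 0.5 * NON_NASH[s])
    return favourable / (sum(NASH.values()) * sum(NON_NASH.values()))


def mean_score(counts):
    return sum(s * n for s, n in counts.items()) / sum(counts.values())
```

Under these counts the recomputed values agree with the reported sensitivity, specificity, PPV, NPV, AUROC, and group means to within about one percentage point, the residual differences being attributable to rounding.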

Fig. 3 Distribution of NAFIC score in patients with NASH and nonNASH in the validation group (n = 442)

Table 3 AUROC of NAFIC score and various scoring systems previously reported in the estimation and validation groups

Comparing the NAFIC score to several previously established scoring systems

The AUROCs of various scoring systems that have been reported to differentiate NASH from NAFLD, or advanced from mild fibrosis, are shown in Table 3. To differentiate NASH from NAFLD, the AUROC in the validation group was greatest for NAFIC (0.782, Fig. 2), then Gholam’s score (0.758), followed by Palekar’s score (0.711), modified Palekar’s score (0.709), modified BAAT (0.666), NFS (0.663), N score (0.642), HAIR (0.636), BAAT (0.633), BARD score (0.621), and modified BARD score (0.603). Based on an evaluation of AUROC, NAFIC score outperformed other scoring systems in the estimation and validation groups to differentiate NASH from NAFLD. To differentiate NASH with significant fibrosis from NAFLD, the AUROC in the total cohort was greatest for NAFIC score (0.834), then NFS (0.817), followed by modified Palekar’s score (0.808), Palekar’s score (0.799), Gholam’s score (0.786), N score (0.720), modified BARD (0.695), BARD (0.688), modified BAAT score (0.641), HAIR (0.593), and BAAT score (0.585). To differentiate NASH with advanced fibrosis from NAFLD, the AUROC in the total cohort was greatest for NAFIC score (0.869), then NFS (0.853), followed by modified Palekar’s score (0.837), Palekar’s score (0.835), BARD (0.734), modified BARD (0.730), Gholam’s score (0.729), N score (0.704), modified BAAT (0.573), HAIR (0.566), and BAAT score (0.526). Among these prediction models, NAFIC score was superior to others in not only detecting NASH, but also predicting fibrosis stage.

We compared the diagnostic accuracy of NAFIC score to that of NFS in detecting advanced fibrosis (stage 3–4) and significant fibrosis (stage 2–4) (Table 4). To exclude advanced fibrosis using the low cutoff point (NFS <−1.455), 325 of 330 (98%) patients were correctly staged, whereas only 5 (2%) were understaged. The NPV of this cutoff for advanced fibrosis was 98%. Using the low cutoff point (NAFIC ≤1), 371 of 374 (99%) patients were correctly staged, whereas only 3 (1%) were understaged. The NPV of this cutoff for advanced fibrosis was 99%, which was comparable to that of NFS (98%). Using the high cutoff point (NFS >0.676), 21 of 49 (43%) patients were correctly staged, whereas 28 (57%) were overstaged. The PPV of this cutoff for advanced fibrosis was 43%. Using the high cutoff point (NAFIC ≥3), 56 of 155 (36%) patients were correctly staged, whereas 99 (64%) were overstaged. The PPV of this cutoff for advanced fibrosis was 36%, which was lower than that of NFS (43%). The percentage of the undetermined range was much lower for the NAFIC score (15%) than for NFS (36%) (Table 4).
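The two-cutoff approach described above assigns each patient to a low, undetermined, or high range. A minimal Python sketch of this triage for advanced fibrosis is given below; the function names and category labels are hypothetical, and the predictive values are simply recomputed from the patient counts reported in the text.

```python
def triage_nafic(nafic):
    """Advanced-fibrosis triage by NAFIC score: low cutoff <= 1, high cutoff >= 3."""
    if nafic <= 1:
        return "advanced fibrosis unlikely"
    if nafic >= 3:
        return "advanced fibrosis possible"
    return "undetermined"


def triage_nfs(nfs):
    """Advanced-fibrosis triage by NFS: low cutoff < -1.455, high cutoff > 0.676."""
    if nfs < -1.455:
        return "advanced fibrosis unlikely"
    if nfs > 0.676:
        return "advanced fibrosis possible"
    return "undetermined"


# Predictive values recomputed from the counts reported in the text.
NPV_NAFIC = 371 / 374   # NAFIC <= 1: correctly staged / all below low cutoff
PPV_NAFIC = 56 / 155    # NAFIC >= 3: correctly staged / all above high cutoff
NPV_NFS = 325 / 330     # NFS < -1.455
PPV_NFS = 21 / 49       # NFS > 0.676
```

Because the NAFIC score takes only integer values, its undetermined range consists solely of patients with a score of 2.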

Table 4 Accuracy of the NAFIC score and NAFLD fibrosis score (NFS) in predicting advanced fibrosis (stage 3–4) and significant fibrosis (stage 2–4) in the total cohort

When the NFS low cutoff (NFS <−1.455) was applied to predict significant fibrosis, 305 of 330 (92%) patients were correctly staged, whereas 25 of 330 (8%) patients were understaged. The NPV for significant fibrosis was 92%. When their high cutoff (NFS >0.676) was applied to predict significant fibrosis, 33 of 49 (67%) patients were correctly staged, whereas 16 of 49 (33%) patients were overstaged. The PPV for significant fibrosis was 67%. When our low cutoff point (NAFIC = 0) was applied to exclude significant fibrosis, 153 of 160 (96%) patients were correctly staged, whereas only seven (4%) were understaged. The NPV of this cutoff for significant fibrosis was 96%, which was slightly higher than NFS. Using our high cutoff point (NAFIC ≥2), 127 of 245 (52%) patients were correctly staged, whereas 118 (48%) were overstaged. The PPV of this cutoff for significant fibrosis was 52%, which was lower than that of NFS (67%) (Table 4).

Discussion

In this study, we developed and validated a simple scoring system to differentiate NASH from NAFLD. Our scoring system with the three variables ferritin, IRI, and type IV collagen 7S had an AUROC of 0.851 and 0.782 in the estimation and validation groups, respectively. Elevation of serum ferritin levels, a marker of iron storage, is associated with NASH [24, 25]. We previously reported high frequencies of hyperferritinemia and increased hepatic iron stores in Japanese NASH patients [11]. Yoneda and colleagues [26], our collaborative research group, also have reported that measurement of serum ferritin is useful to distinguish NASH from NAFLD. Their optimal cutoff value was 196 ng/ml, and their results for sensitivity, specificity, PPV, and NPV were 64, 77, 89, and 43%, respectively. Serum ferritin levels have been found to be a significant independent predictor of severe fibrosis in 167 Italian NAFLD subjects [27], but this has not been confirmed by other studies [28]. In Western countries, mildly increased serum ferritin does not necessarily indicate coexisting iron overload. However, it is well known that serum ferritin is closely associated with IR and can be considered a marker for metabolic syndrome [29].

Hyperinsulinemia (IRI ≥10.0 μU/ml) was also selected as an independent predictor of NASH. Hyperinsulinemia and increased IR could have important roles in the pathogenesis of NASH in both Western and Asian countries [30–33]. Hyperinsulinemia in NASH patients is attributable to increased insulin secretion, which compensates for reduced insulin sensitivity, and is not the consequence of decreased hepatic extraction of insulin, which occurs in all forms of CLD at the stage of advanced fibrosis or cirrhosis [30, 31]. The homeostasis model assessment (HOMA) has been validated and widely used for determining the degree of IR, and strongly predicts the development of T2DM [34]. Patients with NASH have a higher HOMA index compared with those with nonNASH NAFLD [30, 35]. Similarly, another study has reported the QUICKI model as being useful for predicting NASH [15]. However, appropriate cutoff values of these models have never been established. In contrast to these indices, which are mathematical transformations of FPG and IRI levels, fasting IRI has the advantage of being easily determined without calculations; for this reason, only fasting IRI was entered into the multivariate analysis in our study.

Type IV collagen is one of the extracellular matrix components that are produced by hepatic fibroblasts. The 7S domain in the N-terminus of type IV collagen is inserted in tissues and released into the blood by turnover in connective tissues. Therefore, the serum 7S domain level increases in parallel with the amount of fibrosis and with increased synthesis by stellate cells and myofibroblasts as liver fibrosis progresses [36]. In Japan, type IV collagen 7S is now widely used for assessing the extent of hepatic fibrosis in CLD because the test is covered by public health insurance. According to two reports [37, 38], a cutoff point of 5.0 or 4.25 ng/ml provided high NPV to exclude advanced fibrosis in Japanese NAFLD patients. Shimada et al. [39] have demonstrated that a cutoff point of 5.0 ng/ml provided sensitivity and specificity of 41 and 95%, respectively, to detect early-stage NASH. Serum HA levels are elevated during accelerated deposition of collagen in the extracellular space due to upregulation of HA production by activated stellate cells and myofibroblasts, and downregulation of its clearance by sinusoidal endothelial cells. Serum HA appears to be a relatively accurate predictor of advanced fibrosis stage in NAFLD, but is less accurate for distinguishing between minor degrees of fibrosis in NASH and nonNASH NAFLD [40, 41]. In our study, serum HA level was not an independent predictor of NASH by multivariate analysis. Moreover, HA increases in systemic inflammatory conditions, which might produce false-positive results. We believe that type IV collagen 7S is superior to HA in predicting the extent of fibrosis in NAFLD patients.

Currently, the NAFLD biomarkers have been evaluated for (1) distinguishing NASH from NAFLD and/or (2) diagnosing advanced fibrosis or cirrhosis. HAIR [15], Palekar’s [17], and Gholam’s [18] scores were derived for distinguishing NASH from NAFLD, and the others are for detecting significant or advanced fibrosis. The present study clearly demonstrated that NAFIC score was more useful than other scoring systems for detection of NASH and for prediction of fibrosis (Table 3). The HAIR score uses a combination of presence of hypertension, elevated ALT (>40 U/L), and IR, with at least two parameters that indicate NASH with high sensitivity and specificity [15]. None had an IR index >5 in our overall cohort (data not shown), and AUROCs were 0.632 and 0.634 in the estimation and validation groups, respectively. The score does not appear useful in a less obese population because it was described in a group of severely obese patients who were undergoing gastric bypass surgery. In Palekar’s study [17], the presence of at least three out of six factors (age ≥50 years, female sex, AST ≥45 IU/l, BMI ≥30 kg/m2, AAR ≥0.80, and HA ≥55 ng/ml) had a sensitivity and specificity for NASH diagnosis of 74 and 66%, respectively. In our estimation group, the presence of at least three of these factors had a sensitivity and specificity for NASH diagnosis of 68 and 71%, respectively. In our validation group, the presence of at least three of these factors had a sensitivity and specificity for NASH diagnosis of 63 and 64%, respectively. AUROCs were 0.784 and 0.711 in the estimation and validation groups, respectively. Therefore, Palekar’s score was not superior to NAFIC score for predicting NASH (Table 3).

Gholam’s score [18], which consists of only two variables (AST and the presence of DM), is very simple and equally useful as NAFIC score for detection of NASH, but it was not superior to our score for predicting severe fibrosis in our cohort. Gholam et al. [18] have constructed other models that consist of ALT and HbA1c to detect the presence of fibrosis. We could not evaluate these models because HbA1c was only measured in a limited number of patients. Angulo et al. [21] have shown that the NFS, which consists of six variables (age, BMI, AAR, IFG/DM, platelet count, and albumin), can reliably predict advanced fibrosis. In ROC analysis, NFS is shown to be useful for prediction of advanced or significant fibrosis. The low cutoff point (NFS <−1.455) showed higher NPV (98%) than that in the estimation (93%) and validation (88%) cohort reported by Angulo et al. The low cutoff point in NAFIC score (≤1) had equally high NPV (99%). In contrast, the high cutoff point (NFS >0.676) showed lower PPV (43%) than that in the estimation (98%) and validation (80%) cohort reported by Angulo et al. The high cutoff point in NAFIC score (≥3) had lower PPV (36%). By applying the low cutoff score (NFS <−1.455, NAFIC ≤1), advanced fibrosis could be excluded with high accuracy. By applying the high cutoff score (NFS >0.676, NAFIC ≥3), the presence of advanced fibrosis could not be diagnosed with high accuracy. Consistent with our results, a separate validation study of NFS in 162 Chinese patients found that the NPV for excluding advanced fibrosis was 91%, but the PPV for predicting advanced fibrosis was 0% [42]. It is suggested that this low PPV might be due to lower prevalence of advanced fibrosis in the study of Wong et al. (11%) [42] than in that by Angulo et al. (27%) [21]. Similarly, the prevalence of advanced fibrosis was low (11%) in our study.

In Asian patients, steatohepatitis and other metabolic complications tend to develop at a lower BMI, which is one of the factors in the equation of the NFS. Therefore, NFS and NAFIC score were applicable to exclusion rather than detection of significant or advanced fibrosis. The variables required for NFS are easily obtained in clinical practice, but calculating the score can be cumbersome and may not be feasible in every practice. The easily determined NAFIC score is at least equivalent to the more complex NFS. Our results suggest that liver biopsies can be avoided in NAFLD patients with an NAFIC score of 0 or 1 because they are likely to have NAFLD without advanced fibrosis. In contrast, liver biopsies should be recommended in NAFLD patients with an NAFIC score of ≥2 to assess the extent of hepatic fibrosis and predict prognosis. The BARD score developed by Harrison et al. is a weighted sum of three easily available variables [BMI ≥28 kg/m² (1 point), AAR ≥0.8 (2 points), and DM (1 point)], and the authors have shown that a score of 2–4 was associated with an OR of 17 for predicting advanced fibrosis [20]. Although the BARD score is simple to calculate, our validation study did not reveal an advantage of this score over others. In our cohort, when a BARD score of ≥2 was found, the sensitivity, specificity, PPV, and NPV for detecting advanced fibrosis were 73, 65, 19, and 95%, respectively. According to a study of 122 Japanese NAFLD patients by Fujii and colleagues [43], our collaborative research group, when a BARD score of ≥2 was used, the AUROC was 0.73 with an OR of 4.9 for detection of advanced fibrosis. It has been concluded that BARD score is less predictive of advanced fibrosis in Japanese NAFLD patients because they are not as obese as those in Western countries. Disappointingly, the modified Palekar’s, BAAT, and BARD scores did not improve the diagnostic accuracy for NASH or advanced fibrosis. The N score (the total number of the following risk factors: female sex, age >60 years, T2DM, and hypertension), which was established on the basis of data collection from 182 Japanese NAFLD patients in multiple centers in Nagasaki [22], is very simple, without the need for detailed laboratory tests. However, it was not superior to other scoring systems in our validation study.

Our study had several limitations. The fact that we excluded diabetic patients treated with exogenous insulin or insulin sensitizers (metformin or pioglitazone) from the analysis was a major limitation. In the future, we must find better scoring systems that are applicable to these patients. Other limitations include the largely retrospective study design and lack of complete data in many subjects. We included patients from different hepatology centers in Japan that have a particular interest in studying NAFLD, and thus, some referral bias could not be ruled out. Patient selection bias could also have existed because liver biopsy might have been considered for NAFLD patients who were likely to have NASH. We acknowledge that pathological diagnosis was mainly determined using liver tissues derived from percutaneous liver biopsy, which is prone to sampling error or interobserver variability [8, 9]. In fact, 11 patients of our total cohort were diagnosed as nonNASH in spite of an NAFIC score of 3 or 4. Although the exact reason was unknown, sampling error could have led to this misdiagnosis. These patients need follow-up care or repeat liver biopsies. It should be emphasized that we had a central pathology review by two hepatopathologists to prevent interobserver variability, although we were not able to quantify the effect on our results of some intraobserver variability. Because all participants were Japanese, there is a possibility that our results might not be adaptable for NAFLD patients of other races. Due to these limitations, the present results need to be validated in independent populations by other investigators.

In conclusion, NAFIC score can predict NASH in Japanese NAFLD patients with sufficient accuracy and simplicity to be considered for clinical use, thus identifying a very high-risk group in whom liver biopsy would be very likely to detect NASH, as well as a low-risk group in whom liver biopsy can be safely delayed or avoided.