Introduction

Nonalcoholic steatohepatitis (NASH) is the progressive form of nonalcoholic fatty liver disease (NAFLD), defined as the presence of ≥ 5% hepatic steatosis with inflammation and ballooning of hepatocytes, regardless of hepatic fibrosis [1]. NASH accounts for 20–25% of NAFLD cases [2], and its diagnosis is important owing to its poor prognosis compared with nonalcoholic fatty liver (NAFL). As NAFLD affects approximately 30% of the global population [3], performing liver biopsy in all patients with NAFLD is not feasible. In addition, liver biopsy has several limitations, such as high cost, the risk of bleeding and infection, and a long turnaround time for results [4]. Therefore, there is a growing need for non-invasive methods to evaluate the severity of NAFLD.

Non-invasive biomarkers for the diagnosis of NASH have been studied, including serologic biomarkers, combination panels, and imaging biomarkers [5, 6]. Cytokeratin 18 (CK18) is one of the most investigated markers for NAFLD severity [7]. However, its intermediate accuracy and uncertain optimal cut-off value limit its clinical use. Transient elastography (TE) and magnetic resonance elastography (MRE), which measure liver stiffness, exhibited only modest accuracy for the diagnosis of NASH [8]. NASH is a highly complex disease, and it is difficult to develop a non-invasive biomarker for it with a single parameter. Combined panels, including the NASH Test and the NASH Diagnostics Panel, were developed but exhibited poor accuracy and were expensive [9]. The FibroScan-AST (FAST) score, calculated using the controlled attenuation parameter (CAP), liver stiffness measurement, and aspartate transaminase (AST), was developed as a prediction model to identify patients with significant activity and fibrosis [10]. Although the FAST score showed satisfactory performance (c-statistic, 0.80; 95% CI 0.76–0.85) and was validated in external cohorts (c-statistic range 0.74–0.95), this scoring system is not intended for diagnosing NASH.

Multiparametric magnetic resonance (MR) is a valuable modality for evaluating the severity of NAFLD [11]. Magnetic resonance imaging (MRI)-derived proton density fat fraction (MRI-PDFF) and MRE performed better than other non-invasive modalities in detecting steatosis and fibrosis and in grading their severity [12]. We previously reported that a non-invasive MR index, comprising magnetic resonance spectroscopy (MRS), MRE, and T1 relaxation time, can effectively diagnose NASH [13]. Moreover, MRI can effectively detect pathologic lesions in the liver, whereas ultrasonography does not facilitate the visualization of lesions in a fatty liver. Thus, multiparametric MR could be a useful replacement for, or supplement to, liver biopsy.

Although the use of multiparametric MR for evaluating NASH has been reported previously, we aimed to develop a diagnostic scoring system that could improve the accuracy, sensitivity, and specificity of the diagnosis of NASH by combining MR parameters and clinical indicators.

Patients and methods

Study population

To develop a scoring system, we included patients from a biopsy-confirmed NAFLD cohort at Korea University Guro Hospital [8]. Patients underwent liver biopsy when NAFLD was suspected on sonography, when other liver diseases needed to be excluded, or when an accurate assessment of NAFLD severity was required. No patient had another chronic liver disease, such as chronic hepatitis B or C infection or autoimmune liver disease. Alcohol abusers (defined as men and women who consumed more than 140 g and 70 g of alcohol per week, respectively), patients with decompensated liver cirrhosis, those with contraindications to MRI, and those with other severe systemic disease or malignancy were excluded. All patients underwent laboratory tests and MRI within 6 months after liver biopsy. Diabetes was diagnosed based on previous medical history and diagnostic criteria [14]. Impaired fasting glucose (IFG) was defined as a fasting plasma glucose level between 100 and 125 mg/dL. Laboratory tests included white blood cell count, platelet count, hemoglobin, aspartate aminotransferase (AST), alanine aminotransferase, alkaline phosphatase, gamma-glutamyl transferase, total bilirubin, albumin, prothrombin time, blood urea nitrogen, creatinine, and C-reactive protein.

This study was approved by the institutional review board of the Korea University Guro Hospital (2016GR0302). All patients who agreed to participate in the study provided written informed consent. The study was conducted in accordance with the Declaration of Helsinki.

Histopathologic evaluation

Liver biopsy was performed by a skilled radiologist via the intercostal space with an 18-gauge Tru-cut needle (TSK Laboratory, Tochigi, Japan). Two liver specimens were fixed in formalin and embedded in paraffin. The sliced sections were stained using hematoxylin and eosin. NAFLD was diagnosed by two experienced pathologists, and its severity was evaluated according to the NASH Clinical Research Network histologic score [15]. NASH was diagnosed when hepatic steatosis was ≥ 5%, along with inflammation and ballooning of hepatocytes, regardless of fibrosis [1].

Multiparametric magnetic resonance imaging

All patients underwent MRI using a 3 T MR scanner (MAGNETOM Skyra; Siemens Healthcare, Erlangen, Germany). The multiparametric MR protocol consisted of seven sequences, namely MRI-PDFF, MRS, T1 relaxation time, MRE, T1-weighted image, T2-weighted image, and diffusion-weighted image. The image parameters are summarized in Supplementary Table 1. Modified Dixon techniques were used to measure the MRI-PDFF [16]. MRS data were obtained from a single voxel measuring 20 × 20 × 20 mm and were analyzed using an online program from the MR scanner vendor, as described in our previous study [13]. T1 relaxation time was measured using a shortened Modified Look Locker Inversion recovery sequence based on fast low-angle shot (FLASH) [17]. Images were acquired at three different levels, namely the point where the hepatic veins join the inferior vena cava, the hilum of the liver, and the gallbladder fossa. T1 relaxation time was measured by applying a non-selective inversion recovery pulse and low-flip angle FLASH acquisitions for 16 inversion contrasts. Three regions of interest (ROIs) from different images were measured, and their mean value was reported as the T1 relaxation time (milliseconds). Liver stiffness was measured by MRE using a pneumatic driver system (Resoundant, Inc., Rochester, MN, USA) attached to the right anterior chest wall over the liver of each patient. A 60-Hz shear wave was generated from the driver and delivered through a flexible vinyl tube. Four MRE images were acquired during the expiratory phase of respiration. The acquired images were processed into elastograms, and the liver stiffness was measured using ROIs drawn on the elastograms. Four ROIs from different images were measured, and the mean value of the MRE-liver stiffness measurement (MRE-LSM) was reported in kilopascals (kPa).

Score development

The score was developed using demographics, laboratory data, and MRI parameters of the 127 enrolled patients. The selection of parameters was based on the − 2 log likelihood test statistic. Parameters with negative ΔAIC and ΔBIC values were selected. The selected parameters were used to build a second-order multiple logistic regression model. The statistical significance of each predictor was also evaluated using the − 2 log likelihood test statistic, which is − 2 times the difference in log likelihood values between logistic regression models with and without the predictor when the other predictors are adjusted for. Internal validation of the developed model was performed using 1,000 bootstrap samples and tenfold cross-validation to assess the performance of the model. Performance was assessed in terms of the c-statistic.
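The selection criteria described above can be illustrated with a minimal sketch. It is not the authors' SAS code: the function names and the log-likelihood values are hypothetical, and the sign convention (difference taken as "model with the predictor" minus "model without", so that a negative ΔAIC or ΔBIC favors keeping the predictor) is an assumption consistent with the text.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, BIC) for a fitted model with the given log likelihood."""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    return aic, bic


def compare_nested(log_lik_without, log_lik_with, k_without, k_with, n_obs):
    """Compare nested logistic models without and with a candidate predictor.

    Assumed convention: deltas are (with - without), so negative values
    favor the model that includes the predictor.
    """
    # -2 times the difference in log likelihoods (reduced minus full)
    lr_stat = -2 * (log_lik_without - log_lik_with)
    aic0, bic0 = information_criteria(log_lik_without, k_without, n_obs)
    aic1, bic1 = information_criteria(log_lik_with, k_with, n_obs)
    return {
        "lr_stat": lr_stat,
        "delta_aic": aic1 - aic0,
        "delta_bic": bic1 - bic0,
    }


# Hypothetical log likelihoods for illustration only (not study data);
# n_obs = 127 matches the size of the cohort.
result = compare_nested(-70.0, -62.0, k_without=6, k_with=7, n_obs=127)
print(result)
```

With one added parameter, ΔAIC equals 2 minus the likelihood ratio statistic and ΔBIC equals ln(n) minus that statistic, so BIC is the stricter criterion for a cohort of this size.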

Statistics

The patients’ demographic and laboratory characteristics are summarized as numbers with percentages for categorical variables or medians with interquartile ranges for numerical variables. Pearson’s chi-squared test and the Mann–Whitney U test were used to compare the baseline characteristics between patients with NAFL and NASH. Multiple logistic regression analysis was performed to develop a prediction model for the diagnosis of NASH. The model was built using SAS software version 9.4 (SAS Institute, Cary, NC, USA), and the bootstrapping for the internal validation of the developed model was conducted using R language version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria). p values < 0.05 were considered statistically significant, except in variable selection, for which a threshold of 0.1 was used.

Results

Baseline characteristics

In total, 127 patients diagnosed with NAFLD were enrolled at the Korea University Guro Hospital between September 2016 and March 2019. The demographic and laboratory characteristics of patients with NAFL and NASH are summarized in Table 1. Patients with NASH were older (p < 0.001) and showed a lower body mass index (BMI) (p = 0.015) than patients with NAFL. The prevalence of diabetes/IFG, hypertension, and dyslipidemia was higher in patients with NASH than in patients with NAFL, but only the difference in diabetes/IFG was significant (p = 0.002). Compared with patients with NAFL, those with NASH had a lower hemoglobin level (p = 0.011) and platelet count (p < 0.001) and a higher AST level (p < 0.001). Among the multiparametric MR sequences, MRE-liver stiffness measurement (MRE-LSM) was higher in patients with NASH than in patients with NAFL (p < 0.001). The histopathologic findings are summarized in Supplementary Table 2. Steatosis did not significantly differ between patients with NAFL and NASH, whereas more patients with NASH exhibited higher-grade inflammation and a more advanced stage of fibrosis than those with NAFL. Representative histopathological features are shown in Supplementary Fig. 1.

Table 1 Baseline characteristics

Data inspection

No evidence of multicollinearity was observed, in terms of the variance inflation factor, among the variables used to develop the prediction model (Supplementary Table 3). No influential observation was identified through regression model diagnostics (data not shown).

Model building

The flow chart for developing the MRE-based NASH scoring system is summarized in Supplementary Fig. 2. Four categorical variables and 19 continuous variables were examined for the NASH diagnostic model (Supplementary Table 4). In total, six parameters were chosen as candidate parameters for the diagnostic model: three clinical parameters (age, BMI, and diabetes/IFG), two laboratory parameters (hemoglobin and platelet count), and MRE-LSM. A model consisting of the three clinical and two laboratory parameters provided a c-statistic of 0.784 (95% CI 0.702–0.852). When MRE-LSM was added to the model, the c-statistic increased to 0.841 (95% CI 0.772–0.910), and this increase was statistically significant (p = 0.049 by DeLong test) (Supplementary Fig. 3). This indicates that adding MRE-LSM significantly improved the diagnostic accuracy of the model. There was no significant effect of interactions among the six selected parameters for the diagnosis of NASH (Supplementary Table 5).

The final MRE-based NASH score prediction model is as follows:

$$\begin{aligned} \text{Logit}[P(\text{NASH} = 1)] & = \log\left[\frac{P(\text{NASH} = 1)}{1 - P(\text{NASH} = 1)}\right] \\ & = 5.370 - 0.016 \times (\text{age}) - 0.118 \times (\text{BMI}) \\ & \quad + 0.644 \times (\text{diabetes/IFG; yes} = 1,\ \text{no} = 0) \\ & \quad - 0.215 \times (\text{HB}) - 0.011 \times (\text{PLT}) + 1.098 \times (\text{MRE}). \end{aligned}$$
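As an illustration of how the model above could be evaluated for one patient, the following minimal sketch computes the logit and converts it to a predicted probability with the inverse logit. The function names are hypothetical, the example patient is invented, and the units (hemoglobin in g/dL, platelet count in 10^3/µL, MRE-LSM in kPa) are assumptions that should be checked against Table 1 before any use.

```python
import math

# Coefficients transcribed from the published equation.
INTERCEPT = 5.370
COEF_AGE = -0.016
COEF_BMI = -0.118
COEF_DIABETES_IFG = 0.644
COEF_HB = -0.215
COEF_PLT = -0.011
COEF_MRE = 1.098


def mre_nash_logit(age, bmi, diabetes_or_ifg, hemoglobin, platelet, mre_lsm):
    """Linear predictor (log odds of NASH) of the MRE-based NASH score.

    Assumed units: hemoglobin in g/dL, platelet count in 10^3/uL,
    MRE-LSM in kPa; diabetes_or_ifg is True/False (coded 1/0).
    """
    return (
        INTERCEPT
        + COEF_AGE * age
        + COEF_BMI * bmi
        + COEF_DIABETES_IFG * (1 if diabetes_or_ifg else 0)
        + COEF_HB * hemoglobin
        + COEF_PLT * platelet
        + COEF_MRE * mre_lsm
    )


def mre_nash_probability(age, bmi, diabetes_or_ifg, hemoglobin, platelet, mre_lsm):
    """Predicted probability of NASH, obtained with the inverse logit."""
    logit = mre_nash_logit(age, bmi, diabetes_or_ifg, hemoglobin, platelet, mre_lsm)
    return 1.0 / (1.0 + math.exp(-logit))


# Invented example patient (not study data).
example_logit = mre_nash_logit(50, 28.0, True, 14.0, 200, 3.0)
example_prob = mre_nash_probability(50, 28.0, True, 14.0, 200, 3.0)
print(round(example_logit, 3), round(example_prob, 3))
```

For this invented patient the logit is close to zero, so the predicted probability is close to 0.5.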

The six parameters used in the MRE-based NASH score were significantly correlated with ballooning but not with inflammation (Supplementary Table 6).

Diagnostic accuracy

The MRE-based NASH score showed satisfactory accuracy for the diagnosis of NASH (c-statistic, 0.841; 95% CI 0.772–0.910) (Fig. 1). We set two cut-off values: an exclusion cut-off chosen to give a sensitivity greater than 0.9 at the expense of a low specificity, and a diagnostic cut-off chosen to give a specificity greater than 0.9. Accordingly, we suggest a cut-off value of 0.37 for the exclusion of NASH (sensitivity, 0.91; specificity, 0.55; negative predictive value 0.84) and a cut-off value of 0.68 for the diagnosis of NASH (sensitivity, 0.57; specificity, 0.91; positive predictive value 0.89). Overall, 35% of patients (44 of 127) remained in the so-called “gray zone” between 0.37 and 0.68 (Table 2). The overall sensitivity, specificity, positive predictive value, and negative predictive value are shown in Fig. 2.
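The two-threshold rule can be sketched as follows. The sketch assumes that the cut-offs apply to the predicted probability from the logistic model and that boundary values are handled as shown (a score exactly at 0.37 falls in the gray zone, a score exactly at 0.68 is rule-in); the source does not state how boundaries are treated, and the function name and labels are hypothetical.

```python
RULE_OUT_CUTOFF = 0.37  # below this value: NASH excluded
RULE_IN_CUTOFF = 0.68   # at or above this value: NASH diagnosed


def classify_nash_score(probability):
    """Map a predicted probability to rule-out, gray zone, or rule-in.

    Boundary handling is an assumption, not taken from the source.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < RULE_OUT_CUTOFF:
        return "rule-out"
    if probability >= RULE_IN_CUTOFF:
        return "rule-in"
    return "gray zone"


for p in (0.20, 0.50, 0.80):
    print(p, classify_nash_score(p))
```

Patients classified as "gray zone" correspond to the 44 of 127 patients (35%) for whom the score alone is inconclusive.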

Fig. 1

Diagnostic accuracy of MRE-based NASH score for NASH among patients with NAFLD. ROC curve for MRE-based NASH score. NASH, nonalcoholic steatohepatitis; NAFLD, nonalcoholic fatty liver disease; ROC, receiver operating characteristic; MRE, magnetic resonance elastography

Table 2 Diagnostic accuracy of MRE-based NASH score for diagnosis of NASH

Fig. 2

Sensitivity, specificity, positive predictive value, and negative predictive value of MRE-based NASH score, and selection of cut-off values for exclusion and diagnosis of NASH. NASH, nonalcoholic steatohepatitis; MRE, magnetic resonance elastography

We compared the c-statistics of models using MRE-LSM and TE-LSM, adjusting for the other five clinical and laboratory parameters (age, BMI, diabetes/IFG, hemoglobin, and platelet count). Although the difference was not significant (p = 0.2054 by DeLong test), the c-statistic of the model with MRE-LSM (0.841) was higher than that of the model with TE-LSM (0.809).

Internal validation of MRE-based NASH Score using bootstrapping

Our MRE-based NASH score model was internally validated through bootstrapping and tenfold cross-validation. Based on 1,000 bootstrap samples, an optimism-corrected c-statistic of 0.811 was obtained. Tenfold cross-validation yielded an optimism-corrected c-statistic of 0.821. The Brier score, another measure of the accuracy of probabilistic predictions that ranges between 0 and 1, was 0.163. Lower Brier scores indicate more accurate predictions. Meanwhile, the discrimination slope, defined as the slope of a linear regression of predicted probabilities of events derived from a prognostic model on the binary event status, was 0.295. Overall, the internal validity of the MRE-based NASH score model was satisfactory.
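The three performance measures reported above can be computed from observed outcomes and predicted probabilities as in the following minimal sketch. It is not the authors' R code; the function names and the six-patient toy data are hypothetical. The discrimination slope uses the fact that regressing predicted probabilities on a binary outcome gives a slope equal to the difference between the mean predicted probability in events and in non-events.

```python
def c_statistic(y_true, y_prob):
    """Proportion of event/non-event pairs ranked correctly (ties count 0.5)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def discrimination_slope(y_true, y_prob):
    """Mean predicted probability in events minus that in non-events."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    return sum(events) / len(events) - sum(non_events) / len(non_events)


# Toy data for illustration only (1 = NASH, 0 = NAFL); not study data.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.7, 0.4, 0.5, 0.3, 0.1]
print(c_statistic(y, p), brier_score(y, p), discrimination_slope(y, p))
```

An optimism-corrected estimate would additionally require refitting the model in each bootstrap sample and subtracting the average drop in performance, which is not shown here.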

Discussion

As NASH is a progressive form of NAFLD and liver biopsy is essential for its diagnosis [1], non-invasive biomarkers are urgently required. In the present study, we developed the MRE-based NASH score by combining age, diabetes/IFG, BMI, hemoglobin, platelet count, and MRE-LSM.

The diagnosis of NASH is important because it has a poorer prognosis than NAFL and because only patients with NASH are indicated for drug therapy [3]. Although liver biopsy is required to diagnose NASH, it carries limitations such as complication risks, high cost, and inconvenience [18]. To overcome these limitations, non-invasive diagnostic biomarkers for NASH have been developed. Early NASH biomarkers were single markers targeting NASH disease pathways such as apoptosis, inflammation, and oxidative stress [5]. However, most of them could not be applied in the clinical setting because of their unsatisfactory accuracy. As complex pathologic processes are involved in the progression of NASH, biomarkers with single parameters exhibit limited efficacy in discriminating NASH from NAFL. Therefore, the NASH Test, the NASH Diagnostics Panel, and a combination of CK-18 and surface antigen Fas have been developed as combined biomarkers [9]. Although these combined panels showed improved accuracy, they included parameters that are not routinely checked, including CK-18, apolipoprotein, adiponectin, and resistin. Apart from MRE, the MRE-based NASH score consists of easily accessible demographic and clinical parameters, including age, diabetes/IFG status, and BMI, and easily measured laboratory parameters, such as hemoglobin and platelet count. Moreover, the MRE-based NASH score showed a high c-statistic of 0.841.

Diabetes and IFG are metabolic disorders related to the development and progression of NAFLD [19]. Diabetes and a family history of diabetes are significantly associated with NASH [20]. In diabetes, insulin resistance and adipose tissue dysfunction induce lipotoxicity in hepatocytes and activate the pro-inflammatory pathway [21]. Glucotoxicity in diabetes is also associated with lipotoxicity, and insulin resistance promotes NASH [21]. Diabetes/IFG showed a positive correlation with the MRE-based NASH score. Increasing age was also associated with NASH in this scoring system. Older patients with NAFLD showed more severe histologic changes [22], and cellular senescence was correlated with hepatic steatosis and the severity of NAFLD [23]. Meanwhile, BMI was inversely correlated with NASH in the MRE-based NASH score. Obesity and higher BMI increase the risk of NAFLD and NASH [24]. However, Hagström et al. reported that lean patients with NAFLD have a higher risk of severe liver disease than NAFLD patients with higher BMI [25]. Another study reported that lean patients with NAFLD presented a poor clinical course with a higher overall mortality than did overweight or obese patients with NAFLD [26]. Sarcopenia, a significant risk factor for NASH, contributes to low body mass [27] and could be the reason for the negative correlation between NASH and BMI in this study. Laboratory examination revealed that hemoglobin and platelet count were significantly lower in patients with NASH than in those with NAFL. Chronic inflammation is one of the causes of anemia, and patients with NASH showed a lower hemoglobin level than that of patients with NAFLD [28]. Platelet count is a known biomarker for liver fibrosis in various kinds of liver diseases, and thrombocytopenia is associated with disease severity in NAFLD [29].

The major difference between the MRE-based NASH score and other scoring systems for evaluating the disease severity of NAFLD is the use of MRE-LSM. Multiparametric MR could predict the NAFLD activity score in a mouse NAFLD model [30], and it showed good correlation with inflammation, fibrosis, and ballooning [31]. Our previous study also reported that the multiparametric MR index showed good accuracy for diagnosing NASH according to the steatosis-activity-fibrosis score [13]. Differences in results arising from the use of different equipment may raise concerns regarding validation. Previously, higher technical failure rates were observed at 3 T than at 1.5 T. However, after spin-echo echo-planar imaging (SE-EPI) was introduced for 3 T MRE, no difference was found between the magnetic field subgroups (3 T vs. 1.5 T) using the SE-EPI sequence [32]. We used the SE-EPI sequence with a 3 T MRI in this study, which should not pose any notable problem for validation. Recent advances in radiomics will probably contribute to the early and non-invasive diagnosis of liver diseases [33, 34]. A recent study found that the radiomics approach could predict liver fibrosis [35]. Further studies applying radiomics to NASH diagnosis would be of interest.

Our MRE-based NASH score focuses on the diagnosis of NASH in patients with NAFLD. In contrast, the FAST score, composed of AST, CAP, and TE-based liver stiffness measurement, was developed to identify NASH with a NAS ≥ 4 and fibrosis stage ≥ 2, which is an advanced form of NAFLD and a potential target for clinical trials of NASH treatment [10]. This scoring system showed good performance (c-statistic 0.80, 95% CI 0.76–0.85) with satisfactory validation in several external cohorts (c-statistic range, 0.74–0.95). Our previous study also showed that MRE-LSM performed well in diagnosing NASH or advanced fibrosis (stage 3 or 4) (AUC 0.86) [8]. Although diagnosing more severe forms of NASH is important, NASH with early fibrosis or without fibrosis could be overlooked in this setting. With the MRE-based NASH score, patients with NASH can be discriminated from those with NAFL. When we compared the model with MRE-LSM and the model with TE-LSM, the former showed a higher c-statistic (0.841 vs. 0.809), although the difference was not statistically significant. Moreover, the diagnostic accuracy of our scoring system was not influenced by the fibrosis stage when the non-advanced and advanced fibrosis groups were compared.

We established diagnostic and exclusion cut-off values to maximize the accuracy of the MRE-based NASH score, granting it a negative predictive value of 0.84 and a positive predictive value of 0.89. Additional diagnostic evaluations, including liver biopsy, are required for patients located in the gray zone (44/127, 35%). Further studies are needed to determine the time interval for follow-up.

This study has several limitations. First, the MRE-based NASH score has not been validated in other cohorts. Although we conducted bootstrapping, an external validation study is essential for evaluating its clinical application. Further external validation from other groups will help generalize the MRE-based NASH score after this study. Second, the MRE-based NASH score included multiparametric MR, which has limited use in primary clinics. Therefore, the MRE-based NASH score would be useful only in tertiary clinics where multiparametric MR is available. Finally, this study included a relatively small number of patients. Further validation studies that include a larger number of patients would help strengthen the accuracy of the MRE-based NASH score.

In conclusion, we developed a novel non-invasive biomarker—the MRE-based NASH score—to diagnose NASH in patients with NAFLD. This scoring system improves the accuracy, sensitivity, and specificity of diagnosis of NASH by combining multiparametric MR and clinical indicators. Further external validation to evaluate its clinical application is warranted.