Introduction

Non-alcoholic fatty liver disease (NAFLD) is defined as evidence of hepatic steatosis, either by imaging or histology, in the absence of secondary causes of fat accumulation within the liver [1•]. NAFLD describes a spectrum of liver disease ranging from steatosis without hepatocyte injury (non-alcoholic fatty liver (NAFL)) to non-alcoholic steatohepatitis (NASH) characterized by steatosis with hepatocellular injury [2•, 3]. NAFLD affects approximately one quarter of adults worldwide, and complications from NAFLD are increasing [4,5,6]. Patients with NASH have an increased risk of liver-related mortality and progression to cirrhosis compared with those with NAFL [7••]. In the absence of effective medical therapies to halt disease progression, it is anticipated that cirrhosis due to NASH will become the leading worldwide cause for liver transplantation in the next decade [8]. NASH is also the fastest-growing cause of hepatocellular carcinoma in patients awaiting liver transplantation [9].

Currently, histologic evaluation is used to identify those patients with the highest risk of liver-related complications. Histology also plays a critical role in determining the efficacy of new therapies. This review will focus on the histologic diagnosis of NAFL and NASH and the key features that influence prognosis. The various histologic scoring systems will be described, and deficiencies in current systems will be highlighted. Finally, the role of histologic evaluation in clinical trials will be discussed with an emphasis on proposed histologic endpoints and implications for clinical trial outcomes.

Histologic Features of NAFLD, NAFL, and NASH

The American Association for the Study of Liver Diseases (AASLD) defines NAFLD as evidence of hepatic steatosis either by imaging or histology in the absence of significant alcohol consumption. On liver biopsy, the presence of ≥ 5% macrovesicular steatosis is the minimal requirement for a diagnosis of NAFLD (Fig. 1a). NASH is defined as ≥ 5% macrovesicular steatosis in addition to hepatocellular injury, which is characterized by ballooning degeneration and lobular inflammation. Hepatic steatosis with or without lobular inflammation is insufficient for a diagnosis of NASH, and these patients are regarded as having NAFL. Ballooning degeneration is the principal mechanism by which liver injury occurs in NASH (Fig. 1b, c). Ballooning degeneration can occur in other forms of liver disease such as chronic cholestatic injury and drug toxicity (e.g., amiodarone). Thus, ballooning degeneration in the absence of hepatic steatosis is not sufficient for a NASH diagnosis. The one exception to this rule is in the setting of cirrhosis, where hepatic steatosis is often scarce (< 5%) but ballooning degeneration may persist. In the appropriate clinical setting, such patients should be regarded as having NASH cirrhosis rather than being labeled as having cryptogenic cirrhosis. In adults, NASH fibrosis begins in zone 3 in a perisinusoidal pattern as a result of hepatocellular ballooning degeneration (Fig. 1d) [10, 11••]. Portal and periportal fibrosis subsequently develop, followed by bridging fibrosis and cirrhosis (Fig. 1e, f).

Fig. 1

Histologic features seen in NAFLD. a Steatosis in fatty liver disease consists of steatotic droplets of varying sizes including large droplet macrovesicular steatosis wherein the lipid droplet fills the entire cell and displaces the nucleus to the periphery and small droplet macrovesicular steatosis characterized by a visible lipid droplet greater than half the size of the nucleus that does not fill the entire cell or displace the nucleus to the periphery (hematoxylin and eosin, ×200). b This liver biopsy meets minimum criteria for a diagnosis of steatohepatitis including steatosis, ballooned hepatocyte (arrowhead), and lobular inflammation (hematoxylin and eosin, ×200). c This liver biopsy demonstrates more severe disease characterized by many ballooning hepatocytes (arrowheads) and prominent lobular inflammation (hematoxylin and eosin, ×200). d Fibrosis in NASH begins around central veins in a perisinusoidal manner (Masson’s trichrome, ×100). e An example of delicate bridging fibrosis with thin fibrous septa (Masson’s trichrome, ×100). f Complex bridging fibrosis with extensive perisinusoidal fibrosis (Masson’s trichrome, ×100). Despite significant differences in the amount of collagen deposition, both e and f would be considered stage 3 fibrosis in current staging systems

Numerous other histologic features are variably present in patients with NAFLD including microvesicular steatosis, Mallory-Denk bodies, glycogenosis and glycogenated nuclei, lipogranulomas, microgranulomas, acidophil bodies, megamitochondria, and portal inflammation. Pure microvesicular steatosis is unusual and should prompt consideration of alternative etiologies such as drug toxicity or inherited syndromes. Portal inflammation is associated with increasing fibrosis and disease activity in NASH [12,13,14,15], but marked portal inflammation with significant interface activity should raise the possibility of other forms of chronic liver disease.

A subset of patients with NAFLD has evidence of a NASH pattern of fibrosis without evidence of hepatocyte ballooning degeneration. Terms such as borderline steatohepatitis or steatofibrosis have been used to describe these patients. It is likely that this is a heterogeneous group of patients, with some having inactive or resolving disease and others having active steatohepatitis that was not identified due to biopsy sampling error. The majority of patients may fall into the latter category based on a recent study demonstrating that patients with steatofibrosis have a similar prognosis compared with those with fibrosis and active steatohepatitis [16]. Based on current guidance, patients with steatofibrosis are excluded from clinical trials given the lack of definitive evidence of NASH.

Histologic Features Associated with Clinical Outcomes

Histologic features that should be reported or commented upon in NAFLD liver biopsies are those that support the diagnosis and provide prognostic and predictive information. However, only recently has our understanding of the histologic features that are associated with disease progression and clinical outcomes become clearer. Generally, these studies can be divided into two categories: those that determine features associated with NAFLD progression using paired or serial liver biopsies and those that evaluate the histologic features in a baseline biopsy that correlate with long-term clinical outcomes.

Unfortunately, in the studies using paired liver biopsies, there are limited data on the specific histologic features that are associated with fibrosis progression due to heterogeneity in histologic scoring and reporting of these features [7••, 17,18,19,20,21]. However, one important finding from these studies is that patients with only NAFL on their initial biopsy can develop fibrosis over time. In a meta-analysis by Singh et al. of 411 patients with biopsy-proven NAFLD, NAFL patients had one stage of progression over 14.3 years compared with 7.1 years in patients with NASH [7••]. Importantly, the majority of NAFL patients who had progression of fibrosis had definite NASH on the follow-up biopsy, indicating that fibrosis progression arises mainly in those NAFL patients who converted to active steatohepatitis. These results emphasize the fluid nature of NAFLD with transitions between NAFL and NASH. However, it is important to recognize that these studies are hampered by selection bias in that they describe a retrospective cohort of patients who had paired liver biopsies. The patients with NAFL who underwent a second liver biopsy may represent a high-risk NAFL population (e.g., those with type 2 diabetes) given that repeat biopsies are not routinely indicated in most centers. No study has yet been published in which serial protocol biopsies in a large number of NAFLD patients are evaluated to determine features that predict fibrosis stage progression. For this reason, it is likely that the rate of fibrosis progression in most NAFL patients is lower than what is reported in these studies. Nevertheless, from these paired biopsy studies, it is clear that some patients with NAFL can transition to NASH and develop progressive fibrosis.
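The progression figures quoted above from the meta-analysis can be restated as annual rates with simple arithmetic. The short Python sketch below does only that conversion; the function and variable names are illustrative and do not come from the cited study, and the calculation assumes a constant average rate, which the selection bias discussed above may make unrealistic for most NAFL patients.

```python
# Mean time (in years) to progress by one fibrosis stage, as quoted in the text [7].
YEARS_PER_STAGE = {"NAFL": 14.3, "NASH": 7.1}


def stages_per_year(years_per_stage: float) -> float:
    """Convert 'years per one stage of progression' into 'stages per year'."""
    if years_per_stage <= 0:
        raise ValueError("years_per_stage must be positive")
    return 1.0 / years_per_stage


# Annual rate for each group, and how much faster NASH progresses than NAFL.
rates = {group: stages_per_year(years) for group, years in YEARS_PER_STAGE.items()}
relative_rate = rates["NASH"] / rates["NAFL"]

for group, rate in rates.items():
    print(f"{group}: {rate:.2f} fibrosis stages per year")
print(f"NASH progresses about {relative_rate:.1f} times as fast as NAFL")
```

Under this assumption the rates are approximately 0.07 stages per year for NAFL and 0.14 stages per year for NASH, which is a roughly twofold difference.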

A clearer understanding of the histologic risk factors for liver-related morbidity and mortality comes from longitudinal studies of NAFLD patients with well-characterized baseline liver biopsies [11••, 21,22,23,24, 25••, 26••, 27•]. The seminal study by Matteoni et al. in 1999 categorized 132 biopsy-proven NAFLD patients into four groups based on the presence of lobular inflammation, ballooning degeneration, Mallory-Denk bodies, and fibrosis [11••]. Over the follow-up period, those patients with only steatosis or steatosis with lobular inflammation had a very small risk of progression to cirrhosis or liver-related mortality. In contrast, the presence of ballooning degeneration with or without fibrosis and Mallory-Denk bodies conferred a much higher risk of liver-related complications. In 2015, Angulo et al. followed 619 patients with biopsy-proven NAFLD for a median follow-up of 12.6 years [26••]. The histologic features of activity associated with liver-related outcomes on univariate analysis included ballooning degeneration, portal inflammation, and NASH diagnostic category. Steatosis and lobular inflammation were not predictive in this study. Other studies have also confirmed the strong association between ballooning degeneration, presence of fibrosis, and adverse liver-related events [10, 11••, 12, 27•]. Portal inflammation has also consistently been shown to be associated with adverse outcomes and fibrosis [13,14,15, 26••, 27•, 28•, 29]. Lobular inflammation and steatosis consistently lack an association with fibrosis progression [11••, 16, 23, 27•].

In the study by Angulo et al., on multivariate analysis, only fibrosis stage predicted adverse clinical outcomes. Compared with stage 0, fibrosis stages 1–2 and stages 3–4 had hazard ratios of 11.2 (95% CI, 1.33–93.47; P < 0.03) and 85.79 (95% CI, 10.93–673.30; P < 0.001), respectively. Similar results were demonstrated by Ekstedt et al., in which patients with stage 3–4 fibrosis had increased overall mortality and death from cirrhosis compared with those with stage 0–2 fibrosis [22]. This was independent of the NAFLD activity score (NAS), although patients with NAS 5–8 and stage 0–2 fibrosis had a trend towards increased mortality. The seminal importance of fibrosis stage was further confirmed in a recent meta-analysis of 1495 NAFLD patients from five studies [25••]. Liver-related mortality increased exponentially with each stage of fibrosis.

Results from these studies may give the impression that measures of disease activity such as ballooning degeneration are unimportant. However, fibrosis stage, ballooning degeneration, portal inflammation, and NASH diagnostic category are highly correlated. Furthermore, it should be emphasized that fibrosis is the result of hepatocellular injury and not the primary insult in NASH. Improvements in fibrosis are also strongly associated with improvement in disease activity including improvements in overall NAS and scores for ballooning degeneration, steatosis, and portal inflammation [28•]. Finally, it is important to remember that measures of disease activity are often decreased or absent in patients with advanced fibrosis. This is also true in other forms of chronic liver disease wherein diagnostic features often disappear in a cirrhotic liver. Given that liver-related events occur predominantly in patients with cirrhosis [30], studies that focus on this endpoint will always demonstrate that fibrosis is the critical determinant of outcomes. However, to clarify the histologic features that influence fibrosis progression, it would be more instructive to focus on NASH patients with no or early fibrosis (stages 0–2).

Comparison of Histologic Scoring Systems in NAFLD

Table 1 describes the histologic scoring systems in NAFLD. In 1999, Brunt et al. were the first to propose a system for measuring disease activity and fibrosis in NASH [31]. Similar to scoring systems for other forms of chronic liver disease, a grade is provided for disease activity and a stage is provided for fibrosis. This is consistent with the concept that disease activity and fibrosis should be evaluated separately rather than combined into one index as they have different meanings. Disease activity is a measure of hepatocellular injury occurring at the time of the biopsy whereas fibrosis represents the liver’s response to prior episodes of hepatocellular injury.

Table 1 Histologic scoring systems in NAFLD

The Brunt system [31] evaluates a variety of features including steatosis, lobular inflammation, portal inflammation, and ballooning degeneration to arrive at an overall grade. Fibrosis is staged from 1 to 4 representing zone 3 perisinusoidal fibrosis, zone 3 perisinusoidal and periportal fibrosis, bridging fibrosis, and cirrhosis, respectively. In 2003, Harrison et al. proposed slight modifications to the Brunt criteria that removed evaluation of steatosis to quantify disease activity in a randomized controlled trial [34].

The NASH Clinical Research Network (NASH-CRN) proposed a system for scoring the full spectrum of NAFLD lesions in 2005 [32••]. The NAFLD activity score (NAS) is an unweighted index calculated by summing component scores for steatosis (0–3), lobular inflammation (0–3), and hepatocyte ballooning (0–2). These items were included as they were considered potentially reversible in the short term. This system also evaluates fibrosis separately similar to the Brunt staging system except that stage 1 is divided into stage 1a (delicate zone 3 perisinusoidal fibrosis), stage 1b (dense zone 3 perisinusoidal fibrosis), and stage 1c (periportal fibrosis only; seen in pediatric NASH). The NAS is the most commonly used index in NASH clinical trials although few studies have formally evaluated the operating properties of the NAS outside of the NASH-CRN [32••, 35, 36].
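As a rough illustration of how the NAS is assembled, the following Python sketch sums the three component scores using the ranges given above. The class and function names are hypothetical and are not part of any NASH-CRN software; the sketch covers only the arithmetic of the index, not how a pathologist assigns each component score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NasComponents:
    """Component scores of the NAFLD activity score (hypothetical container)."""
    steatosis: int             # 0-3
    lobular_inflammation: int  # 0-3
    ballooning: int            # 0-2


# Allowed range for each component, as stated in the text.
_RANGES = {
    "steatosis": (0, 3),
    "lobular_inflammation": (0, 3),
    "ballooning": (0, 2),
}


def nas_total(components: NasComponents) -> int:
    """Return the unweighted sum of the three component scores (0-8)."""
    for name, (low, high) in _RANGES.items():
        value = getattr(components, name)
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return (
        components.steatosis
        + components.lobular_inflammation
        + components.ballooning
    )


# Example: marked steatosis and inflammation, but no ballooning.
example = NasComponents(steatosis=3, lobular_inflammation=2, ballooning=0)
print(nas_total(example))
```

Because the index is unweighted, the example biopsy reaches a NAS of 5 without any ballooning degeneration, since steatosis and lobular inflammation together can contribute up to 6 of the 8 possible points.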

In 2011, Goodman and colleagues correlated numerous histologic features with liver-related mortality [27•]. While this comprehensive pathologic evaluation demonstrated strong correlation between fibrosis and liver-related mortality, its validity for use in clinical trials is unknown. Finally, in 2012, Bedossa et al. described the SAF (steatosis, activity, fibrosis) scoring system to aid in classifying liver biopsies of morbidly obese patients [33•]. The activity grade is based upon the sum of the scores for lobular inflammation (0–2) and hepatocyte ballooning (0–2) and excludes steatosis. In the presence of ≥ 5% steatosis, an algorithm based only on the ballooning and lobular inflammation scores was proposed to diagnose NAFL and NASH. In this algorithm, the presence of steatosis and ballooning degeneration without lobular inflammation was considered NAFL and not NASH.
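The SAF activity grade and the accompanying diagnostic rule can be sketched in a similar way. The Python below is a minimal reading of the description above, not the published algorithm itself: the text states only that steatosis with ballooning but without lobular inflammation is classified as NAFL, so the rule that both scores must be at least 1 for NASH is an assumption and should be checked against the original publication [33•]. The function names and the "not NAFLD" label are hypothetical, and the exception for cirrhosis with scarce steatosis is not modeled.

```python
def saf_activity(lobular_inflammation: int, ballooning: int) -> int:
    """Sum lobular inflammation (0-2) and ballooning (0-2); steatosis is excluded."""
    for name, value in (
        ("lobular_inflammation", lobular_inflammation),
        ("ballooning", ballooning),
    ):
        if not 0 <= value <= 2:
            raise ValueError(f"{name} must be between 0 and 2, got {value}")
    return lobular_inflammation + ballooning


def saf_category(
    steatosis_percent: float, lobular_inflammation: int, ballooning: int
) -> str:
    """Classify a biopsy as 'not NAFLD', 'NAFL', or 'NASH' (assumed thresholds)."""
    # Reuse the range checks; the activity value itself is not needed here.
    saf_activity(lobular_inflammation, ballooning)
    if steatosis_percent < 5:
        return "not NAFLD"  # below the minimal steatosis requirement
    if ballooning >= 1 and lobular_inflammation >= 1:
        return "NASH"  # assumption: both injury features must be present
    return "NAFL"


# Steatosis with ballooning but no lobular inflammation is NAFL under this rule.
print(saf_category(steatosis_percent=30, lobular_inflammation=0, ballooning=1))
```

The printed example is the case highlighted in the text, in which ballooning degeneration without lobular inflammation does not reach a diagnosis of NASH.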

Deficiencies in Current Scoring Systems

Ideally, an evaluative index should measure the outcome that it is intended to assess, be reproducible, and respond to clinically meaningful change in disease activity. Based on these criteria, the NAS is the most validated histologic instrument in NAFLD as only a few studies have formally evaluated the Brunt system, SAF, Goodman scheme, and Harrison index. Of the three features that comprise the NAS, the degree of steatosis has been found to be the most reproducible, whereas agreement on lobular inflammation and ballooning degeneration is suboptimal [32••, 33••, 35,36,37,38]. The variability in reproducibility may, in part, stem from imprecise definitions. For example, steatosis can be measured either by percent hepatocytes with a steatotic droplet or by percent non-fibrotic surface area with fat. The former method has been adopted by the NASH-CRN; however, this may overestimate the degree of fat within the liver by counting hepatocytes with small droplet macrovesicular steatosis as being equivalent to hepatocytes with large droplets of fat that entirely fill the cell (Fig. 1a). Estimating steatosis based on surface area with fat may correlate better with the actual amount of lipid within the liver. A focus of lobular inflammation has also not been defined in the NAS. The SAF system defines a focus as two or more inflammatory cells (neutrophils, lymphocytes/other mononuclear cells, eosinophils, and microgranulomas) present within the sinusoids or surrounding injured ballooned or apoptotic hepatocytes. This may be too low a threshold, particularly given the recent recommended clinical trial endpoints that incorporate lobular inflammation into a definition of NASH resolution.

Ballooning degeneration was not specifically defined in the NAS nor was there guidance for scoring severity of ballooning beyond the descriptors “few” or “many.” In contrast to the NAS, the SAF defines grade 1 ballooning as the presence of clusters of hepatocytes of similar size to normal hepatocytes but with a rounded shape and pale cytoplasm that is usually reticulated. In grade 2 ballooning, the cytoplasmic features are similar, but the hepatocytes are at least twofold larger than normal hepatocytes. The most accepted definition of ballooning degeneration is a hepatocyte that is generally larger than the surrounding hepatocytes with a distinctive rarefied cytoplasm that is irregularly stranded or clumped [3]. Given the importance of ballooning degeneration in the diagnosis of NASH as well as its association with fibrosis, a clear definition of how to assess this feature is needed. Furthermore, an alternate score that more fully evaluates these cells should be considered given the limited range of scores in current systems. The NASH-CRN has been using an expanded five-tier ballooning score in their database since 2010 [39]; however, no published reliability or outcome data exist for this scale. In addition, one component of this expanded grading system, the presence of “non-classical” ballooning, is of uncertain significance. Alternate methods of evaluating ballooning degeneration should be explored in future studies.

Based on the studies that have been performed since its development, the NAS has not been shown to reliably predict fibrosis and liver-related outcomes [18, 21, 22, 26••, 27•]. This is likely due to how the NAS is constructed as both lobular inflammation and steatosis often contribute more to the NAS than ballooning degeneration. Given that lobular inflammation and steatosis are not reliably associated with outcomes, it is not surprising that the NAS also fails to correlate. Besides ballooning degeneration, portal inflammation has been associated with clinical outcomes as well as fibrosis in multiple studies. However, portal inflammation is not included in the NAS or SAF index. The Brunt and Harrison grading systems incorporate this feature into the overall grade. The Goodman scheme evaluates portal inflammation but does not incorporate this and other features into an overall index.

The NASH-CRN fibrosis staging system was shown to be reproducible in the initial study; however, reproducibility of this system was suboptimal in two subsequent studies [35, 40]. Measuring fibrosis in NASH is challenging as fibrosis does not increase linearly from stage 1 to stage 4. This is reflected in the exponential increase in liver mortality with each increase in the stage of fibrosis. There is substantial variation in the amount of fibrous tissue within each fibrosis stage. This is particularly true of bridging fibrosis and cirrhosis. More specifically, stage 3 fibrosis can range from one definite fibrous bridge to complex bridging with or without rare nodule formation (Fig. 1e, f). Dividing stage 3 into sub-stages of bridging could help refine how bridging fibrosis is evaluated. The current staging system also does not fully capture the degree of perisinusoidal fibrosis that can be present at all stages. Perisinusoidal fibrosis is the characteristic fibrosis pattern seen in NASH and defines stage 1 disease in adults. The presence and severity of perisinusoidal fibrosis are not evaluated at higher stages of fibrosis. This pattern of fibrosis can contribute significantly to the degree of collagen present within the liver and subsequently to portal hypertension. Capturing the severity of this pattern of fibrosis at all stages may have value in predicting liver-related outcomes.

Histologic Endpoints in Clinical Trials

Recently, multiple groups including the US Food and Drug Administration, European Association for the Study of the Liver, AASLD, and the Liver Forum (a multistakeholder organization composed of academic, industry, and regulatory experts) have published recommendations regarding endpoints in NASH clinical trials [2•, 41•, 42•, 43•]. All of these organizations recognize that histologic evaluation is essential in determining therapeutic efficacy, particularly in phase 2b and phase 3 studies. Histologic endpoints include improved NASH, resolution of NASH, and improved fibrosis. The proposed definition of improved NASH is a ≥ 1-point reduction in the NAS ballooning score along with a ≥ 2-point reduction in the total NAS. Resolution of NASH is defined as a NAS lobular inflammation score ≤ 1, a ballooning score of 0, and any value for steatosis. The suggested definition for fibrosis improvement is a decrease of ≥ 1 NASH-CRN fibrosis stage and no worsening of the NAS. However, based on current phase 2b and phase 3 trials, other endpoints have been proposed including change in total NAS, ≥ 2-point reduction in SAF, and ≥ 2-point reduction in NAS with no worsening of fibrosis [42•].
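The three proposed endpoint definitions can be written out as simple comparisons between a baseline and a follow-up biopsy. The Python sketch below encodes only the thresholds stated above; the `Biopsy` class and function names are hypothetical, fibrosis sub-stages 1a to 1c are collapsed to an integer stage of 1 for simplicity, and "no worsening of the NAS" is interpreted as a follow-up NAS no higher than baseline, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biopsy:
    """Scores from one biopsy (hypothetical container)."""
    steatosis: int             # NAS component, 0-3
    lobular_inflammation: int  # NAS component, 0-3
    ballooning: int            # NAS component, 0-2
    fibrosis_stage: int        # NASH-CRN stage 0-4 (1a/1b/1c collapsed to 1)

    @property
    def nas(self) -> int:
        """Unweighted sum of the three NAS components."""
        return self.steatosis + self.lobular_inflammation + self.ballooning


def improved_nash(baseline: Biopsy, follow_up: Biopsy) -> bool:
    """>= 1-point drop in ballooning together with a >= 2-point drop in total NAS."""
    return (
        baseline.ballooning - follow_up.ballooning >= 1
        and baseline.nas - follow_up.nas >= 2
    )


def nash_resolution(follow_up: Biopsy) -> bool:
    """Lobular inflammation <= 1 and ballooning 0; steatosis may take any value."""
    return follow_up.lobular_inflammation <= 1 and follow_up.ballooning == 0


def fibrosis_improvement(baseline: Biopsy, follow_up: Biopsy) -> bool:
    """>= 1-stage drop in fibrosis with no worsening of the NAS (assumed: NAS not higher)."""
    return (
        baseline.fibrosis_stage - follow_up.fibrosis_stage >= 1
        and follow_up.nas <= baseline.nas
    )


baseline = Biopsy(steatosis=2, lobular_inflammation=2, ballooning=2, fibrosis_stage=3)
follow_up = Biopsy(steatosis=1, lobular_inflammation=1, ballooning=1, fibrosis_stage=3)
print(improved_nash(baseline, follow_up))         # ballooning 2 -> 1, NAS 6 -> 3
print(nash_resolution(follow_up))                 # ballooning is still present
print(fibrosis_improvement(baseline, follow_up))  # fibrosis stage is unchanged
```

In this example the patient meets the definition of improved NASH but not NASH resolution or fibrosis improvement, which shows that the three endpoints can diverge for the same pair of biopsies.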

Some of these endpoints are not well supported by current literature. In particular, inclusion of steatosis and lobular inflammation in clinical trial endpoints is problematic as neither has been shown to correlate with clinically meaningful outcomes. Inclusion of ballooning degeneration is appropriate given the strong association between ballooning degeneration, fibrosis, and clinical outcomes. However, the range of ballooning scores in both the NAS and the SAF is quite limited, and expanded scores that more fully evaluate this feature may improve responsiveness, particularly at the proof-of-concept stage of drug development. Consideration should also be given to measuring portal inflammation in clinical trials given that this is the other histologic feature that correlates with fibrosis, clinical outcomes, and fibrosis resolution.

Improvement in fibrosis is currently defined as ≥ 1 stage decrease in fibrosis according to the NASH-CRN. This endpoint has been difficult to achieve, and only obeticholic acid has met this endpoint based on an interim analysis of the REGENERATE phase 3 trial [44]. This may be in part due to the lack of a linear relationship between fibrosis stage and collagen deposition. Substantial improvements in collagen deposition within the liver can be seen in the absence of a decrease in fibrosis stage. Subdividing bridging fibrosis and measuring perisinusoidal fibrosis at all stages may improve evaluation of change in fibrosis. Quantitative measurements of fibrosis using a digital pathology solution should also be considered. Review of paired biopsies in a blinded manner in a side-by-side analysis at the end of the study may also improve measurement of this essential outcome.

One important aspect to consider in NASH clinical trials is the reading paradigm employed. Reading strategies used in NASH trials range from one central reader reading slides digitally to a consensus read generated by a group of pathologists using a multiheaded microscope. Given that histologic outcomes are often a primary endpoint, consideration should be given to employing central readers that are all routinely assessed for reliability and trained on similar material with standardized definitions. This would also improve comparisons across studies. Such central reader paradigms are routinely employed in clinical trials of inflammatory bowel disease with success [45]. For pivotal studies, use of multiple central reads with an adjudication process may also be appropriate.

Conclusions

Histologic evaluation is critical in the diagnosis of NAFL and NASH and in measuring disease activity and fibrosis. Steatosis, lobular inflammation, and ballooning degeneration are the key features that currently determine severity of disease activity; however, only ballooning degeneration and portal inflammation at baseline correlate with adverse outcomes. Fibrosis is the key determinant of liver-related mortality and morbidity, but current staging systems may not capture the full spectrum of fibrosis seen in NASH. Refinement and standardization of histologic scoring systems are necessary to improve diagnosis, monitoring of disease activity and fibrosis, and evaluation of therapeutic efficacy in clinical trials.