1 Introduction

Non-alcoholic fatty liver disease (NAFLD) has become one of the most important causes of chronic liver disease worldwide. Because the pathogenesis of NAFLD is closely related to obesity, hypertension, hyperlipidemia, hyperglycemia and other metabolic factors, it has been proposed that NAFLD be regarded as a hepatic manifestation of metabolic syndrome (MetS) and be renamed Metabolic Associated Fatty Liver Disease (MAFLD) [1, 2]. It was reported that from 1999 to 2016, cardiovascular and renal risks and diseases became highly prevalent in adults with MAFLD [3]. Patients with nonalcoholic steatohepatitis (NASH) or advanced fibrosis are at an increased risk of developing arteriosclerotic cardiovascular disease (ASCVD) compared with non-NAFLD controls or subjects with nonalcoholic fatty liver (NAFL), independent of conventional metabolic risk factors for cardiovascular disease (CVD). Histological information on NAFLD may help to improve our understanding of extrahepatic complications, such as ASCVD, that result from NAFLD progression [4, 5]. Studies on the correlation between NAFLD and MetS showed that both NAFLD and NASH increased the incidence of cardiovascular events. Therefore, NASH should be considered in cardiovascular risk assessment.

The course of NAFLD includes simple steatosis (SS) and nonalcoholic steatohepatitis (NASH), which may progress further to fatty liver cirrhosis and hepatocellular carcinoma (HCC) [6]. Liver biopsy remains the “gold standard” for the diagnosis of NAFLD, but it is invasive, may result in serious complications such as pain, infection, and bleeding, and cannot be carried out in all suspected patients. Ultrasound examination is noninvasive, convenient, and widely used in clinical screening of NAFLD, but its results are affected by many factors, especially the subjectivity of the examiner.

Genetic factors, dyslipidaemia and altered glucose metabolism have been shown to be associated with NAFLD [7, 8]. Some studies have identified sets of lipids as predictive biomarkers of NAFLD progression by using machine learning (ML) [9]. A variety of composite indexes have been used in the assessment of NAFLD. For example, the Hepatic Steatosis Index (HSI) [10] is calculated from alanine aminotransferase (ALT), aspartate aminotransferase (AST), body mass index (BMI), gender and diabetes history. The Fatty Liver Index (FLI) [11] is based on triglyceride (TG), BMI, gamma-glutamyl transpeptidase (GGT) and waist circumference (WC). The Chinese ZJU index [12] is calculated from BMI, fasting plasma glucose (FPG), TG, ALT and AST. Broadly speaking, these indexes can be regarded as early, elementary forms of artificial intelligence (AI). Recently, some studies have incorporated biological markers such as adiponectin and caspase-cleaved cytokeratin 18 fragment (M30) into evaluation models to predict the NAFLD activity score (NAS). Such a model was able not only to monitor disease progression and weight change but also to distinguish NASH from NAFLD, with an area under the receiver operating characteristic curve (AUROC) of 0.70-0.73 [13]. However, different algorithms and data sources from particular populations affect the applicability and reliability of a model. Any simple formula or limited index is restricted in its ability to evaluate the disease comprehensively, and the predictive parameters and optimal algorithms for NAFLD remain heterogeneous across studies [14]. The development of “big data” networks and the application of AI algorithms provide new methods and possibilities for a better understanding of NAFLD.
The advantage of AI over traditional statistical modelling is that it can recognize unique patterns and incorporate multiple factors to create models for prediction, risk stratification [15] and outcomes. It is particularly suited to chronic diseases, given their heterogeneity, complex nature, and overlapping confounding factors.
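
To illustrate how simple such composite indexes are compared with ML models, the sketch below computes the HSI using the commonly cited form of the formula, HSI = 8 × ALT/AST + BMI (+2 if female, +2 if diabetic). The function name and the example values are illustrative assumptions, and the coefficients should be checked against the original publication [10] before any real use.

```python
def hepatic_steatosis_index(alt, ast, bmi, female, diabetes):
    """Commonly cited HSI form: 8 * ALT/AST + BMI (+2 if female, +2 if diabetic).

    alt and ast are enzyme levels in the same unit (e.g. U/L); bmi is in kg/m^2.
    """
    if ast <= 0:
        raise ValueError("AST must be positive")
    score = 8.0 * alt / ast + bmi
    if female:
        score += 2.0
    if diabetes:
        score += 2.0
    return score


# Hypothetical patient, invented for this example.
example = hepatic_steatosis_index(alt=40, ast=20, bmi=30.0, female=False, diabetes=True)
print(example)  # 8 * 2 + 30 + 2 = 48.0
```

A fixed linear score of this kind uses a handful of hand-chosen inputs and weights, whereas the ML models discussed below learn both the relevant features and their weights from data.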

2 Overview of AI and ML in NAFLD research

AI is poised to influence nearly every aspect of the human condition. It comprises a collection of data-analytical techniques that aim to build predictive models from multi-dimensional data sets. AI is a general term for computer programs that can think and behave like humans. Since the introduction of ML, AI has become the most searched term. Some describe ML as the primary application of AI, while others describe it as a subset of AI. ML generates a mathematical algorithm from a training data set and uses it to predict results or make decisions. Later, with the development of neural networks, machines became able to classify and organize input data in a manner resembling the human brain. The term "deep learning (DL)" was proposed for multi-layer neural networks that can be applied to large data sets and are suitable for the detection, classification and segmentation of biomedical images. DL is a subset of ML and ML is a subset of AI; the evolution from AI to ML to DL is shown in Fig. 1 [16, 17].

Fig. 1 Evolution of artificial intelligence (AI)-machine learning (ML)-deep learning (DL)

According to the training method, ML can be divided into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning (Fig. 2). Supervised learning deals with labeled data, that is, annotated input-output pairs; common techniques include linear regression, logistic regression (LR), decision trees (DT), k-nearest neighbors (kNN), support vector machines (SVM), random forest (RF), naive Bayes classification, and gradient boosting (GB) [18, 19]. Unsupervised learning works with unlabeled data and must group the data according to their own structural characteristics. Semi-supervised learning is an amalgamation of supervised and unsupervised ML that can analyse a large amount of unclassified data whilst augmenting its pattern recognition abilities with a small amount of preclassified data. Moreover, it can increase the speed and accuracy of information extraction from large data sets [12]. Reinforcement learning differs from the other approaches in that there is no supervisor, only a reward signal; it learns from interaction with the purpose of maximizing that reward. At present, supervised learning [20,21,22,23,24,25,26,27,28,29] and semi-supervised learning [30] are the most widely applied in medicine, which reflects the source and characteristics of medical data.

Fig. 2 Classification and common algorithms of ML

According to the purpose of modeling, models can be divided into diagnostic, therapeutic and prognostic models. Diagnostic models can also be called assessment models. They carry out diagnostic tasks on the basis of large data sets, compare their output with evidence-based medical data, and continuously optimize their performance according to the degree of task completion so as to achieve more accurate diagnoses. They are of considerable value for the non-invasive assessment of NAFLD and, in recent years, have been increasingly applied to the detection and prediction of NAFLD and fibrosis outcomes.

However, the choice of algorithm may affect the final judgment. One study evaluated different algorithms, including LR, ridge regression, AdaBoost and DT models, for detecting NAFLD in the general population on the basis of 23 routine clinical and laboratory parameters. The ridge score was the best-performing algorithm, with an AUROC of 0.87 (95% CI 0.83-0.90) in the training group and 0.88 (0.84-0.91) in the validation group [31]. When large-scale training cohorts are used for model development, predictions on validation cohorts are probably more accurate than those of traditional biostatistical methods [22].
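
Because AUROC is the metric used to compare models throughout this review, a minimal sketch of how it can be computed may be helpful. The function below uses the rank-based (Mann-Whitney) formulation, in which AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are illustrative assumptions and are not taken from any cited study.

```python
def auroc(labels, scores):
    """AUROC as P(score_pos > score_neg), ties counted as 0.5 (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = NAFLD, 0 = healthy; scores are model outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auroc(toy_labels, toy_scores))  # 8 of the 9 positive-negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; sorting-based implementations are preferable for large cohorts.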

The therapeutic model focuses on new drug development, personalized lifestyle guidance and follow-up of therapeutic effects. It helps to develop personalized diets, accelerate new drug research and improve therapeutic schedules. At present, clinical medicine, basic biomedical research, and new health-related data sources (such as wearable devices with biosensors) generate a large amount of biomedical and health-related data. These data exceed the limits of human analysis and require the help of machines. AI is therefore enabling a paradigm shift towards precision medicine for patients with NAFLD.

In addition, the omics techniques developed in recent years may be used to further investigate the pathophysiological mechanisms of NAFLD. Genomics, epigenomics, transcriptomics, metabolomics/lipidomics and glycomics, studied in relation to pathophysiology, offer invaluable potential for the diagnosis and treatment of NAFLD. AI is a data-driven and hypothesis-free approach that can incorporate clinical factors to detect hidden patterns for disease detection and prediction. The accurate analysis of large omics data sets is an important advantage of AI in the NAFLD field.

3 AI application in NAFLD diagnosis

According to the data sources analyzed by AI, studies applied to NAFLD diagnosis can be classified as follows: those based on electronic health record (EHR) data and lab tests, on ultrasound imaging data, on radio imaging data, and on liver histopathological data.

3.1 Diagnostic models based on electronic health record data and lab test

A common goal of AI algorithms is to develop a predictive model based on statistical associations among features from a given data set. With the development of medical diagnostic technology, EHR systems with intelligent diagnostic functions have become an important topic in medical informatics. The EHR system contains structured data (such as diagnostic codes) and unstructured data (clinical documents, including lab tests). By analyzing these data with AI algorithms, patients with NAFLD/NASH can be screened and their risk of disease progression can be estimated. At present, the approaches used in EHR analysis include Natural Language Processing (NLP)-based approaches [60], text search-based approaches and International Classification of Diseases (ICD)-based approaches [63].

NAFLD progresses insidiously, and the presence of fibrosis is hard to detect. High-throughput identification of NAFLD, with “electronic” follow-up through the EHR, could aid in understanding the risk factors for progression to cirrhosis. Van Vleck et al. [32] assessed the above approaches for identifying NAFLD within EHR data against manual validation by clinicians in a large, multiethnic cohort. The results suggested that NLP had the best overall performance compared with the ICD and text search-based approaches, although numerous patients identified by ICD were missed by NLP.
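
To make the difference between the ICD-based and text search-based approaches concrete, the sketch below flags hypothetical patient records in both ways. The record layout, the keyword pattern and the choice of ICD-10-CM codes (K76.0 for fatty liver and K75.81 for NASH) are illustrative assumptions and do not reproduce the algorithms evaluated by Van Vleck et al. [32].

```python
import re

# Assumed ICD-10-CM codes for illustration: K76.0 (fatty liver), K75.81 (NASH).
NAFLD_ICD_CODES = {"K76.0", "K75.81"}

# Assumed keyword pattern; a production system would use a curated vocabulary.
NAFLD_TEXT_PATTERN = re.compile(
    r"\b(nafld|nash|fatty liver|hepatic steatosis|steatohepatitis)\b",
    re.IGNORECASE,
)


def flagged_by_icd(record):
    """True if any structured diagnosis code is in the assumed NAFLD code set."""
    return any(code in NAFLD_ICD_CODES for code in record.get("icd_codes", []))


def flagged_by_text_search(record):
    """True if the free-text note mentions any assumed NAFLD keyword."""
    return bool(NAFLD_TEXT_PATTERN.search(record.get("note", "")))


# Hypothetical records, invented for this example.
records = [
    {"id": 1, "icd_codes": ["K76.0"], "note": "Routine follow-up."},
    {"id": 2, "icd_codes": ["I10"], "note": "Ultrasound shows hepatic steatosis."},
    {"id": 3, "icd_codes": ["E11.9"], "note": "No evidence of fatty liver."},
]

for r in records:
    print(r["id"], flagged_by_icd(r), flagged_by_text_search(r))
```

Record 3 is flagged by the keyword search even though the note negates the finding, and record 2 is missed by the code lookup because no diagnosis code was entered. Cases of this kind are one plausible reason why the three approaches identify different patients.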

As medical practice becomes more specialized and patient care is provided by more physicians, the opportunities for information loss at patient handoffs increase. The analysis of progression to NASH is further complicated by the fact that, owing to the complexity of diagnosing NASH, a firm diagnosis is often not made. The application of NLP therefore still needs to be improved. Koleck et al. [33] reviewed the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. They found that NLP tools, classification methods, and manually curated rule-based processing are being used to extract information from EHR free-text narratives written by a variety of healthcare providers on a wide range of symptoms across diverse clinical specialties. Future work would benefit from a clear statement of the symptoms being evaluated in each study, a detailed description of the clinical population from which symptom information was extracted and analyzed, open sharing of user-developed symptom-related NLP algorithms or pipelines and vocabularies, and the establishment of formal reporting standards for investigations using NLP methodologies.

3.2 Assessment models based on ultrasound imaging data

Ultrasonography is a well-established and cost-effective imaging technique for the diagnosis of hepatic steatosis, especially for screening large populations at risk of NAFLD. However, it has shortcomings: it is less accurate in confirming mild steatosis, operator-dependent, and rather qualitative. Quantitative ultrasound (QUS) imaging methods, including elastography, echogenicity analysis, and speckle statistical modeling, can be obtained from a single ultrasound radio-frequency data acquisition. Because these methods provide complementary quantitative tissue information, the characteristics of the liver can be obtained from their combination, and ML models are the main means of achieving this. To date, ML models derived from ultrasound examination have been used in cardiovascular diseases, nervous system diseases, and chronic hepatitis B [34], but there are very few reports on the application of ultrasound-based ML models to the diagnosis of NAFLD. Tang et al. [35] established an ML model based on QUS parameters, with histopathology scoring as the reference standard, to improve the classification of steatohepatitis with shear wave elastography (SWE) in rats. Their results showed that QUS parameters improved the classification accuracy of steatohepatitis, liver steatosis, inflammation, and fibrosis compared with SWE alone. This work provides a basis for related research in humans. Wu et al. developed an RF model to predict ultrasonographic fatty liver disease by using age, gender, systolic and diastolic blood pressure, abdominal girth, glucose, TG, HDL, AST, and ALT; with an AUROC of 0.93, it outperformed the naïve Bayes, artificial neural network (ANN), and LR models [36]. It is worth noting that ultrasound data differ between races, although many ultrasound images have been accumulated and are now available for building a database for the development of computer-aided ultrasound diagnosis with DL technology [37]. Model development should therefore also consider individual factors, such as race and geographic region.

3.3 Assessment model based on radio imaging data

The use of ML has been increasing rapidly in the medical imaging field, including computer-aided diagnosis (CAD), radiomics, and medical image analysis. The combination of imaging and AI improves the accuracy of liver fibrosis staging. Deep learning radiomics of shear wave elastography significantly improved diagnostic performance in evaluating liver fibrosis [38]. In addition, as a radiological tool, parameters derived from image-based texture analysis of non-contrast-enhanced T1-weighted magnetic resonance images, combined with ML, can be as accurate as magnetic resonance elastography in the quantification of liver fibrosis (82%) [39]. Manual tracing of a region of interest (ROI) in the liver is a standard method for measuring liver attenuation on computed tomography (CT) to diagnose NAFLD. However, manual tracing is resource intensive. To address this limitation and extend the usefulness of quantitative CT measurement of hepatic steatosis, Huo et al. [40] proposed the Automatic Liver Attenuation ROI-based Measurement (ALARM) method for automated liver attenuation estimation. It consists of two major stages: (a) deep convolutional neural network (DCNN)-based liver segmentation and (b) automated ROI extraction. The combination of DCNN and morphological operations achieved "excellent" agreement with manual estimation for fatty liver detection. The whole pipeline is implemented as a Docker container, which enables users to complete a liver attenuation assessment within 5 minutes per CT scan. Graffy et al. [41] developed a deep learning-based automated liver fat quantification tool for non-enhanced CT to determine the prevalence of steatosis in a large screening cohort.
Using three-dimensional convolutional neural networks, automated volume-based liver attenuation, including conversion to CT fat fraction, was analyzed in the cohort, which included a subcohort with follow-up scans, and compared with manual measurement in a large subset of scans. The results showed that this fully automated CT-based liver fat quantification tool could be used for population-based assessment of liver steatosis and NAFLD, and that the objective data matched the manual measurements well. By utilizing NLP, many studies have developed algorithms capable of "reading" full-text radiology reports to accurately identify the presence of fatty liver disease [42]. Abdominal ultrasound, computed tomography, and magnetic resonance imaging reports retrieved from random samples were analyzed by physicians and expert radiologists, so that radiographic fatty liver disease could be determined by manual review. These algorithms could be used to rapidly screen patient records and establish large cohorts to facilitate epidemiological and clinical studies.
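
The second stage of a pipeline such as ALARM, in which a liver mask is shrunk by a morphological operation and the attenuation inside the remaining region is averaged, can be sketched as follows. This is a simplified two-dimensional illustration and not the published implementation [40]: the segmentation mask is assumed to be given (in ALARM it comes from a DCNN), the 3 × 3 erosion is only one possible morphological operation, and the toy Hounsfield unit (HU) values are invented.

```python
def erode(mask):
    """Binary erosion with a 3x3 square.

    A pixel is kept only if it and all 8 neighbours are set; pixels on the
    image border are always removed.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if all(mask[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)):
                out[r][c] = 1
    return out


def mean_attenuation(hu, mask):
    """Mean HU over the pixels selected by the mask; None if the mask is empty."""
    values = [
        hu[r][c]
        for r in range(len(mask))
        for c in range(len(mask[0]))
        if mask[r][c]
    ]
    return sum(values) / len(values) if values else None


# Invented 5x5 slice: a liver pixel of 30 HU surrounded by brighter boundary pixels.
hu_slice = [
    [100, 100, 100, 100, 100],
    [100, 60, 60, 60, 100],
    [100, 60, 30, 60, 100],
    [100, 60, 60, 60, 100],
    [100, 100, 100, 100, 100],
]
# Assumed output of a segmentation step (1 = liver).
liver_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

roi = erode(liver_mask)
print(mean_attenuation(hu_slice, liver_mask), mean_attenuation(hu_slice, roi))
```

Eroding the mask discards boundary pixels, which may be affected by partial-volume effects or segmentation errors, so the mean over the eroded ROI can differ noticeably from the mean over the raw mask.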

As can be seen from the above, supervised learning plays an important role in collecting and analyzing radio imaging data; radiologists therefore face the difficult task of annotating these large data sets. In addition, data sources and governance policies need to be developed to address the concerns of institutional review boards, as well as broader ethical issues around the use of large patient data sets.

3.4 Assessment models based on liver histopathological data

AI software may identify histological features of NAFLD with high levels of interobserver and intraobserver agreement (0.95 to 0.99) [43]. Automatic assessment of the histological characteristics of NAFLD may reduce human variability and provide continuous, rather than semi-quantitative, assessment of liver damage. However, the limited availability of experienced liver pathologists, the variability in pathologists’ agreement on detecting and quantifying various histological features of liver disease, and the limited use of semi-quantitative manual grading scores have hindered the application of AI in histopathological evaluation.

Macro-steatosis is the cardinal lesion of NAFLD and is commonly used as a major endpoint in therapeutic clinical trials in human NAFLD [32, 34]. The pathological assessment of NAFLD covers four key features: steatosis, lobular inflammation, fibrosis, and hepatocyte ballooning. Several studies have aimed to quantify the NAFLD score automatically with ML algorithms [45], and some aim to predict the severity of liver fibrosis [46]. Accordingly, many attempts have been made at automatic histopathological classification of fatty liver in rodents and humans. Deepak Sethunath et al. [33] created classifiers to detect macro- and micro-steatosis by using experts’ annotations, supervised ML and image processing techniques. For macro-steatosis prediction, the model’s precision, sensitivity, and AUROC were 94.2%, 95%, and 99.1%, respectively. Using similar approaches, Scott Vanderbeck et al. [20, 47] developed automated classifiers, including naïve Bayes, LR, DT, and neural networks, that can detect and quantify steatosis in the human liver. The classification algorithm achieved 89% overall accuracy and identified macro-steatosis, bile ducts, portal veins and sinusoids with high precision and recall (≥82%). This preliminary work demonstrates that automatic quantification of the cardinal histologic lesions of NAFLD is feasible and holds promise as a potential aid to pathologists evaluating NAFLD biopsies in clinical practice and clinical trials.
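
The performance figures quoted above (precision, sensitivity or recall, and overall accuracy) all derive from a confusion matrix. The sketch below shows the standard definitions for a binary classifier; the function name and the toy labels are illustrative assumptions and are unrelated to the data of the cited studies.

```python
def binary_metrics(y_true, y_pred):
    """Precision, sensitivity (recall), specificity and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Invented toy labels: 1 = macro-steatosis present, 0 = absent.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(truth, predicted))
```

Reporting several of these quantities together matters because accuracy alone can look high when one class dominates the data set.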

3.5 Assessment models on evaluation of NAFLD, NASH and fibrosis

As discussed above, studies on the diagnosis of NAFLD have focused on distinguishing healthy vs. NAFLD/NASH [20, 22,23,24,25,26, 31, 35, 38, 48,49,50,51], NAFL vs. NASH [32, 52, 53] and fibrosis vs. non-fibrosis [21, 27, 54]. We have summarized the literature according to “what is being predicted” and listed the studies in detail (see Table 1).

Table 1 ML models in NAFLD diagnosis on evaluation of NAFLD, NASH and fibrosis

In fact, it is not easy to distinguish accurately between NAFLD and healthy liver by using routine imaging and laboratory examination, and the presence of NAFLD may change dynamically in a given patient. Patients with suspected NAFLD on imaging are often not acknowledged in subsequent clinical documentation, and many of them are later found to have more advanced liver disease. SVM, LR and DT algorithms are the ones mainly used in ML research on NAFLD diagnosis. Most of the studies were retrospective and did not include biopsy. The parameters involved were mainly derived from EMR data, morphological features and lab tests. The specificity ranged from 60% to 92%, and the ridge score was reported to be more advantageous.

NASH indicates the progression and deterioration of NAFLD and can only be confirmed by biopsy, so it is of great value to explore the role of non-invasive ML models in the assessment of NASH. Research of this kind includes studies in mice and humans and almost always uses biopsy as the reference standard. Its main purpose is to investigate the accuracy of predicting NASH, NAFL or healthy status and to interpret biopsy specimens from patients with NAFLD quantitatively. The highest specificity was 99% [51]. It should be mentioned that NLP-based approaches facilitated analyses of knowledge flow among physicians and enabled the identification of breakdowns at which key information was lost, information that could have slowed or prevented the progression of early NAFLD to NASH or cirrhosis. Koleck et al. [33] synthesized the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. They suggested that future NLP studies should focus on the investigation of symptoms and symptom documentation in EHR free-text narratives, should examine patients’ characteristics, and should make symptom-related NLP algorithms or pipelines and vocabularies openly available.

There are not many studies on fibrosis. Assessment models for the evaluation of NAFLD, NASH and fibrosis are listed in Table 1. Model studies related to NAFLD and liver injury, diabetes mellitus (DM), CVD and hepatocellular carcinoma (HCC) are also listed; future research may need to focus more on these aspects.

4 AI application in NAFLD treatment

4.1 Therapeutic models for personalized lifestyle guidance

Since NAFLD was renamed MAFLD, it has been suggested that it be listed as one of the complications of DM, and more attention has been paid to therapies related to diet and glycemic control in patients with MAFLD. However, what constitutes a healthy diet and personalized glycemic control has not yet been clearly established.

A new approach called "nutritional geometry" considers the importance of integrating nutrition, animals, and the environment. Its goal is to combine nutrients and foods in a model in order to understand how food components interact to regulate the dietary characteristics that affect health and disease. AI algorithms may create a personalized diet for patients and thereby provide personalized nutritional consultation to prevent and treat NAFLD [55]. Zeevi et al. [56] measured postprandial glucose responses to 46,898 meals in 800 individuals and found great variability in the responses to the same foods. Based on these data, they designed an ML algorithm (GB regression), a data-driven and unbiased approach that integrates blood parameters, dietary habits, anthropometrics, physical activity, and gut microbiota to predict personalized postprandial glycemic responses to real-life meals. They then validated these predictions in an independent 100-person cohort. A blinded randomized controlled dietary intervention based on this algorithm resulted in a significant decrease in postprandial responses and consistent alterations in gut microbiota configuration, which indicates that a personalized diet can successfully improve postprandial hyperglycemia and its metabolic consequences.
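
The principle behind GB regression can be illustrated with a minimal sketch: each round fits a small regression tree (here a one-split stump) to the residuals of the current ensemble and adds a damped copy of it to the prediction. This is a generic, simplified illustration with squared-error loss and is not the model of Zeevi et al. [56]; the feature values, the responses, the learning rate and all function names are invented for the example.

```python
def fit_stump(X, residuals):
    """Find the one-split stump that minimises squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, threshold, lm, rm)
    if best is None:  # no valid split: fall back to a constant stump
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)  # initial prediction: mean response
    stumps = []
    preds = [base] * len(y)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(model, X, y):
    return sum((predict(model, row) - t) ** 2 for row, t in zip(X, y)) / len(y)


# Invented toy data: [carbohydrate (g), fibre (g)] -> glucose response (arbitrary units).
X = [[20, 5], [40, 2], [60, 8], [80, 1], [30, 6], [70, 3]]
y = [15.0, 35.0, 40.0, 75.0, 20.0, 60.0]

weak = fit_gradient_boosting(X, y, n_rounds=1)
strong = fit_gradient_boosting(X, y, n_rounds=50)
print(mse(weak, X, y), mse(strong, X, y))
```

The training error shrinks as rounds are added; in practice the number of rounds and the learning rate would be chosen on held-out data, as in the independent validation cohort described above, to avoid overfitting.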

Other proposals involve collecting dietary and microbiome data from many individuals [57], deriving AI models of how diet affects the composition of the microbiome, and then validating the models with controlled dietary interventions.

The contribution of the above-mentioned AI algorithms to personalized dietary guidance may help to control the development of NAFLD. A few studies are using smartphones to collect clinical images to assist in the diagnosis of hepatic steatosis [58]. In the future, patients may wear devices that record what they eat; this information would then be processed through DL and integrated by AI with a variety of other data (physical activity, stress level, sleep, microbiome, physiological constants, medications and genome) to provide individualized dietary recommendations and nutritional counseling to prevent and treat NAFLD.

4.2 AI models related to the development of therapeutic drugs for NAFLD

One way to reduce the costs of drug development is to use genetic data to inform drug design. Genetic information helps researchers to demonstrate from the start that drug targets are relevant to the disease, and drugs with such evidence are twice as likely to be approved as those without [59]. AI techniques can further optimize drug discovery. If we start with a ‘deep’ molecular profile that includes data about the microbiome and genome, as well as information about metabolites and proteins (the metabolome and proteome), coupled with physiological measurements, we may be able, in some cases, to skip animal testing and move straight to human trials [57]. Various ML systems and AI platforms have been used to search for immuno-oncology drugs and metabolic-disease therapies. It is predicted that AI and ML will usher in an era of faster, cheaper, and more effective drug discovery.

To date, however, there is no effective drug in clinical use to control the progression of MAFLD. Farnesoid X receptor (FXR) agonists can reverse deregulated bile acid metabolism and are thus potential therapeutics for the prevention and treatment of NAFLD [44]. Unfortunately, structure-based virtual screening (SBVS), which can speed up drug discovery, has rarely been reported to succeed for FXR, probably because of a failure to address protein flexibility. Xia et al. [61] devised human FXR (hFXR)-specific ensemble learning models based on pose filters from 24 agonist-bound hFXR crystal structures and coupled them to a traditional SBVS approach (FRED docking plus the Chemgauss4 scoring function). As a result, they identified a promising lead compound for further development.

Newer antidiabetic drugs recommended in type 2 diabetes mellitus (T2DM), such as selective PPAR-γ modulators (SPPARMs), glucagon-like peptide-1 receptor agonists (GLP-1RAs) and sodium-glucose cotransporter 2 inhibitors (SGLT-2is), might contribute substantially to the amelioration of NAFLD/NASH, possibly reducing not only liver-specific but also cardiovascular morbidity. Tomohide Yamada et al. compared the risk of myocardial infarction (MI) among SGLT-2is, GLP-1RAs and dipeptidyl peptidase-4 inhibitors (DPP-4is) and developed an ML model for predicting MI in patients without prior heart disease. Proportional hazards analysis of MI incidence was then conducted with the risk obtained from this model and the drug classes as explanatory variables. Receiver operating characteristic analysis showed higher precision for ML than for logistic regression analysis. Finally, the analysis suggested that, in patients with type 2 diabetes and no prior MI, the risk of MI was 37% lower with GLP-1RAs than with DPP-4is and 19% lower with SGLT-2is than with DPP-4is. To date, however, no studies have used ML methods to evaluate the efficacy of these drugs in NAFLD [62].

5 Summary

The diagnosis and treatment of NAFLD still face many challenges despite considerable advances in medical science. Limitations of AI technology include the lack of high-quality data sets for ML development, and most of the evidence used to develop ML algorithms comes from preclinical research. Efficiency, accuracy, and individualization are the main goals to be achieved.

In addition, the ambiguity of AI models makes it problematic for ML systems to be adopted in sensitive yet critical domains such as healthcare. As a result, scientific interest in Explainable Artificial Intelligence (XAI), a field concerned with the development of new methods that explain and interpret ML models, has been strongly reignited in recent years [64]. Nevertheless, AI provides a new way of understanding disease by extracting the characteristics of complex data and combining them in an “automatic learning” mode, which will contribute to an increase in diagnostic quality, facilitate the development of telemedicine, and reduce national health care costs. From this point of view, it may have sufficient potential to induce a paradigm shift in the management of NAFLD. Certainly, ML is still far from fulfilling its potential in NAFLD research, and we have a long way to go to uncover the networked intricacies and complexities of living systems. The accumulation of subjective and objective data and long-term follow-up verification remain the most basic requirements, individual factors should be considered when AI models are applied, and XAI for NAFLD studies also needs to be explored.