Introduction

Hepatocellular carcinoma (HCC) and cholangiocarcinoma (CCA) are the most common primary liver cancers, with markedly increasing incidences worldwide1,2. HCC commonly develops in individuals with cirrhosis, which is the strongest risk factor for HCC and can arise from several etiologies, including hepatitis B virus (HBV) and hepatitis C virus (HCV) infection, heavy alcohol consumption, and non-alcoholic fatty liver disease2,3,4,5. CCA occurs in individuals with chronic inflammation of the biliary epithelium2,3,6, particularly from liver fluke infection in East Asian populations7 and primary sclerosing cholangitis in Western populations2,3,4. HCC and CCA can be difficult to differentiate from benign liver lesions because their appearance on ultrasound images varies considerably with their growth pattern6,8. HCC typically appears as a hypoechoic lesion with hypervascularity, while CCA appears as an intrahepatic hypoechoic lesion with hepatobiliary duct dilation5,6. Patients with early HCC and CCA often lack symptoms, leading to advanced-stage diagnosis2,5. Thus, the best opportunity to improve patient outcomes is surveillance of at-risk individuals, which has been shown to detect the disease at early, curable stages and thereby reduce mortality2,5.

Radiologic examination is currently recommended for HCC and CCA surveillance. Ultrasonography (USG) is the most commonly used radiologic tool due to its wide availability, relatively low cost, and lack of radiation exposure. Despite these advantages, the worldwide USG surveillance rate remains low9,10. One of the major barriers to surveillance is the limited availability of skilled professionals, including radiologists and ultrasonographers proficient in cancer surveillance, particularly in remote areas10,11. In addition, USG is operator dependent, and examination results can be subjectively misinterpreted. Developing an assisted-ultrasound system that helps detect and classify focal liver lesions (FLLs) during real-time USG examination could potentially overcome these barriers.

Owing to the rapid advancement of deep-learning algorithms and their high capability to analyze complex data, artificial intelligence (AI) has been increasingly utilized in healthcare, in particular to enhance the sensitivity and accuracy of medical image interpretation12,13. An AI-assisted USG image analysis system would be of great benefit in increasing the accuracy of USG examination and interpretation for non-radiologist physicians and healthcare personnel. Such a system could potentially be one strategy to improve overall liver cancer surveillance rates14,15.

Recently, AI models were developed for the detection and diagnosis of FLLs in ultrasound images and demonstrated promising performance, with 81–97% sensitivity and 82–97% specificity for detecting and classifying FLLs commonly found in clinical practice16,17,18. We previously developed a convolutional neural network (CNN) AI model for USG image analysis17. Despite the satisfactory performance of the CNN model in detecting and diagnosing 5 different types of FLLs, we acknowledged some limitations. For example, the model had limited capability to differentiate FLLs with atypical characteristics, leading to misclassification17. Importantly, the model was inapplicable in clinical settings due to its relatively slow real-time processing speed.

A newer class of AI model for object detection tasks, called “YOLO”, has been shown to outperform CNN models and is more suitable for real-time object detection in images19. Accordingly, we aimed to develop an updated AI analysis system using the YOLOv5 model as a framework to detect and differentiate various types of FLLs in USG still images. In addition to the 5 FLLs studied in the previous work, we expanded the model’s capacity to enable the detection and diagnosis of CCA and regenerative liver nodules, which are commonly observed in cirrhotic patients.

Methodology

Dataset

The retrospective study was approved by the Research Affairs Institutional Review Board of the Faculty of Medicine, Chulalongkorn University (IRB No. 423/61), and the need to obtain informed consent was waived. Clinical data were de-identified and analyzed anonymously. The study was compliant with the ethical guidelines of the Declaration of Helsinki.

The study comprised a total of 7 FLLs (2 malignant and 5 benign lesions) (Fig. 1). For malignant FLLs, we focused on HCC and CCA because they have the highest prevalence among primary liver cancers2,4,5. Regarding the benign FLLs, liver cysts, hemangioma, focal fatty infiltration (FFI), focal fatty sparing (FFS), and regenerative nodules were chosen as they were most frequently seen during HCC and CCA surveillance by USG examination20.

Fig. 1
figure 1

The ultrasonographic images of seven different types of FLLs. (A) FLLs manually labeled, (B) FLLs predicted by AI.

The gold standard for diagnosing FLLs was based on pathology and/or imaging criteria. For benign lesions, diagnoses were primarily determined by characteristic imaging features observed in USG, computed tomography (CT), or magnetic resonance imaging (MRI), which are widely accepted in clinical practice. These lesions were confirmed as benign if their appearance and size remained consistent over a 2-year follow-up period. For malignant lesions, histopathological examination was used for confirmation whenever possible (approximately 10% of HCC and CCA patients had pathological diagnosis). In cases where pathology was unavailable, HCC was diagnosed using the American Association for the Study of Liver Diseases (AASLD) criteria, which include the presence of non-peripheral arterial enhancing lesions with washout in the portal venous and/or delayed phase in dynamic contrast CT or MRI5. For CCA, diagnoses were based on imaging criteria combined with a rising serum tumor marker CA 19–9 level and/or malignant clinical progression. The imaging criteria for CCA varied by type: for perihilar and distal CCA, criteria included malignant-appearing biliary obstruction, abrupt narrowing or stricture of the bile duct with irregular thickening and enhancing walls, or an intraductal irregular enhancing mass. Intrahepatic CCA was diagnosed by identifying an ill-defined mass-forming lesion with peripheral arterial enhancement followed by gradual centripetal delayed enhancement. Additional features included adjacent dilated intrahepatic ducts, capsular retraction, atrophic changes in the affected liver lobe with compensatory hypertrophy in the contralateral lobe, smaller satellite nodules, and vascular encasement2,6,8.

The process of image selection and allocation to the three datasets followed a previously published protocol as follows17:

Image selection

We retrospectively retrieved still images from upper abdominal USG examinations stored in the Digital Imaging and Communications in Medicine (DICOM) format within the Picture Archiving and Communication System (PACS) of a medical school hospital, the King Chulalongkorn Memorial Hospital, Bangkok, Thailand, and a community hospital, Roi-et Hospital, Roi-et, Thailand. These still images were captured as snapshots during ultrasound examinations using 13 machines from 11 different brands and models across the two hospitals (Supplemental Table 1). All raw USG images were obtained as original color images. Inclusion criteria were: (1) Images containing FLLs with a definitive diagnosis, (2) Images without FLLs, which served as negative controls establishing a baseline for liver parenchyma without any occupying lesions, (3) A maximum of two images from one USG study and two USG studies from one patient, (4) For studies with a single FLL, two images from different probe angles were randomly selected, and (5) For studies with multiple FLLs (≥ 2 FLLs), images containing different FLLs were chosen. Exclusion criteria were: (1) Images depicting metastatic liver lesions, (2) Images showing HCC or CCA that had undergone treatments affecting lesion appearance, (3) Images obtained using Doppler, contrast, or elastography techniques, and (4) Images severely compromised by motion artifacts, poor resolution, or excessive noise that hindered accurate interpretation, although no strict image-quality requirements were otherwise imposed.

Image allocation

To optimize the AI training process, we included images containing FLLs and images without FLLs in a 1:2 ratio. All images were randomly distributed into three sets: the training set (for model training), the tuning set (for adjusting and optimizing model performance during training), and the test set (for assessing model performance) (Fig. 2). Images obtained from the same patient were allocated to the same set to ensure that all three sets were completely independent.
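The patient-level grouping described above can be sketched as follows. This is an illustrative sketch only, not the allocation code used in the study; the function name, the random seed, and the use of the 16:1:3 image ratio reported in the Results as the patient-split ratio are our assumptions.

```python
import random

def patient_level_split(image_patient_ids, ratios=(16, 1, 3), seed=42):
    """Assign every image from the same patient to the same split so that
    the training, tuning, and test sets are completely independent."""
    patients = sorted(set(image_patient_ids))
    random.Random(seed).shuffle(patients)
    total = sum(ratios)
    n = len(patients)
    cut1 = n * ratios[0] // total          # end of the training block
    cut2 = cut1 + n * ratios[1] // total   # end of the tuning block
    assignment = {}
    for i, p in enumerate(patients):
        assignment[p] = "train" if i < cut1 else "tune" if i < cut2 else "test"
    # Map each image back to its patient's split.
    return [assignment[p] for p in image_patient_ids]
```

Splitting at the patient level, rather than the image level, prevents near-duplicate images of the same lesion from leaking between the training and test sets.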

Fig. 2
figure 2

Overview of the AI system development process.

AI model

This work employed the YOLOv5 framework as the AI model. Since its first release in 2015, YOLO has become a popular algorithm for real-time object detection due to its small size and rapid computation speed19. At the time of this work, the most up-to-date version was YOLOv5, whose capability was markedly improved over the previous YOLOv4: a more lightweight model, increased versatility, and much faster training19,21. It also has data-enhancement functions that effectively detect small objects, which was the most problematic limitation of the prior YOLO models22.

The fundamental principle of object detection involves delineating each object of interest, such as liver lesions, by its bounding box and corresponding class (e.g., cyst, HCC). Conventional computer vision methodologies typically employ a two-stage approach: first predicting the bounding box encompassing the liver lesion, followed by the classification of the detected anomaly. In contrast, the YOLO algorithm adopts a unified approach, simultaneously predicting both the object's bounding box and its class. This innovative design significantly accelerates the detection process.

YOLO's methodology involves processing the entire input image through a deep neural network to extract a high-level feature map representing the input image. This feature map is subsequently partitioned into an N × N grid, where N is a user-defined parameter. Each cell within this grid is tasked with detecting objects whose center falls within its boundaries (Fig. 3). To handle objects of varying dimensions and aspect ratios, multiple bounding boxes are predicted for each cell.
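As an illustration of this cell-assignment rule, the cell responsible for a lesion is simply the one containing its center. The helper name is ours, the 640-pixel input size is taken from the pre-processing section below, and the grid size N = 20 is an arbitrary example value, not a parameter reported in this study:

```python
def responsible_cell(cx, cy, img_size=640, n=20):
    """Return the (column, row) of the N x N grid cell whose boundaries
    contain the object's center (cx, cy), given in pixels."""
    cell = img_size / n  # side length of one grid cell in pixels
    return int(cx // cell), int(cy // cell)

# A lesion centered at (325, 190) on a 640 x 640 input falls in cell (10, 5).
```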

Fig. 3
figure 3

Illustration of YOLO inference phase with grid partition (left), bounding boxes prediction (middle), and the result after NMS and confidence thresholding (right).

During the inference phase, YOLO predicts a large number of bounding boxes. However, bounding boxes lacking liver lesions are assigned low confidence scores across all classes and subsequently eliminated. Note that YOLO may generate multiple bounding boxes around a single FLL. To mitigate this redundancy, a post-processing technique known as "non-maximum suppression" (NMS) is employed. This algorithm evaluates groups of proximal bounding boxes, quantifying their similarity through the intersection over union (IoU) metric. Within each group, only the bounding box exhibiting the highest confidence score is retained.

The confidence threshold and the IoU threshold used in the NMS algorithm play a crucial role in modulating the performance of the YOLOv5 model. To optimize model performance, detection rates were evaluated using an IoU threshold of 0.3 and a confidence threshold of 0.25 as criteria for correct detection. These thresholds were empirically determined to provide an optimal balance between detection sensitivity and specificity in the context of hepatic lesion identification.
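A minimal pure-Python sketch of the NMS step with these thresholds might look like the following; boxes are represented as (x1, y1, x2, y2) corner coordinates, and this is an illustration of the algorithm rather than the YOLOv5 implementation itself:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thr=0.3, conf_thr=0.25):
    """Drop low-confidence boxes, then keep only the highest-scoring box
    within each group of overlapping predictions."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Retain this box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep
```

Two boxes around the same FLL typically have a high IoU, so only the more confident one survives, while boxes around distinct lesions have near-zero IoU and are all retained.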

To simulate authentic clinical scenarios encountered during USG examination, the confidence scores from all seven differential diagnoses were aggregated and normalized to generate class posterior probabilities. These probabilities were subsequently presented in descending order for each FLL, providing a prioritized list of potential diagnoses that closely resembles the decision-making process in clinical practice.
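This normalization-and-ranking step can be sketched as below; the helper is our own illustration and assumes non-negative per-class confidence scores:

```python
def ranked_diagnoses(class_confidences):
    """Normalize per-class confidence scores into posterior-like
    probabilities and list the diagnoses from most to least likely."""
    total = sum(class_confidences.values())
    return sorted(
        ((cls, score / total) for cls, score in class_confidences.items()),
        key=lambda item: item[1],
        reverse=True,
    )
```

Presenting the full ranked list, rather than only the top class, mirrors how a clinician weighs several differential diagnoses for one lesion.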

Experimental setup

Data pre-processing

The FLLs depicted in USG images were independently reviewed by two hepatologists to verify the diagnosis of each FLL. They each reviewed different sets of images without cross-checking due to time constraints and practicality. After verifying the diagnosis of FLLs, the images were labeled by experienced sonographers following standardized criteria, which included the ultrasound characteristics of each type of lesion and specific features to consider (Fig. 1a). Periodic quality control checks were conducted by two hepatologists, who independently reviewed a random sample of labeled images to ensure adherence to the protocol.

Patient identifiers and the peripheral regions of the USG images were cropped out. The images were then resized to 640 × 640 pixels and input into the YOLOv5 algorithm.

Training and tuning process

YOLOv5 code obtained from the Ultralytics repository was customized and optimized for the task of detecting and classifying FLLs in USG images21,22. We implemented the YOLOv5 architecture in the PyTorch framework; its backbone builds on the Darknet architecture combined with Cross Stage Partial Networks (CSPNet). To enhance object-detection performance on USG images, the algorithm was pretrained on the COCO dataset17. The model was further refined by fine-tuning during the training process using the tuning set. The model achieved its maximum mean average precision, indicating optimal performance, at or beyond 150 epochs. This checkpoint was then selected for assessing model performance.
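The checkpoint-selection rule described above amounts to picking the epoch whose checkpoint achieved the best tuning-set mAP; a trivial sketch (function name and example numbers are illustrative only):

```python
def best_checkpoint(map_by_epoch):
    """Return the epoch whose checkpoint achieved the highest mean
    average precision (mAP) on the tuning set."""
    return max(map_by_epoch, key=map_by_epoch.get)

# e.g. best_checkpoint({100: 0.80, 150: 0.87, 200: 0.86}) -> 150
```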

Performance evaluation

The model’s performance was evaluated on the test set. Two primary tasks, namely detection and diagnosis, were evaluated independently on both a per-image and per-lesion basis using the following metrics:

Detection task

An FLL was considered as being correctly detected when the model generated a box that accurately encompassed the region of the FLL in the image. The detection rate was derived from the number of FLLs correctly detected divided by the total number of FLLs.

Diagnosis task

The diagnostic performance was evaluated as sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, calculated using the following formulas: sensitivity = a/(a + c), specificity = d/(b + d), PPV = a/(a + b), NPV = d/(c + d), and accuracy = (a + d)/(a + b + c + d), where a, b, c, and d were the numbers of FLLs that were correctly diagnosed, falsely diagnosed, falsely excluded, and correctly excluded, respectively.
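These formulas translate directly into code; a small sketch with the paper's a, b, c, and d counts (the helper name is ours):

```python
def diagnostic_metrics(a, b, c, d):
    """Diagnostic metrics from counts of FLLs correctly diagnosed (a),
    falsely diagnosed (b), falsely excluded (c), and correctly excluded (d)."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "accuracy": (a + d) / (a + b + c + d),
    }
```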

Statistical analysis

The evaluation of the overall model performance included two aspects: 1) a comparative analysis of detection and diagnosis between benign and malignant FLLs, and 2) an evaluation of detection and diagnosis of each specific type of FLLs. The model performance in detecting FLLs was reported as detection rates. The model performance in diagnosing FLLs was reported as sensitivities, specificities, PPVs, NPVs, and accuracies with their respective 95% confidence intervals (95%CI). All statistical analyses were conducted using Python version 3.8.13 (Python Software Foundation, Delaware, USA).

Results

Dataset characteristics

Table 1 describes the characteristics of the three datasets, consisting of 26,288 images (9397 images with FLLs and 16,891 images without FLLs) from 5444 patients (Supplemental Table 2). There were 7493, 523, and 1381 images with FLLs randomly allocated to the training, tuning, and test sets, respectively (16:1:3 ratio). The images with FLLs contained a total of 16,098 FLLs (some images contained more than one FLL). The numbers of CCAs, HCCs, FFS, cysts, hemangiomas, FFIs, and regenerative nodules were 6454 (40.1%), 1635 (10.2%), 3306 (20.5%), 2229 (13.9%), 1378 (8.6%), 563 (3.5%), and 533 (3.3%), respectively, with median sizes (interquartile range, IQR) of 4.12 (3.40), 3.80 (4.24), 1.49 (1.03), 1.47 (1.66), 1.16 (1.37), 2.71 (3.18), and 0.97 (0.44) cm, respectively.

Table 1 Characteristics of patients and focal liver lesions (FLLs).

Among the images without FLL, 7,414 images were randomly assigned to the training set and the remaining images were randomly allocated in a 2:1 fashion, resulting in 6184 images in the tuning set and 3293 images in the test set. This allocation ratio of 2:1 was chosen to minimize false positive errors and yielded the best model performance.

Performance of AI system

The performance of the AI system for overall FLLs is demonstrated in Table 2 and Fig. 1b. The model provided an overall FLL detection rate of 84.83% (95%CI: 83.30–86.36). When categorized by FLL size as shown in Table 3, the detection rates of FLLs ≤ 1 cm and > 1 cm were 84.66% (95%CI: 81.72–87.59) and 84.95% (95%CI: 83.16–86.74), respectively.

Table 2 Performance of deep learning model for detection of focal liver lesions.
Table 3 Number of each focal liver lesion classified by size of lesion.

Malignant and benign FLLs

As shown in Table 4, the detection rates for malignant and benign lesions were 86.50% (95%CI: 82.99–90.02) and 86.65% (95%CI: 84.57–88.74), respectively. After successfully detecting FLLs, the AI model achieved comparable sensitivity for correctly classifying malignant and benign lesions, with values of 97.03% (95%CI: 95.88–98.18) and 96.96% (95%CI: 95.87–98.05), respectively. The NPV for malignant lesions was slightly greater than for benign lesions, with 97.37% (95%CI: 96.35–98.39) and 96.57% (95%CI: 95.34–97.80), respectively. The overall accuracy for diagnosing both malignant and benign lesions was 96.99% (95%CI: 87.78–96.20).

Table 4 Performance of deep learning model for detection and diagnosis of malignant and benign FLLs.

Types of FLLs

The diagnostic performance for the seven types of FLLs is shown in Table 5. Among the seven FLLs, the model demonstrated the highest detection rate for CCA, at 92.21% (95%CI: 87.97–96.44), followed by FFS at 89.69% (95%CI: 87.06–92.32). The lowest detection rate was observed for FFI, at 62.26% (95%CI: 49.21–75.31). Detection rates for the other FLLs ranged between 78.05% and 88.65%.

Table 5 Performance of deep learning model for detection and diagnosis of 7 types of focal liver lesions.

Upon the successful detection of each type of FLLs, the model further classified the FLLs into one of the seven specific diagnoses. For malignant FLLs, the model had a better overall diagnostic performance for CCA than HCC, with accuracies of 97.38% (95%CI: 96.64–98.12) and 94.82% (95%CI: 93.79–95.84), respectively. The model also demonstrated superior sensitivity and PPV for diagnosing CCA compared to HCC, with sensitivities of 96.51% (95%CI: 95.08–97.94) vs. 78.10% (95%CI: 72.50–83.69) and PPVs of 96.06% (95%CI: 94.54–97.57) vs. 77.73% (95%CI: 72.11–83.34), respectively, while the specificities and NPV for diagnosing both types of cancers were comparable, ranging from 97.03% to 98.11%.

Regarding the individual diagnostic results for the five benign FLLs, the model’s performance ranged from 74.25 to 98.81% for sensitivity, 97.03 to 100% for specificity, 77.73 to 100% for PPV, 97.10 to 99.66% for NPV, and 94.82 to 99.94% for accuracy. The model showed the highest sensitivity of 98.81% (95%CI: 97.86–99.76) for diagnosing FFS and the lowest sensitivity of 74.25% (95%CI: 67.62–80.88) for diagnosing hemangioma. The highest accuracy of 99.94% (95%CI: 99.84–100) was observed for diagnosing regenerative nodules, while the lowest accuracy of 96.27% (95%CI: 95.39–97.14) was noted for diagnosing hemangioma.

Discussion

In this study, we developed and evaluated an AI system using the YOLOv5 model to detect and classify FLLs from 26,288 USG images. The AI demonstrated high accuracy, achieving an 84.8% detection rate and excellent sensitivity and specificity for both benign and malignant lesions, including HCC and CCA, indicating its potential to enhance clinical diagnostic accuracy.

The model demonstrated an accuracy of 96.99% in classifying malignant and benign FLLs. Compared to the previous CNN model, the YOLOv5 model’s accuracy in classifying malignant and benign FLLs increased by 13%17. This improved performance was likely due to the YOLO model’s superior ability in image detection, particularly its enhanced capacity to detect small objects compared to the CNN model22.

The sensitivity of YOLOv5 in detecting and classifying FLLs was also superior to that of the CNN model. For HCC, the diagnostic sensitivity increased from 73.6% to 78.10%, and for FFI, it improved from 69.0% to 82.86%. Additionally, the overall specificity for hemangioma, cyst, HCC, FFS, and FFI surpassed that of the CNN model. Across all seven FLL diagnoses, the AI system consistently showed high specificities and NPVs, all over 97%. These excellent specificities and NPVs indicate that the AI system could be highly useful in clinical practice23. Its high specificities help physicians enhance the accuracy of FLL diagnosis and make informed decisions about further management.

Among the seven FLLs studied, the model exhibited the lowest diagnostic sensitivities for hemangioma, HCC, and FFI, at 74.25%, 78.10%, and 82.86%, respectively. Interestingly, these findings were consistent with the diagnostic performance of our previous model, which reported sensitivities of 80.8%, 73.6%, and 69.0% for hemangioma, HCC, and FFI, respectively17. We suspect that these three types of FLLs may be challenging to distinguish from one another due to their atypical characteristics and similar appearance on USG images, leading to lower sensitivity.

This study has several strengths. Notably, it included CCA and regenerative nodules in the AI system. The inclusion of regenerative nodules as an input class has not been previously explored, and data on AI detection of CCA in USG images were also very limited. Indeed, USG screening for CCA remains challenging due to the absence of specific lesion features, occasionally resulting in misclassification between CCA and HCC6,8. Early detection of CCA is particularly crucial because it is an aggressive cancer known for its resistance to chemotherapy2,24. The 5-year survival rate for metastasized CCA was below 2%; however, when detected early, the 5-year survival rate increased to 74%25. With a diagnostic sensitivity and specificity of 96.51% and 97.85%, respectively, this AI model is valuable for detecting and classifying CCA lesions, particularly in areas of high prevalence such as Southeast Asia. Its application can aid in preventing the fatal progression of CCA and improving survival outcomes for at-risk individuals. Lastly, the USG images were obtained from various USG models and machines. This diversity in USG vendors and versions enhances the generalizability of the developed AI system in clinical settings where different USG machines are used. Moreover, the YOLOv5 model eliminates the slow real-time processing-speed constraint of the CNN, suggesting its feasibility for implementation in real clinical settings.

There are several limitations to this study. We acknowledge the limitation of using still images captured by experienced radiologists, which contrasts with the real-time detection process during ultrasound examinations. Our primary aim was to evaluate the accuracy of the YOLOv5 deep-learning model in detecting and diagnosing FLLs using a diverse set of pre-identified images. This initial step is necessary to establish the model's capability before progressing to real-time applications. Further studies are warranted to validate the model's performance in real-time clinical settings, dynamically identifying lesions during ultrasound examinations, to assess its practical applicability and robustness in real-world scenarios. Given this limitation, this work is considered a phase 0 study within the domain of AI-assisted systems for healthcare, focusing on system development and performance evaluation26. A phase 1 study to evaluate the performance of the developed AI-assisted USG model is currently underway. Nonetheless, we recently assessed the utility of an AI-assisted system in a randomized controlled trial27. That study evaluated whether the AI system could assist non-expert operators in detecting FLLs, and the results showed that the AI system significantly increased the FLL detection rate for non-expert sonographers. Although that work suggested that an AI-assisted system can enhance the FLL detection rate for non-expert sonographers, it remains necessary to validate the performance of the YOLO model in real-time USG examinations performed by non-experts, given that the model in the present study is not the same CNN model used previously. Another limitation is that ultrasound interpretations are influenced by both the proficiency and experience of the examiners and the inherent characteristics of the lesions. Since USG is operator dependent, the experience of the examiner significantly affects the quality of the obtained images. In the present study, the model's performance was not validated on images obtained by less experienced examiners, yet the maximum utility of such a system lies precisely in improving diagnostic accuracy for less experienced examiners.

The YOLO model demonstrated a confusion rate of 40% per lesion and 8% per image when distinguishing cysts from normal transonic structures in the liver. This was based on 126 false positive bounding boxes in a test set of 1385 images, primarily around vessels, with a few around other normal transonic structures. To minimize this confusion rate, increasing the number of images of normal transonic structures in the training set may be effective. Clinically, sonographers can differentiate these structures using various techniques, such as probe-angle adjustments, compressibility checks, and Doppler mode. Integrating such expertise with AI systems can further mitigate this issue. The accuracy of AI in detecting and categorizing FLLs may also vary with ultrasound settings and equipment quality. Our study utilized images from various ultrasound machines and models, each with settings manually adjusted by ultrasonographers, which yielded a broad range of ultrasound images. Due to the retrospective nature of our data collection, specific ultrasound settings such as gain, depth, and focus position were not available, limiting our ability to assess their impact on model performance. Despite this, the AI system showed robust performance across different settings, indicating its applicability in diverse clinical environments. Finally, although the YOLOv5 model exhibited an approximately 5% improvement in diagnostic sensitivity for HCC compared to our previous CNN model (increasing from 73.6% to 78.1%), we have still not attained the desired sensitivity17. We are committed to exploring methodologies to enhance the detection rate and boost the efficacy of our AI system.

Conclusion

The new AI model demonstrates excellent performance in detecting and diagnosing malignant FLLs, including HCC and CCA, as well as benign FLLs in USG still images. External validation and evaluation of real-time clinical performance are required to confirm its feasibility and applicability.