Introduction

Breast cancer is the most common cancer in women worldwide [1]. To reduce breast cancer mortality, a screening program is essential for identifying early and small breast cancers. Mammography (MG) is the only breast cancer screening modality that has been shown to decrease mortality and is widely adopted across the world [2]. Ultrasound (US) screening has attracted attention recently due to its ability to improve the breast cancer detection rate in young Asian women, who often have dense breasts [3]. Magnetic resonance imaging (MRI) has proven useful in populations at high risk of breast cancer [4, 5]. The field of breast imaging has undergone transformative improvements since the inception of MG for screening purposes in the early 1960s. Spatial and temporal resolutions have improved remarkably in all imaging modalities. The most significant change has been digitalization, which allowed much more information to be preserved and shared, and facilitated image processing. Images are now treated not only as pictures but also as data.

With improvements in molecular biology techniques, diagnosis based on biological and pathological factors has become possible. A correct therapeutic decision is guided by a detailed diagnosis that includes receptor status, proliferation index, nodal status, and immunocyte activity. Although tissue sampling specimens have traditionally provided this information, images can play an essential role in whole-tumor evaluation considering the heterogeneity of breast cancer. Imaging technology using artificial intelligence (AI) has developed into a noninvasive method to provide biological and pathological information.

Artificial intelligence in breast imaging

AI is a vast field that analyzes data and makes inferences based on knowledge, judgment, problems, and solutions. Technologies used in breast imaging research can be divided into two types, namely machine learning (ML) and deep learning (DL) (Fig. 1). ML is a subset of AI in which computers are trained to perform functions without being explicitly programmed by humans to complete those tasks. ML commonly uses features and inputs from human programmers as the basis for learning. With ML, relatively good models can be created, even in small-scale studies with fewer than 1000 target images. Various ML methods have been proposed in succession to create highly accurate prediction models. DL increases the number of layers of ML algorithms to perform more complex and extensive data analyses [6]. Convolutional neural networks (CNNs) are the most commonly used DL tools for image-based diagnosis. The CNN approach has resulted in breakthroughs in image processing, including breast imaging, over the last few decades. Therefore, many imaging research and clinical applications use CNNs to perform clinically meaningful tasks, such as classification, segmentation, and detection.

Fig. 1

Diagram explaining the relationships among different techniques in the AI field. AI artificial intelligence, ML machine learning, DL deep learning, CNN convolutional neural network

Breast density and risk prediction

Breast density reflects the amount of fibroglandular tissue in the breasts. Breast density on MG is an independent risk factor for breast cancer [7]. Evaluation of breast density is important for recognizing both the breast cancer risk and the possibility that noncalcified lesions may be masked by the fibroglandular tissue. The European Society of Breast Imaging (EUSOBI) recommends that women be informed about their breast density and that contrast-enhanced breast MRI be conducted in women aged 50–70 years with extremely dense breasts [8]. Although several guidelines exist for evaluating breast density, the most widely used method is the density score from the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) [9]. The ACR BI-RADS Atlas 5th edition classifies breast density into 4 categories, namely a: “The breasts are almost entirely fatty”; b: “There are scattered areas of fibroglandular density”; c: “The breasts are heterogeneously dense, which may obscure small masses”; d: “The breasts are extremely dense, which lowers the sensitivity of mammography”. These 4 categories are associated with breast cancer risk [10]. However, variability remains across radiologists, and consistency is low, even within individual radiologists [11]. Studies using AI have demonstrated beneficial results in terms of both conformity and reproducibility. A DL model demonstrated excellent classification accuracy for dense and non-dense breasts [12]. It exhibited good agreement with the density assessments of an experienced radiologist. Another DL model revealed that breast density can be estimated from any type of mammography, including full-field digital mammography (FFDM), digital breast tomosynthesis (DBT), and synthesized 2D mammograms [13]. Although inter- and intra-reader variability is well known, visual categorization by radiologists is still commonly used as the reference when developing and evaluating these models. Gastounioti et al. stated that this discrepancy is mainly due to the lack of large datasets with ground-truth density estimations [14]. Many AI applications, such as Quantra™ (Hologic, USA), IntelliMammo® densityai™ (Densitas, Canada), and Volpara TruDensity® (Volpara Imaging, New Zealand), are already commercially available. A study comparing mammographic assessments in the available models showed a strong association between breast cancer risk and automatically measured breast density [15]. Interestingly, visual density assessment also demonstrated a strong relationship with cancer, despite the known inter-observer variability.
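Agreement between a model and a radiologist on the four BI-RADS density categories is often summarized with a chance-corrected statistic. The following is a minimal sketch using unweighted Cohen's kappa, both on the four categories and after grouping them into dense and non-dense. The labels are hypothetical, the function names are illustrative, and the grouping of categories c and d as "dense" is an assumption about the binarization; none of this is taken from the cited studies.

```python
from collections import Counter

# Categories c and d are grouped as "dense" (assumed binarization).
DENSE = {"c", "d"}

def is_dense(category):
    """Map a BI-RADS density category (a-d) to dense / non-dense."""
    category = category.lower()
    if category not in {"a", "b", "c", "d"}:
        raise ValueError(f"unknown density category: {category!r}")
    return category in DENSE

def cohen_kappa(reader_1, reader_2):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if len(reader_1) != len(reader_2) or not reader_1:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(reader_1)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(reader_1, reader_2)) / n
    # Expected agreement by chance, from each reader's label frequencies.
    counts_1, counts_2 = Counter(reader_1), Counter(reader_2)
    expected = sum(counts_1[k] * counts_2[k] for k in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels for eight mammograms (not data from any cited study).
radiologist = ["a", "b", "b", "c", "c", "d", "b", "c"]
model = ["a", "b", "c", "c", "c", "d", "b", "b"]

kappa_4cat = cohen_kappa(radiologist, model)
kappa_binary = cohen_kappa([is_dense(c) for c in radiologist],
                           [is_dense(c) for c in model])
print(f"4-category kappa: {kappa_4cat:.3f}, dense/non-dense kappa: {kappa_binary:.3f}")
```

Because the categories are ordered, a weighted kappa that penalizes a versus d disagreements more than b versus c disagreements may be a more suitable choice in practice.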

Some studies have directly predicted the risk of developing breast cancer. An accurate prediction of breast cancer risk is required for personalized screening. Several breast cancer risk prediction models using personal health data, such as age, race, hormone usage, and prior cancer history, have been proposed and investigated in randomized trials [16, 17]. The Mirai model [18], a DL mammography-based risk model, indicated that a DL model with mammographic features added to the previous risk prediction model could increase the accuracy of cancer risk prediction.

Cancer detection

Ideal breast cancer screening should have high sensitivity and specificity, without invasiveness or high cost. Although MG is the most widely used modality around the world, its sensitivity and specificity are insufficient, especially for dense breasts [19]. Several types of US, MRI, and other modalities have been used to detect early-stage breast cancer to date [20, 21]. In addition, AI algorithms for cancer detection could achieve highly accurate screening and reduce the burden on interpreters. Computer-aided detection (CAD) software for mammography was introduced in the 1990s [22]. The Food and Drug Administration (FDA) approved the first commercial CAD system as a second opinion for screening mammography in 1998. However, CAD increased recall rates [23], and there was no evidence that CAD applied to digital MG significantly improved screening performance [24]. In contrast, many studies have been published on AI algorithms that demonstrated excellent performance in breast cancer detection. McKinney et al. used an AI-based breast cancer detection algorithm, which was trained on larger and more representative datasets in the United Kingdom and the United States, and showed its performance to be better than that of radiologists (area under the curve [AUC]: 0.81–0.89) [25]. Other retrospective studies with large European populations confirmed the usefulness of AI-based cancer detection systems [26]. Although many studies have demonstrated the potential of AI in providing highly accurate screening and reducing the workload of MG interpreters, there are several limitations. Most of these studies were retrospective and included small populations. Therefore, prospective randomized controlled trials have had a significant impact in this field. The Mammography Screening with Artificial Intelligence (MASAI) trial, a randomized controlled trial, revealed that AI-supported MG screening resulted in a similar cancer detection rate compared to standard double reading, with a lower screen-reading workload [27]. A large number of subjects and many years would be required to demonstrate the usefulness of AI-based image diagnosis in screening. However, previous reports have shown positive results when AI was introduced into screening practice. In the future, AI may be commonly used to reduce the burden on image readers.
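The screening measures discussed in this section (sensitivity, specificity, recall rate, and cancer detection rate) all follow from a 2 × 2 table of screening outcomes. The sketch below shows one common way to compute them. The function name and the counts are hypothetical and are not taken from any cited trial, and the recall rate is assumed here to be the fraction of all screened women who are called back.

```python
def screening_metrics(tp, fp, fn, tn):
    """Basic screening performance measures from a 2 x 2 outcome table.

    tp: recalled, cancer confirmed      fp: recalled, no cancer
    fn: not recalled, cancer present    tn: not recalled, no cancer
    """
    screened = tp + fp + fn + tn
    if screened == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one cancer and one non-cancer case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Assumed definition: fraction of all screened women who are recalled.
        "recall_rate": (tp + fp) / screened,
        "cancer_detection_rate_per_1000": 1000 * tp / screened,
    }

# Hypothetical counts for 10,000 screening examinations.
metrics = screening_metrics(tp=50, fp=450, fn=10, tn=9490)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a comparison such as AI-supported reading versus double reading, the same table would be built once per arm so that the detection rate and recall rate can be compared side by side.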

Diagnosis and characterization of cancer

Once an area or lesion of interest is identified, it must be ascertained whether it is benign or malignant and, if malignant, which subtype of breast cancer it is. If AI can provide this information with high accuracy, unnecessary biopsies can be avoided, treatment preparations can be quickly initiated, and therapy effect monitoring can be promptly performed. Conventionally, the morphological characteristics related to benign or malignant lesions are recognized. For example, a spiculated margin or linear calcification is highly suggestive of malignancy. These morphological characteristics are supported by verification in a large number of previous cases. Radiomics is a method for analyzing not only the features visible to the human eye, but also those derived from precise calculations (Fig. 2). Radiomics treats images as data and extracts tens to hundreds of types of “radiomic features” (Fig. 3). Radiomic features can be divided into three representative groups, namely morphological, histogram, and texture features. In addition, transforming images using techniques such as wavelet or Fourier transformation can increase the number of features. Radiomic features are calculated from the values of the individual pixels in the image. Radiomics comprehensively analyzes the features collected in these ways and uses ML or DL to predict clinical or histopathological features. Radiomics studies have used US [28] and MRI [29] to determine whether a lesion is benign or malignant. The studies that used ML demonstrated high accuracy. A contrast-enhanced mammography-radiomics model was used to predict breast cancer characteristics [30]. The accuracy was 95.6% for hormone receptor status (positive or negative) and 77.8% for tumor grade.
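To make the three feature groups concrete, the following is a minimal sketch that computes one morphological feature (area), a few histogram features, and one texture feature (contrast of a gray-level co-occurrence matrix) from a segmented region. It uses NumPy only. The function names, bin counts, and random test image are illustrative assumptions, and the sketch does not reproduce the feature definitions of any particular radiomics package or cited study.

```python
import numpy as np

def first_order_features(image, mask, bins=16):
    """Histogram (first-order) features of the pixels inside the mask."""
    values = image[mask].astype(float)
    mean, std = values.mean(), values.std()
    skewness = 0.0 if std == 0 else float(((values - mean) ** 3).mean() / std ** 3)
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    entropy = float(-(p * np.log2(p)).sum())
    return {"mean": float(mean), "std": float(std),
            "skewness": skewness, "entropy": entropy}

def glcm_contrast(image, mask, levels=8):
    """Texture feature: contrast of a symmetric gray-level co-occurrence
    matrix built from horizontally adjacent pixel pairs inside the mask."""
    lo, hi = image[mask].min(), image[mask].max()
    scale = (levels - 1) / (hi - lo) if hi > lo else 0.0
    # Quantize intensities to `levels` gray levels.
    q = np.clip(np.rint((image.astype(float) - lo) * scale), 0, levels - 1).astype(int)
    left, right = q[:, :-1], q[:, 1:]
    both_inside = mask[:, :-1] & mask[:, 1:]
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left[both_inside], right[both_inside]), 1)
    glcm = glcm + glcm.T  # count each pair in both directions
    total = glcm.sum()
    if total == 0:
        return 0.0
    p = glcm / total
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())

# Hypothetical 32 x 32 image with a square "lesion" segmentation.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32))
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True

features = {"area_px": int(mask.sum()),          # morphological
            **first_order_features(image, mask),  # histogram
            "glcm_contrast": glcm_contrast(image, mask)}  # texture
print(features)
```

In a radiomics workflow, a feature vector of this kind would be computed per lesion and passed to an ML classifier; applying the same functions to a wavelet- or Fourier-transformed image is one way the number of features grows.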

Fig. 2

Radiomics workflow in breast imaging

Fig. 3

Segmentation and examples of radiomics features

There have also been reports on predicting lesion characteristics using ML or DL without using radiomics. Herent et al. used a DL model that utilizes MRI to identify lesions, determine whether each lesion was benign or malignant, and classify its histological subtype [31]; the overall area under the curve (AUC) was 0.817. Fleury et al. used five ML methods to determine whether lesions were benign or malignant using ultrasound images. In this study, the support vector machine (SVM) showed the highest AUC of 0.840 [32].
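The AUC values quoted in these studies can be read as the probability that a randomly chosen malignant lesion receives a higher model score than a randomly chosen benign lesion. The sketch below computes the AUC directly from that pairwise definition. The function name, labels, and scores are hypothetical and are not data from the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs in which the
    malignant case has the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical model scores: 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

This pairwise form is quadratic in the number of cases, which is acceptable for small studies; rank-based implementations give the same value more efficiently on large datasets.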

Since new targeted therapies have been developed for specific molecular subtypes, an appropriate therapeutic choice would improve the therapeutic response. The breast is an organ that is relatively easily approached for biopsy. Therefore, if images are used simply to determine whether a lesion is benign or malignant, or to classify subtypes of breast cancer, the accuracy must be equivalent to that of a biopsy, with greater safety and simplicity. However, unlike a biopsy, imaging has the advantage of being able to evaluate an entire lesion. Breast cancer is a clinically and biologically heterogeneous disease [33]; if we can verify the differences in treatment effects and prognosis due to the biological diversity of tumors, the significance of diagnostic imaging will increase dramatically.

Treatment response

Neoadjuvant systemic therapy (NST) can reduce tumor size and allow minimally invasive surgery. If the therapeutic effect can be predicted before starting NST, it can greatly contribute to drug selection and be useful in precision medicine. Although MRI is the most accurate modality for determining the response to NST, this determination is made only after NST [34]. In recent research, AI has shown the potential to predict NST responses beforehand. In a study using MG, a DL model predicted pathological complete response (pCR) with an AUC of 0.71 [35]. In a small prospective study using MRI before and after two cycles of NST, several ML models were reported to have high accuracy [36]. A radiomics study using pre-treatment US and DBT demonstrated that a multimodal algorithm assessed the response to NST significantly better than an algorithm using only clinical variables [37]. In addition, a radiomics model using only pre-treatment T2 non-contrast images predicted responder or non-responder status with an AUC of 0.87 [38]. Each of these AI studies has several limitations. However, AI approaches have a significant advantage in that they can predict the response to NST using images acquired before the therapy or without contrast agents. These studies demonstrated the potential of AI in clinical precision medicine.

Breast cancer prognosis

Following the diagnosis of breast cancer, the most important application of the radiomic AI model would be in the prediction of breast cancer prognosis. Conventionally, some clinical features have been used as predictive factors, such as age, tumor size, axillary lymph node (ALN) metastasis, hormone receptor status, human epidermal growth factor receptor 2 (HER2) status, and Ki-67 index. ALN status is critical for predicting disease-free survival and overall survival in patients with breast cancer [39]. Sentinel lymph node biopsy (SLNB) is a widely accepted method that provides an accurate diagnosis of ALN metastasis and avoids unnecessary axillary lymph node dissection (ALND) [40, 41]. Although SLNB is less invasive than ALND, it is associated with complications. Therefore, AI-based trials that predict ALN metastasis using noninvasive methods have been conducted. An ML model using MG [42] and DL models using US images [43] and MRI [44] predicted ALN metastasis with appreciable accuracy. A CNN using MRI divided the patients into three Oncotype DX Recurrence Score groups, with an accuracy of 0.81 [36]. An ML model using MRI predicted disease-specific survival with an AUC of 0.83 [45]. Studies have shown that adding radiomics to the conventional radiological prediction workflow improves the prognostic value of breast imaging. A prospective study [46] using radiomics features extracted from pre-treatment US images identified recurrence with an accuracy of 82%. Radiomics features from MRI were associated with disease-free survival in patients with breast cancer [47].

Several approaches have been used, based on clinical characteristics or multigene assays, for prognosis prediction. AI-based prediction models that use breast images acquired in clinical settings may achieve repeatable and cost-effective decisions.

Perspectives on AI in clinical breast cancer imaging

Various AI models exist in breast imaging diagnosis, from those already in commercial use to those still in the research stage. AI models are already being used to assist radiologists in their diagnoses. They are known to be useful, but it will take some time before AI that can replace radiologists is developed. Several imaging studies using AI have been conducted for breast cancer; however, most of these were retrospective, and large-scale prospective studies will be required in the future to evaluate the usefulness and reproducibility of the developed AI. The most important factor in creating a highly accurate AI model is to acquire a large number of high-quality images and associated clinical information. In breast cancer treatment, diagnosis is often performed using a combination of several imaging modalities, such as mammography, ultrasound, and MRI. AI diagnosis or interpretation that combines modalities is likely to be necessary in the future. Diagnosing based on a combination of modalities requires more case data than diagnosing based on a single modality. To handle such large amounts of data, a system for sharing data, including image information, would be necessary. Image databases and program codes published online are useful for AI image research; however, the data sources are still insufficient. In addition, each imaging equipment manufacturer has its own platform, and the environments in which each platform can be used are limited. More data and knowledge should be shared to implement better AI models in the future. By having AI perform simple image-based diagnosis, radiologists may be able to concentrate on diagnosing each patient based on optimal image interpretation.

Conclusion

Research on breast cancer images using AI has been conducted widely, ranging from risk prediction to breast cancer prognosis. The demand for diagnostic imaging is likely to increase because of the associated advantage of observing the entire lesion. Additionally, minimally invasive imaging modalities, such as ultrasound or non-contrast MRI, may attract more attention than ever before. We look forward to AI models with reproducibility and robustness and an environment that makes them easy to use.