Introduction

Breast cancer is the most frequently diagnosed tumor in women worldwide and the second most common cause of cancer death in women (Ferlay et al. 2012). Tumor size and lymph node status are the two main independent and additive prognostic factors (Carter et al. 1989). The 5-year survival rate decreases from 98.9% for localized disease to 85.7% for regional disease, that is, when the tumor has spread to the regional lymph nodes (Surveillance et al. 2020). Accurate assessment of the axillary lymph nodes is therefore important for prognosis and treatment planning, guiding the initial surgical approach, the choice of neoadjuvant chemotherapy (NAC), and subsequent axillary management (Caudle et al. 2014).

Axillary lymph node dissection (ALND) is associated with a high complication rate, including infections, seromas, axillary paresthesias, and lymphedema (Lucci et al. 2007); for this reason, the surgical goal is to perform the most conservative procedure possible. In patients with clinically negative lymph nodes, sentinel lymph node biopsy (SLNB) is the standard method for axillary staging (). According to the American College of Surgeons Oncology Group (ACOSOG) Z0011 trial, ALND is not necessary in patients who are candidates for breast-conserving surgery, have clinically negative axillary lymph nodes, and have at most two metastatic lymph nodes detected by SLNB, because the 10-year survival of patients who underwent SLNB alone was non-inferior to that of patients who underwent ALND (Giuliano et al. 2017).

However, SLNB carries the same types of complications as ALND, although they occur less frequently (Lucci et al. 2007). In addition, SLNB has a non-negligible false-negative rate, especially when performed after NAC, with reported values ranging from 12.6 to 14.2% (Giuliano et al. 2017; Kuehn et al. 2013).
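For reference, the false-negative rate quoted above is conventionally defined per patient as the fraction of node-positive patients in whom SLNB missed the metastasis. The formula below states this standard definition in our own notation; it is not taken from the cited trials:

```latex
\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{TP} + \mathrm{FN}} = 1 - \mathrm{sensitivity}
```

Here FN counts patients with a negative SLNB but nodal metastases at completion ALND, and TP counts patients with a positive SLNB. An FNR of 14.2%, for example, means that about 14 of every 100 node-positive patients would be classified as node-negative by SLNB.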

Given the rates of surgical complications and false negatives, it would be useful to determine axillary lymph node status before surgery in order to reduce overtreatment, improve patients' quality of life, and reduce the risk of loco-regional and distant recurrence.

In recent years, radiomics has attracted growing interest in the literature. Radiomics, originally proposed by Lambin et al. (2012), refers to the conversion of medical images (such as CT, MRI, or PET) into quantitative data by extracting a large number of descriptors, called “features”, which are then analyzed with dedicated software. Radiomics has been applied not only to prostate, liver, rectal, and lung cancer but also to breast cancer, with promising preliminary results (Sun et al. 2019; Cozzi et al. 2017; Horvat et al. 2018; Thawani et al. 2018).
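To make “features” concrete, the sketch below computes a few first-order (intensity histogram) features inside a segmented region of interest. It is an illustrative example under simplified definitions, not the feature set of any reviewed study; dedicated packages such as PyRadiomics follow standardized definitions (for example those of the Image Biomarker Standardisation Initiative) that differ in details such as intensity discretization. The function name, the fixed bin count, and the toy image are our own assumptions.

```python
import numpy as np


def first_order_features(image: np.ndarray, mask: np.ndarray, n_bins: int = 32) -> dict:
    """Compute a few first-order (intensity histogram) features inside a binary ROI.

    Illustrative sketch: the definitions are simplified and are not those of
    any specific radiomics package or of the reviewed studies.
    """
    voxels = image[mask.astype(bool)].astype(float)
    if voxels.size == 0:
        raise ValueError("the mask selects no voxels")
    mean = voxels.mean()
    std = voxels.std()  # population standard deviation
    # Skewness: standardized third central moment (0 for a constant ROI)
    skewness = float(np.mean((voxels - mean) ** 3) / std**3) if std > 0 else 0.0
    # Shannon entropy (bits) of the histogram with a fixed number of bins
    counts, _ = np.histogram(voxels, bins=n_bins)
    p = counts[counts > 0] / voxels.size
    entropy = float(-np.sum(p * np.log2(p)))
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": skewness,
        "entropy": entropy,
        "voxel_count": int(voxels.size),
    }


# Toy 4x4 "image" with a 2x2 ROI in the centre (voxels 5, 6, 9, 10)
image = np.arange(16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
features = first_order_features(image, mask)
```

In a real study, features of this kind (plus shape and texture features) are computed for every patient and stacked into a patient-by-feature matrix that feeds feature selection and classification.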

Radiomics is also applied to capture differences in tumor phenotypes and the tumor microenvironment through the combination of imaging features with other types of data (including clinical, treatment-related, or genomic data) to support therapeutic decision-making (Lambin et al. 2012).

In breast cancer, the use of radiomics-derived data for lesion diagnosis, prediction of NAC response, risk of recurrence, and disease-free survival has already been investigated (Li et al. 2016; Fan et al. 2017a, b; Leithner et al. 2019; Antropova et al. 2017). However, only a few studies have addressed the ability of radiomics to predict lymph node status in breast cancer, and no dedicated review exists. Only Ye et al. (2020) explored the state of the art of radiomics in breast MRI, including the prediction of axillary lymph node metastasis, in a broader review not specific to lymph nodes.

The purpose of this review is to investigate the potential role of radiomics as a decision support tool in predicting lymph node status in breast cancer patients.

Materials and methods

Two reviewers independently conducted the search, selected the studies, and extracted data from each study. Of a total of 30 papers, 10 research articles on predicting lymph node metastasis in breast cancer using a radiomic approach in breast MRI were considered eligible.

The PubMed (MEDLINE) and Web of Science databases were searched using the following keywords: “breast cancer” AND “lymph node” AND (“radiomic” OR “radiomics”). No filters were applied to the database search itself. Case reports, abstracts, reviews, letters to editors, editorials, and comments were then excluded, as were studies using diagnostic modalities other than MRI; only publications in English were included. We completed the search by manually reviewing the reference lists of all selected articles.
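For reproducibility, a Boolean query of this kind can be submitted to PubMed programmatically through the public NCBI E-utilities ESearch endpoint. The sketch below only builds the request URL; it assumes the OR-group was meant to read “radiomic” OR “radiomics”, uses an arbitrary `retmax` of 500, and does not cover Web of Science, which has its own interface. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(required_terms, alternative_terms):
    """AND the quoted required phrases together with one OR-group of alternatives."""
    required = " AND ".join(f'"{term}"' for term in required_terms)
    alternatives = " OR ".join(f'"{term}"' for term in alternative_terms)
    return f"{required} AND ({alternatives})"


def esearch_url(query, retmax=500):
    """Build (but do not send) an ESearch request returning PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_query(["breast cancer", "lymph node"], ["radiomic", "radiomics"])
url = esearch_url(query)
```

Sending the request (for example with `urllib.request.urlopen`) is left out so that the sketch runs offline.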

Methodological quality was evaluated with the Radiomics Quality Score (RQS) described by Lambin et al. (2017). The RQS rates radiomics studies on 16 criteria, each with a different maximum score reflecting its importance. The two reviewers assigned the RQS to each selected study by consensus, expressed both as an absolute value and as a percentage of the maximum score of 36.
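The scoring arithmetic can be sketched as follows. The per-criterion point ranges are our reading of the RQS checklist in Lambin et al. (2017) and should be checked against the published table before reuse. The function only range-checks each value; it does not enforce the discrete levels of some criteria (for example, feature reduction scores either −3 or +3).

```python
# Per-criterion (min, max) points of the Radiomics Quality Score as we read
# Lambin et al. (2017); verify against the published checklist before reuse.
RQS_CRITERIA = {
    "image_protocol_quality": (0, 2),
    "multiple_segmentations": (0, 1),
    "phantom_study": (0, 1),
    "multiple_time_points": (0, 1),
    "feature_reduction": (-3, 3),
    "multivariable_non_radiomic": (0, 1),
    "biological_correlates": (0, 1),
    "cut_off_analysis": (0, 1),
    "discrimination_statistics": (0, 2),
    "calibration_statistics": (0, 2),
    "prospective_registered_study": (0, 7),
    "validation": (-5, 5),
    "comparison_to_gold_standard": (0, 2),
    "potential_clinical_utility": (0, 2),
    "cost_effectiveness": (0, 1),
    "open_science_and_data": (0, 4),
}
RQS_MAX = sum(high for _, high in RQS_CRITERIA.values())  # 36


def rqs_total(points):
    """Sum per-criterion RQS points (omitted criteria count as 0) and return
    (total, percentage of the maximum). Only range-checks each value."""
    total = 0
    for name, value in points.items():
        if name not in RQS_CRITERIA:
            raise KeyError(f"unknown RQS criterion: {name}")
        low, high = RQS_CRITERIA[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside [{low}, {high}]")
        total += value
    return total, 100.0 * total / RQS_MAX


# Hypothetical study, for illustration only
example_total, example_pct = rqs_total({
    "image_protocol_quality": 1,
    "feature_reduction": 3,
    "discrimination_statistics": 2,
    "validation": 2,
})
```

Because the percentage is a linear rescaling of the total, the mean percentage across studies equals the mean score divided by 36.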

The following data were extracted from each study: title, authors, publication year and journal, study design (retrospective or prospective), number of patients, MRI technical information (magnetic field strength and sequences), software used to perform segmentation and feature selection, number and type of radiomics features considered, algorithms used for the classification and resulting accuracy.

Results

Our search identified 30 publications on predicting axillary lymph node status in breast cancer (BC) patients, of which 10 were included. All these studies were published from 2017 to 2020. Nine studies were retrospective in design (Chai et al. 2019; Liu et al. 2019a, b; Cui et al. 2019; Dong et al. 2018; Han et al. 2019; Tan et al. 2020; Shan et al. 2020; Yu et al. 2020); only Liu et al. (2020) was prospective.

The characteristics of all ten studies, as recorded by the reviewers, are shown in Table 1.

Table 1 Characteristics of the studies on lymph node and tumor MRI radiomics included in the review

Chai et al. (2019) compared the discriminative ability of pre-contrast and post-contrast MR imaging sequences, showing that the combination of features from the second post-contrast phase (CE2) with kinetic features had the highest performance and that preoperative radiomic signatures of the primary tumor were associated with axillary lymph node metastasis (ALNM).

Liu et al. (2019a) demonstrated that ALNM could be predicted with dynamic contrast-enhanced (DCE) MRI radiomics using three different classifiers (support vector machine [SVM], logistic regression, and XGBoost), and found that the SVM applied to the most strongly enhanced DCE-MRI images gave the best classification performance (accuracy of 0.85 and area under the ROC curve [AUC] of 0.83).

The SVM also achieved the best accuracy (89.54%) in the study by Cui et al. (2019), outperforming k-nearest neighbors (KNN) and linear discriminant analysis (LDA). In their work, the combination of morphological and texture features performed best. In addition, they established a nomogram that scored morphological and texture features to calculate the probability of ALNM.

Dong et al. (2018) were the first to predict ALNM status in BC patients using radiomics based on fat-suppressed T2-weighted (T2-FS) and diffusion-weighted imaging (DWI) sequences, showing that features derived from T2-FS and DWI combined (AUC: 0.805) outperformed those from T2-FS and DWI taken separately (0.770 and 0.787, respectively).
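The AUC values reported above can be read as the probability that a randomly chosen patient with ALNM receives a higher model score than a randomly chosen patient without ALNM. The minimal sketch below computes the AUC in exactly this pairwise way, which is equivalent to the Mann–Whitney U statistic divided by the number of positive-negative pairs. The labels and scores are toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case scores higher; ties count as one half. labels: 1 = ALNM, 0 = no ALNM."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive-negative pairs are correctly ordered
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75
```

The double loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; a rank-based implementation scales better.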

Han et al. (2019) developed a nomogram incorporating the radiomic signature, MRI-reported lymph node (LN) status, and LN palpation. The nomogram performed better than the radiomic signature alone. A radiomic signature to distinguish the number of metastatic LNs (fewer than two vs. more than two positive nodes) was also investigated.

Liu et al. (2019b) made the first attempt to combine DCE-MRI radiomic features with clinicopathological features to improve the prediction of ALNM, and reported a predictive performance comparable to that of Dong et al. They noted that DWI is not available in all hospitals and that T2-FS sequences alone do not perform as well as DCE-MRI.

Tan et al. (2020) predicted ALNM in BC from T2-FS images alone and built a nomogram combining radiomic signatures with clinicopathological features using a logistic regression model (AUC: 0.805); they also plotted calibration curves to assess the agreement between the ALNM probability predicted by the nomogram and the actual outcome.
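A calibration curve groups patients by predicted probability and compares, within each group, the mean predicted probability with the observed ALNM rate. A well-calibrated model lies close to the diagonal. The sketch below uses equal-width probability bins; this binning scheme is our assumption, since the exact procedure used by Tan et al. (2020) is not described here. The data are toy values.

```python
def calibration_table(labels, probs, n_bins=5):
    """Return (bin index, mean predicted probability, observed event rate, count)
    for each non-empty equal-width probability bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    table = []
    for idx, members in enumerate(bins):
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            table.append((idx, mean_pred, observed, len(members)))
    return table


# Toy example with two bins: [0, 0.5) and [0.5, 1]
table = calibration_table([0, 0, 1, 1, 1, 0], [0.1, 0.15, 0.85, 0.9, 0.55, 0.45], n_bins=2)
```

Plotting `mean_pred` against `observed` for each row gives the calibration curve. With small cohorts, few bins (or a smoothed curve) are preferable, because sparsely populated bins give noisy observed rates.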

Shan et al. (2020) validated a nomogram for detecting ALNM in patients with invasive BC that incorporated the kinetic curve model and only five radiomic features extracted from DCE-MRI, achieving a high AUC of 0.86.

Yu et al. (2020) investigated patients with early-stage BC and developed and validated a clinical-radiomic nomogram that successfully stratified these patients according to their risk of ALNM. In addition, they built a nomogram providing individualized prediction of both ALNM and the risk of disease recurrence in these patients.

Only one study, by Liu et al. (2020), combined radiomic and hemodynamic features to improve the preoperative prediction of ALNM status, including quantitative parameters (such as Ktrans and Kep) that allow estimation of angiogenesis and tumor proliferation.
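Ktrans and Kep are usually estimated by fitting a pharmacokinetic model to the DCE-MRI signal. The source does not state which model Liu et al. (2020) used. As an assumption, the standard Tofts model relates the two parameters as shown below, where C_t is the tissue contrast-agent concentration, C_p the plasma concentration (the arterial input function), v_e the extravascular extracellular volume fraction, and k_ep corresponds to Kep in the text:

```latex
C_t(t) = K^{\mathrm{trans}} \int_0^{t} C_p(\tau)\, e^{-k_{ep}\,(t-\tau)}\, d\tau,
\qquad k_{ep} = \frac{K^{\mathrm{trans}}}{v_e}
```

A higher Ktrans reflects faster transfer of contrast agent from plasma into tissue. This is why these parameters are used as surrogate markers of vascular permeability and angiogenesis.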

The mean RQS was 11.1 (maximum possible value = 36). The RQSs and the criteria on which the studies scored lowest are shown in Table 2. Multiple segmentation was performed in five studies (Liu et al. 2019a, b, 2020; Dong et al. 2018; Yu et al. 2020). In all studies, feature selection was performed to avoid the curse of dimensionality. Six studies performed a multivariable analysis with non-radiomic features and discussed biological and radiological correlations (Han et al. 2019; Liu et al. 2019b, 2020; Tan et al. 2020; Shan et al. 2020; Yu et al. 2020). Han et al. and Yu et al. evaluated the clinical usefulness of radiomics by performing decision curve analysis (Han et al. 2019; Yu et al. 2020). Only Tan et al. (2020) plotted calibration curves to evaluate the agreement between the nomogram-predicted probability of ALNM and the actual surgical outcomes. In Shan et al. and Liu et al., validation was based on datasets from different institutions (two and four, respectively). The studies scored lowest on the following criteria: study type (only Liu et al. was prospective), validation, comparison with a gold standard, potential clinical utility, economic analysis, and open science and data (none of the studies made their datasets publicly available, although Liu et al., Dong et al., and Shan et al. provide access to the datasets upon explicit request).
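Decision curve analysis compares a model with the default strategies of treating all patients or no patients. It does this through the net benefit at a threshold probability p_t, defined as NB = TP/n − (FP/n) · p_t / (1 − p_t). The sketch below assumes that “treating” means classifying a patient as node-positive (for example, referral to axillary surgery). Labels and predicted probabilities are toy values, and the function names are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted probability is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight of a false positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(labels) / len(labels)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


labels = [1, 1, 0, 0, 0]
probs = [0.8, 0.3, 0.6, 0.2, 0.1]
nb_model = net_benefit(labels, probs, threshold=0.25)   # 2/5 - (1/5) * (1/3)
nb_all = net_benefit_treat_all(labels, threshold=0.25)  # 0.4 - 0.6 * (1/3)
```

A decision curve repeats this over a range of thresholds. A model is clinically useful where its net benefit exceeds both the treat-all curve and the treat-none line (net benefit 0).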

Table 2 Overview of the Radiomics Quality Score (RQS) according to Lambin et al. (2017) obtained by the studies examined, and the criteria on which the studies scored the lowest points

Some biases were described by the authors in their discussions. In most cases, small sample size and single-center design were considered the major limitations. The study population comprised fewer than 200 patients in seven studies (Chai et al. 2019; Liu et al. 2019a, b, 2020; Cui et al. 2019; Dong et al. 2018; Shan et al. 2020). Yu et al. used the largest sample (over 1000 patients) to develop and validate DCE-MRI radiomic signatures for the preoperative identification of ALNM. Eight studies lacked external validation of their models, the exceptions being Shan et al. (2020) and Yu et al. (2020).

Discussion

Radiomics is a relatively recent field. As a consequence, as shown in this review, the current literature is limited, with only ten articles on ALNM status prediction in breast MRI, all published since 2017 and mostly between 2018 and 2020. In addition, nine of the ten studies are retrospective and three include fewer than 120 patients. Ye et al. (2020) have already published a rapid review on radiomics and breast cancer that includes the prediction of axillary lymph node status; in our study, however, we chose to investigate this topic in greater depth, updating the results with recent studies and assessing the quality of the literature.

A lymph node is considered clinically suspicious if at least one of the following criteria is present: palpability on physical examination of the axilla, suspicious features on imaging, and/or positive cytological or histological findings after fine-needle biopsy, core-needle biopsy, or SLNB/ALND. Clinical examination of the axilla is associated with a high false-negative rate, reaching 45% (Sacre 1986). Among imaging techniques, ultrasound (US) is the main modality for studying the axilla after breast cancer diagnosis, during NAC, or at follow-up, as it allows evaluation of axillary levels I, II, and III. Using size criteria alone, US sensitivity and specificity range from 49 to 87% and from 55 to 97%, respectively (Alvarez et al. 2006). MRI and PET-CT are the most accurate methods for studying supraclavicular and internal mammary lymph node stations, with MRI performing better than PET-CT (Liang et al. 2017). MRI is also the primary imaging modality for evaluating BC response and axillary involvement after NAC (Scheel et al. 2018). The gold standard for axillary staging, however, remains surgical (SLNB and ALND): if SLNB is negative, no further surgical evaluation is required (Giuliano et al. 2017). Although complication rates of both procedures have decreased over the years, there is currently no complication-free method for staging the axilla.

For this reason, several non-invasive methods are being investigated. Radiomics, which extracts and analyzes quantitative image information, is a promising new approach for predicting axillary metastases, but its technical and methodological complexity has so far kept it out of routine clinical practice.

This review focuses on the current status of the radiomic approach in predicting lymph node status based on MRI for breast cancer.

Among the available imaging techniques, MRI is non-invasive, generally well accepted, and routinely used in clinical practice for BC staging. For axillary staging, MRI is unaffected by patient habitus and less operator-dependent than US, providing the best visualization of the entire axilla (Baltzer et al. 2011).

Typical suspicious MRI features include lymph node size, morphologic features (cortical thickening, loss of the fatty hilum, irregular margins, and round shape), the presence of peripheral edema, and asymmetry with the contralateral axilla (Baltzer et al. 2011). By combining the conventional MRI information routinely used in breast cancer staging with radiomic features, unnecessary ALND and SLNB could be avoided, reducing the surgical complication rate and improving patients' quality of life. In addition, radiomic signatures applied to MRI could be a step toward more personalized, precision treatment strategies for breast cancer patients. Some studies have also developed nomograms, which are graphical representations of the mathematical relationship between prognostic factors, both clinical and diagnostic (including radiomic features), designed to visually support clinical decision-making (Balachandran et al. 2015).
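A nomogram is essentially a drawing of a regression model. Each predictor's contribution to the linear predictor is mapped to a points axis, and the total points are converted into a probability. The sketch below shows such an underlying model for the three predictors combined by Han et al. (2019): the radiomic signature, MRI-reported LN status, and LN palpation. We assume a logistic model and a radiomic signature that is a weighted sum of selected features. All coefficients, weights, and feature values are hypothetical.

```python
import math


def rad_score(features, weights, intercept=0.0):
    """Radiomic signature: intercept plus a weighted sum of selected features."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    return intercept + sum(w * x for w, x in zip(weights, features))


def nomogram_probability(radscore, mri_ln_positive, ln_palpable, coef):
    """Logistic model behind a clinical-radiomic nomogram.

    coef = (b0, b_rad, b_mri, b_palp) are hypothetical coefficients; a real
    model would estimate them from training data.
    """
    b0, b_rad, b_mri, b_palp = coef
    logit = b0 + b_rad * radscore + b_mri * int(mri_ln_positive) + b_palp * int(ln_palpable)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient and coefficients, for illustration only
coef = (-1.0, 1.5, 1.0, 0.8)
score = rad_score([1.0, 2.0], [0.5, -0.25])  # 0.5 - 0.5 = 0.0
p_alnm = nomogram_probability(score, mri_ln_positive=True, ln_palpable=False, coef=coef)
```

In the printed nomogram, the term `b_mri * int(mri_ln_positive)` becomes a points bar for MRI-reported LN status, and the same applies to the other predictors.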

Among breast MRI sequences, DCE is considered the best for primary tumor identification and characterization, although the MRI diagnosis of metastatic lymph nodes is still not ideal. Nevertheless, the current literature shows promising results for predicting metastatic axillary lymph nodes with MRI radiomic signatures in breast cancer patients, particularly with DCE sequences. Currently, DWI sequences are rarely used alone in radiomics studies because of their relatively low image quality and spatial resolution, which limit the quantitative information available. Among the studies analyzed in this review, one uses DWI together with T2-FS as pre-contrast sequences (Dong et al. 2018), one is based only on T2-FS images (Tan et al. 2020), and two combined DWI and T2-FS images with DCE sequences (Chai et al. 2019; Yu et al. 2020).

Considering the different DCE phases evaluated, Chai et al. showed that the second post-contrast phase (CE2) performed best. This phase is acquired 60–90 s after contrast administration and provides the most relevant contrast between tumor and background (Chai et al. 2019). This finding is also supported by other studies using contrast-enhanced sequences (Liu et al. 2019a, b, 2020; Cui et al. 2019; Han et al. 2019; Shan et al. 2020; Yu et al. 2020).

Radiomic feature extraction usually requires prior segmentation. Manual delineation of regions of interest (ROIs) by radiologists is time-consuming and may be more prone to error and inter-reader variability, yet studies using it still achieved good predictive performance. A reliable and validated automatic segmentation method would be ideal, but it is not yet available.
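Inter-reader variability in manual segmentation can be quantified, for example, as the spatial overlap between two readers' masks. The sketch below computes the Dice similarity coefficient (1 means identical masks, 0 means no overlap). This is one common overlap measure, not necessarily the one used in the reviewed studies. Radiomics studies also often report the intraclass correlation coefficient of the features extracted from each reader's mask. The masks are toy data.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * int(np.logical_and(a, b).sum()) / total


# Two hypothetical readers delineating the same lesion on a 4x4 slice
reader_1 = np.zeros((4, 4), dtype=bool)
reader_1[1:3, 1:3] = True  # 4 voxels
reader_2 = np.zeros((4, 4), dtype=bool)
reader_2[1:3, 1:4] = True  # 6 voxels, 4 shared with reader_1
dice = dice_coefficient(reader_1, reader_2)  # 2 * 4 / (4 + 6) = 0.8
```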

A critical issue is the quality of the studies, which is currently relatively low. According to Lambin et al. (2017), quality is assessed with the 16 RQS criteria, with a maximum total score of 36. The average RQS of the papers included in this review is 11.1 (30.8%), indicating modest overall quality. The most critical points are study type, detection and discussion of biological correlations, validation, comparison with a gold standard, potential clinical utility, economic analysis, and open science and data (details are given in Table 2).

The study design, publication year, and sample size of the studies reflect how recently radiomics has been applied to lymph node assessment in BC. Moreover, the studies show extreme variability in the software used for feature extraction and selection, as well as in the number and type of classifiers applied. This lack of methodological standardization limits the reproducibility of the results. Interestingly, in this regard, all articles were published in medical imaging journals.

Furthermore, the analyzed papers show some important biases. The number of features considered was in some cases very high, as in the work of Dong et al. (2018) (> 10,000 features), with a risk of overfitting and redundancy. Pre-processing steps such as manual segmentation also reduce the reproducibility of the results.
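One simple way to reduce redundancy before modeling is to discard features that are highly correlated with features already kept. The sketch below is a greedy Pearson-correlation filter with an assumed threshold of 0.9. It is not the feature selection procedure of any reviewed study; penalized regression such as LASSO is also common in the radiomics literature. Because the filter is greedy, the result depends on column order. The data are toy values.

```python
import numpy as np


def drop_correlated_features(X, names, threshold=0.9):
    """Greedy redundancy filter (illustrative sketch).

    Constant features are removed first (their correlation is undefined); then
    features are visited in column order and a feature is kept only if its
    absolute Pearson correlation with every already-kept feature is <= threshold.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != len(names):
        raise ValueError("X must be 2-D with one column per feature name")
    candidates = [j for j in range(X.shape[1]) if X[:, j].std() > 0]
    if not candidates:
        return X[:, []], []
    corr = np.atleast_2d(np.corrcoef(X[:, candidates], rowvar=False))
    kept = []  # positions within `candidates`
    for i in range(len(candidates)):
        if all(abs(corr[i, k]) <= threshold for k in kept):
            kept.append(i)
    columns = [candidates[i] for i in kept]
    return X[:, columns], [names[j] for j in columns]


# Toy data: 4 patients x 4 features; f2 = 2 * f1 (redundant), f4 is constant
X = np.array([
    [1.0, 2.0, 1.0, 5.0],
    [2.0, 4.0, 0.0, 5.0],
    [3.0, 6.0, 1.0, 5.0],
    [4.0, 8.0, 0.0, 5.0],
])
X_reduced, kept_names = drop_correlated_features(X, ["f1", "f2", "f3", "f4"])
```

Such a filter should be fitted on the training set only. Selecting features on the whole dataset before validation leaks information and inflates performance estimates.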

This review has some important limitations. The first is the low number of articles analyzing lymph node status in breast cancer, all but one of which are retrospective. Sample sizes are small in most studies, although this is consistent with their experimental nature. However, we expect that as radiomics matures and becomes more widespread, the number of cases examined will increase, also thanks to dataset sharing.

Second, the results are heterogeneous, which limits the reproducibility of the studies. There is still no standardized approach, for example in terms of the software used (some of it developed in-house) or the type and number of features analyzed. This weakness is reflected in the modest quality of the studies as measured by the RQS, especially regarding the discussion of potential clinical utility and the correlation between radiomic information and biological data.

In conclusion, new prospective studies with larger samples and a focus on clinical aspects are needed to thoroughly investigate the behavior of BC in relation to axillary involvement and its radiomic correlates. Nevertheless, the radiomics results obtained so far on the prediction of axillary lymph node status are encouraging. Dedicated libraries, open-access datasets, and direct comparisons between different algorithms are required to test the accuracy of the results and allow their generalization. The results of this review indicate the direction of future research: new studies should use DCE-MRI, particularly the CE2 phase, introduce automated pre-processing (segmentation), and improve feature selection.