Introduction

The term artificial intelligence (AI) commonly refers to the computational technologies that mimic or simulate intellectual processes typical of human cognitive function, such as reasoning, learning, and problem solving [1]. AI is a branch of computer science and part of a multidisciplinary approach adopting principles from the fields of mathematics, logic, computation, and biology in an attempt to build intelligent entities often represented as software programs [2, 3]. Given its broad, dynamic, and expanding computational power, AI has been reshaping our health-care systems, allowing physicians to improve their ability to perform medical tasks [2]. As the medical community’s understanding and acceptance of AI grows, so does the range of envisioned ways to improve diagnostic accuracy, expedite clinical processes, and decrease human resource costs by assisting medical professionals with tasks that were once time-consuming [4].

Machine learning (ML) is a subfield of AI involving the development and deployment of dynamic algorithms to analyze data and facilitate the identification of intricate patterns [5]. ML models tend to improve or ‘learn’ as more data are incorporated, for example by using decision trees to explicitly learn decision rules [6]. Recent advancements in AI have also been driven by deep learning (DL), which involves the training of artificial neural networks (ANN) with multiple layers on large datasets [7]. An ANN is a collection of individual information processing units or “artificial neurons” that are arranged and interconnected in network architectural layers to perform computational tasks and recognize complex patterns [8]. ANNs are trained with input/output tuples, in which each input has a specific output assigned, to learn concepts or functions beyond the means of traditional statistical analysis methods. A frequently used ANN that is particularly efficient when applied to pattern recognition in digitized images is the deep convolutional neural network (DCNN). The depth and width of the network determine its complexity and ‘learning potential’. In health care, ML and DL have been increasingly and successfully applied to preventive medicine, image recognition diagnostics, personalized medicine, and clinical decision-making.
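
The layered structure described above can be illustrated with a minimal sketch of a forward pass through a small fully connected ANN. The layer sizes, the sigmoid activation, the random weights, and the input values are all assumptions chosen for illustration; they are not taken from any of the reviewed studies, and no training step is shown.

```python
import math
import random


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_inputs, n_neurons, rng):
    """Create one fully connected layer as (weights, biases).

    Weights are small random values; a trained network would instead
    hold values learned from input/output pairs.
    """
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def forward(layers, inputs):
    """Propagate an input vector through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron_w, activations)) + b)
            for neuron_w, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)

# Hypothetical architecture: 4 input features, two hidden layers of
# width 8, and 1 output neuron. Depth is the number of layers and
# width is the number of neurons per layer.
layer_sizes = [4, 8, 8, 1]
network = [make_layer(n_in, n_out, rng)
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

example_input = [0.2, 0.7, 0.1, 0.9]  # hypothetical normalized features
output = forward(network, example_input)
print(output)
```

Increasing the entries of `layer_sizes` (width) or adding entries (depth) increases the number of adjustable weights, which is what the text refers to as the network's ‘learning potential’.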

The aim of this review article is to address recent ML and DL applications in urolithiasis, renal cell carcinoma, bladder cancer, and prostate cancer to predict patient outcomes. Applications outside these subfields are beyond the scope of this work.

Methodology

A comprehensive review of the current literature was performed in the PubMed-Medline database up to May 2019 using the term “urology”, combined with one of the following terms: “machine learning”, “deep learning” and “artificial neural network” in combination with “urolithiasis”, “renal cell carcinoma”, “bladder cancer”, and “prostate cancer”. To capture recent trends in ML and DL applications, the search was limited to articles published within the last 5 years and originally published in English. Review articles and editorials were excluded. Publications relevant to the subject and their cited references were retrieved and appraised independently by two authors (R.S. and A.M.). In accordance with the PRISMA criteria, Fig. 1 was included to delineate our article selection process. After full text evaluation, data were independently extracted by the authors for qualitative and quantitative evidence synthesis. The following information was extracted from each study: name of author, journal and year of publication, AI method, number of participants per study, and outcome prediction accuracy.

Fig. 1 Summary of study selection process

Urinary stone disease

Despite novel instrumental advancements in urinary stone surgery, decision-making and patient counseling remain a challenge for clinicians. Several investigators have studied the prognostic role of preoperative parameters on surgical treatment outcomes in terms of stone-free rate (SFR) and the need for secondary procedures [9,10,11,12,13]. An accurate preoperative outcome prediction would assist urologists in optimizing patient selection, choosing ideal treatment options, and personalizing patient counseling.

Kadlec et al. developed an ANN that predicted outcomes after various forms of endourologic intervention [9]. Input variables and outcome data from 382 endourologically treated renal units were used to assess SFR (defined by no visible stone on KUB or < 4 mm on CT) and need for secondary procedures. The model predicted SFR with 75.3% sensitivity and 60.4% specificity, and the need for a secondary procedure with 30% sensitivity and 98.3% specificity, yielding a positive and negative predictive value of 60% and 94.2%, respectively. Aminsharifi et al. trained an ANN with pre- and postoperative data from 200 patients and used it to predict various outcomes for 254 patients after percutaneous nephrolithotomy (PCNL) [10]. The accuracy and sensitivity to predict SFR, blood transfusion, and post-PCNL ancillary procedures ranged from 81.0 to 98.2%. Stone burden and morphometry received the highest weight by the ANN as preoperative characteristics affecting postoperative outcomes. Recently, Choo et al. developed and validated a decision-support model using ML algorithms to predict treatment success after a single-session shock wave lithotripsy (SWL) in ureteral stone patients [11]. Using data from 791 patients, a model constructed with 15 variables exhibited 92.3% accuracy to predict SWL outcome. In the decision tree analysis, stone volume, length, and Hounsfield units were the top three most important preoperative variables. Similarly, Seckiner et al. collected data from 203 patients and developed an ANN to predict SFR and support SWL treatment planning [14]. ANN analysis demonstrated a prediction accuracy of 99.3% for SFR in the training group, 85.5% in the validation group, and 88.7% in the test group.
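
Because the studies above report sensitivity, specificity, predictive values, and accuracy side by side, a short sketch of how these measures follow from a 2×2 confusion matrix may help when comparing them. The counts used below are hypothetical and are not taken from any of the cited studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic measures from confusion-matrix counts.

    tp: true positives, fp: false positives,
    tn: true negatives, fn: false negatives.
    """
    return {
        # Proportion of actual positives that the model detects
        "sensitivity": tp / (tp + fn),
        # Proportion of actual negatives that the model rules out
        "specificity": tn / (tn + fp),
        # Probability that a positive prediction is correct
        "ppv": tp / (tp + fp),
        # Probability that a negative prediction is correct
        "npv": tn / (tn + fn),
        # Proportion of all predictions that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only
metrics = diagnostic_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A model can combine low sensitivity with high specificity, as in the secondary-procedure prediction reported by Kadlec et al., which is why a single accuracy figure is rarely sufficient to judge clinical usefulness.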

Other studies have investigated computer-assisted detection using image features for supporting radiologists in identifying stones. Längkvist et al. developed a DCNN to differentiate ureteral stones from phleboliths in thin slice CT volumes due to their similarity in shape and intensity [15]. The DCNN was evaluated on a database consisting of 465 clinically acquired abdominal CT scans of patients suffering from suspected renal colic. The model achieved 100% sensitivity and an average of 2.68 false positives per patient on a test set of 88 scans. Kazemi et al. derived an ANN for the early detection of kidney stone type and most influential parameters to provide a decision-support system [16]. Information pertaining to 936 patients who underwent treatment for kidney stones was collected and included 42 image features. The model resulted in 97.1% accuracy for predicting kidney stone type and identified gender, calcium level, uric acid condition, hypertension, diabetes, nausea/vomiting, flank pain, and urinary tract infection as the most vital parameters for predicting the chance of nephrolithiasis.

A future common goal is for ANNs to be exchanged between institutions to overcome the limitation of having networks trained with data from just one center. To this extent, ML and DL methods hold promise for multi-institutional dataset expansion in national registries and the development of future predictive nomograms. Table 1 provides a summary of studies using ML and DL methods applied in various modalities of urolithiasis therapy.

Table 1 Machine and deep learning applications in urolithiasis

Renal cell carcinoma

The incidence of renal cell carcinoma (RCC) has steadily increased over the past decades as a result of incidental small renal mass (SRM) detection via cross-sectional imaging [17]. Surgical series have shown that 20–30% of SRMs ≤ 4 cm are benign, while 20% exhibit potentially aggressive behavior [18]. However, there are currently no clinical or radiographic features that accurately predict histologic findings. Magnetic resonance imaging (MRI) and computed tomography (CT) have been employed in an attempt to noninvasively differentiate tumors by their degree, pattern, and heterogeneity of enhancement. While promising, these approaches remain suboptimal as clinical tools for differentiating SRMs. Recently, powerful ML algorithms have been used to explore complex interactions in clinical and imaging data to support diagnosis, prognosis, and treatment planning and to assist in shared decision-making.

Given the limitations of conventional medical imaging, there has been increasing interest in radiomics, which involves automatically extracting quantitative features from medical images. Radiomics may provide a novel approach to develop predictive tools by correlating imaging features to tumor characteristics including histology, tumor grade, genetic patterns and molecular phenotypes, as well as clinical outcomes in patients with renal masses. Pixel distribution and pattern-based texture analysis have emerged as practical quantitative methods to build image processing algorithms for the detection of tissue differences that cannot be determined by subjective visual assessments [19].
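
As a rough illustration of pixel-distribution-based texture analysis, the sketch below computes three first-order features (mean, variance, and Shannon entropy) from the gray levels of a region of interest. The two pixel lists are hypothetical, and real radiomics pipelines typically add many higher-order, pattern-based features that are not shown here.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """Compute simple histogram-based texture features for a region.

    pixels: flat list of integer gray levels from a region of interest.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Shannon entropy of the gray-level distribution, in bits
    counts = Counter(pixels)
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


# Hypothetical regions: one uniform, one with four distinct gray levels
homogeneous = [100] * 16
heterogeneous = [0, 50, 100, 150] * 4

features_homogeneous = first_order_features(homogeneous)
features_heterogeneous = first_order_features(heterogeneous)
print(features_homogeneous)
print(features_heterogeneous)
```

Feature vectors of this kind, computed per lesion, are what an ML classifier would receive as input; the heterogeneous region yields higher variance and entropy than the uniform one, a difference that may not be apparent on subjective visual assessment.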

Several studies have shown that texture analysis has potential in differentiating SRM [20,21,22]. Yan et al. showed that texture analysis may be a reliable quantitative strategy to differentiate between angiomyolipoma (AML), clear cell RCC (ccRCC), and papillary RCC (pRCC) with an accuracy in the range of 90.7–100% based on the analysis of three-phase CT scans [23]. Feng et al. achieved a higher accuracy and area under the curve (AUC) of 93.9% using a similar ML strategy [24]. Cui et al. proposed an automatic computer-identification system to differentiate AML from whole-tumor CT images using an over-sampling technique to increase the sample volume of AML [25]. Yu et al. evaluated the utility of texture analysis for the distinction of renal tumors, including various RCC subtypes and oncocytoma. The ability of ML to distinguish ccRCC and pRCC from oncocytoma was excellent with AUC of 0.93 and 0.99, respectively [22]. Coy et al. investigated the diagnostic value and feasibility of a DL-based renal lesion classifier to differentiate ccRCC from oncocytoma in 179 patients with pathologically confirmed renal masses on routine four-phase multiple detector CT scans [26]. When using the entire tumor volume, the excretory phase showed the best classification performance with 74.4% accuracy, 85.8% sensitivity, and PPV of 80.1%.
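
Several of these studies report the area under the receiver operating characteristic curve (AUC). One way to read this measure is as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the scores are hypothetical and the function name is our own.

```python
def auc_pairwise(scores_positive, scores_negative):
    """Estimate AUC as the fraction of positive/negative pairs in which
    the positive case receives the higher score (ties count as half)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical classifier scores (e.g., predicted probability of ccRCC)
positive_scores = [0.9, 0.8, 0.4]
negative_scores = [0.7, 0.3, 0.2]

auc = auc_pairwise(positive_scores, negative_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values such as 0.93 and 0.99 indicate that the classifier ranks nearly all malignant cases above benign ones.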

Furthermore, the nuclear grade of a tumor is widely recognized as one of the most important independent prognostic factors [27]. Determination of the Fuhrman grade by percutaneous renal biopsy suffers from significant sampling bias, making the preoperative recognition of biological aggressiveness challenging. Studies have shown that ML models constructed from CT imaging texture features can accurately distinguish between ccRCC high and low grades, with accuracy ranging from 0.73 to 0.93 [19, 28,29,30,31,32]. Ding et al. showed high prediction accuracy in identifying ccRCC grade and their results were superior to those obtained from CT image features or the RENAL nephrometry score for high- and low-grade ccRCC predictions [29].

In recent years, biomarkers and multiple gene expression-based signatures have been developed to predict survival and disease prognosis in ccRCC. Li et al. developed a prognostic model based on 15 survival-related genes from The Cancer Genome Atlas and showed that patients in the model’s high-risk group had significantly worse survival than those in the low-risk group. Risk group was independent of age and sex, but was significantly associated with hemoglobin level, primary tumor size, and grade [33]. Radiogenomics is a field investigating the potential associations between a disease’s imaging features and the underlying genetic patterns or molecular phenotype. Kocak et al. evaluated the potential of quantitative CT scan texture analysis to predict the presence of PBRM1 mutations, the second most commonly identified mutations in ccRCC, using ANNs and ML algorithms [34]. Overall, the ANN correctly classified 88.2% of ccRCC with regard to PBRM1 mutation status, while the random forest ML algorithm correctly classified 95% of ccRCC.

These are promising results for developing noninvasive imaging biomarkers of histopathologic subtypes, prognosis, and treatment response. Moreover, they demonstrate that noninvasive ML and DL models constructed from radiomics features have comparable performance to percutaneous renal biopsy in predicting the International Society of Urological Pathology (ISUP) grading. Accurate preoperative nuclear grading may substantially aid in risk assessment, patient stratification, and treatment planning of RCC patients. Table 2 summarizes the most significant findings of ML and DL applications in the field of RCC.

Table 2 Machine and deep learning applications in renal cell carcinoma

Bladder cancer

The diagnosis and tumor staging of bladder cancer (BCa) ultimately depend on cystoscopic examination of the bladder and histological evaluation of sampled tissue by transurethral resection (TURB). The main limitation of cystoscopy is its difficulty in discriminating between areas of malignancy and healthy urothelium given the multifocal nature of the disease and inconspicuous but significant lesions such as carcinoma in situ (CIS). However, CT/MRI image-based 3D texture feature analysis of the bladder wall has demonstrated its potential as a noninvasive, image-based strategy to accurately identify heterogeneous tumor distribution and preoperatively discriminate BCa from normal wall tissue [35]. MRI textural features extracted from cancerous volumes of interest and incorporated into ML models have further demonstrated their ability to preoperatively distinguish low- and high-grade BCa with 83% accuracy [36]. DCNNs have also been used to classify and predict cystoscopic findings with a high degree of accuracy [37]. Such a DL model can be integrated into an AI-aided imaging diagnostic tool to support urologists during cystoscopic examinations. ‘AI cystoscopy’ may serve as an adjunct during surgical training and medical education to help differentiate benign from malignant lesions using visual evaluation and thereby reduce the number of unnecessary biopsies. A different approach for image-based diagnosis has focused on the nanoscale-resolution scanning of cell surfaces collected from urine [38]. Atomic force microscopy coupled to ML analysis has been shown to be a noninvasive method to detect BCa with 94% accuracy when five cells per patient’s urine sample are examined. Moreover, it demonstrated a statistically significant improvement in diagnostic accuracy compared to cystoscopy alone. ML-based methods have been further applied to accurately quantify tumor buds from immunofluorescence-labeled slides of muscle-invasive BCa (MIBC) patients [39]. Tumor budding was found to correlate with TNM staging and patients of all stages were stratified into three new staging criteria based on disease-specific death. Tumor bud quantification through automated slide analysis may provide an alternate staging model with prognostic value for MIBC patients.

ML algorithms have been employed to create recurrence and survival predictive models from imaging and operative data. Patient recurrence and survival at 1, 3 and 5 years after cystectomy were predicted with greater than 70% sensitivity and specificity [40]. Such predictive models may help dictate patients’ follow-up schedules and adjuvant treatments, and also provide opportunities for improving care by optimally utilizing operative data collection. ML algorithms used to identify genes at initial presentation that are most predictive of recurrence can be applied as molecular signatures to predict the risk of recurrence within 5 years after TURB [41]. Whole genome profiling from frozen non-muscle-invasive BCa specimens was integrated into a genetic programming algorithm to generate classifier mathematical models for outcome prediction. The model identified 21 key genes that are associated with recurrence from which an optimal three-gene rule [TMEM205 × (NFKBIA × KRT17)] was developed to predict recurrence with 70.6% sensitivity and 66.7% specificity on the test set.

An unmet need in BCa treatment is the early assessment of chemotherapeutic efficacy and prediction of treatment failure at an early phase to reduce unnecessary morbidity, improve patients’ quality of life, and reduce costs. Therefore, the development of accurate predictive models to determine the effectiveness of neoadjuvant chemotherapy is of critical importance in BCa management. Computerized decision support systems (CDSS) have been developed to provide noninvasive, objective, and reproducible decision support for identifying non-responders, so that treatment may be suspended early to preserve their physical condition or to distinguish full responders for organ preservation. Wu et al. compared the performance of different DCNN models and showed that they effectively predicted a bladder lesion’s response to chemotherapy and compared favorably to radiologists’ performance [42]. Cha et al. developed a CT-based CDSS to improve the identification of patients who responded completely to neoadjuvant chemotherapy and found that physicians’ diagnostic accuracy significantly increased with the aid of CDSS [43]. Thus, computer-aided treatment prediction using DL algorithms may prove to be invaluable to medical professionals as a decision support tool for improving the selection of patients considering bladder-sparing therapy for MIBC and avoiding adverse effects in non-responders.

Despite several ML and DL research efforts in predicting BCa patients’ outcomes, adoption of such models in clinical practice remains scarce. The main challenges ahead before such models can be deployed successfully in a clinical setting are the inclusion of standardized parameters, adjusting for equipment variance, and the collection of multi-institutional data to ensure the generalizability of the models. Once these issues are addressed, ML and DL models can be trained using BCa datasets to accurately predict an individual patient’s outcome using pre-, peri-, and postoperative data. Table 3 summarizes the most significant findings of ML and DL applications in the field of BCa.

Table 3 Machine and deep learning applications in bladder cancer

Prostate cancer

There is an unmet need for definitive diagnostic methods other than transrectal imaging and biopsy for men with prostate cancer (PCa). Although a biopsy is necessary for a conclusive diagnosis, patients with low cancer risk could avoid this procedure and the potential complications that may arise from it. To achieve this goal, prediction models have been developed to determine patients’ cancer risk on the basis of clinical characteristics. Multilayer ANNs have predicted patients’ prostate biopsy results more accurately than traditional statistical methods, including logistic regression, while assessing large numbers of variables and non-linear relationships [44, 45]. Although MRI has improved PCa detection and thereby reduced the number of unnecessary biopsies, excessive variation in its performance and interpretation remains a major barrier to global standardization. Computer-aided diagnostic (CAD) systems with DL architecture have been applied to diminish variation in the interpretation of prostatic MRI. Among the advantages of this approach are consistent diagnoses, cost-effectiveness, and improved efficiency. Ishioka et al. developed DCNN algorithms that estimate the area in which a targeted biopsy may detect the presence of cancer and, in doing so, decrease the number of patients mistakenly diagnosed as having cancer [46]. However, other studies have shown no added benefit of radiomic ML when compared with mean apparent diffusion coefficient in differentiating benign versus malignant prostate lesions [47]. Provided that the diagnostic precision of CAD systems exceeds that attained by humans, and the pathological diagnosis can be predicted with high accuracy, it may be reasonable to confirm clinically significant PCa solely based on MRI images rather than with biopsy.

Automated computational methods applied to digital pathology images have shown the ability to overcome Gleason score ambiguity, convey reproducible results, and generate large amounts of data. Arvaniti et al. trained a DCNN as a Gleason score annotator and used the model’s predictions to assign patients into low-, intermediate-, and high-risk groups, achieving pathology expert-level stratification results [48]. Accurate post-surgical risk stratification is essential to identify patients at high risk of PCa-specific mortality who would benefit from early intervention. Donovan et al. introduced an innovative platform which accurately discriminates between low-, intermediate-, and high-risk PCa, and predicts the likelihood of significant clinical failure within 8 years [49]. By combining ML-guided image analysis with biological attributes, the authors provided a risk assignment that is unbiased, broadly applicable, and independent of interpretive histology.

While information from clinical registry data helps physicians make data-driven decisions, there is limited opportunity for patients to access these registries to help them make informed decisions. Auffenberg et al. utilized data from a prospective cancer registry comprising 7543 men diagnosed with PCa to train an ML model that allows newly diagnosed men to view predicted treatment decisions based on patients with similar characteristics [50]. Their personalized model was highly accurate, with age, followed by number of positive cores and Gleason score, emerging as the most important variables influencing patient treatment decisions.

Treatment response prediction using MRI images has been shown to be an efficient clinical decision-making tool. Abdollahi et al. developed various radiomics models based on pre- and post-intensity-modulated radiotherapy (IMRT) MRI data for individualized treatment response prediction in PCa patients [51]. Their results showed that the features extracted from pre-treatment MRI images predicted early IMRT response with reliable performance. Moreover, Hung et al. presented a novel ML method of processing automated performance metrics to evaluate surgical performance and predict clinical outcomes after robot-assisted radical prostatectomy (RARP) [52]. Their model predicted length of hospital stay, operative time, Foley catheter duration, and urinary continence with over 85% accuracy [53]. In a recent study, Wong et al. used three ML algorithms for the prediction of early biochemical recurrence and showed that, with an AUC > 0.95, they outperformed traditional statistical regression models [54]. Such methodology may identify patients at risk more accurately and equip patients and physicians alike with prognostic information to provide individualized health care (Table 4).

Table 4 Machine and deep learning applications in prostate cancer

ML and DL limitations

AI technologies have been attracting substantial attention in urology; however, their real-life implementation still faces obstacles. Several limitations exist in most studies applying ML and DL methods to urological diseases. First, the variability in study design, algorithms employed, training features used, and observed end points makes it difficult to perform quantitative analysis. Second, most algorithms in these studies were validated only with their own dataset; they therefore lack external validation, and the generalizability of their results to other datasets cannot be assumed. Third, further algorithm development and research are particularly required in the field of urolithiasis to outperform conventional statistical methods, as observed in urooncological investigations, in order to reduce procedural costs and maximize patient outcomes. Lastly, some studies did not compare AI with conventional statistical analysis, since conventional methods only allow a limited number of training features, whereas AI can process big data and can thus be trained with a greater number of training features. For this reason, a comparison between the two techniques is challenging [55].
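
One common way to approximate external validation when multi-institutional data are available is to hold out all patients from one center at a time, train on the remaining centers, and test on the held-out center. The sketch below shows only the data-splitting step; the record structure, center labels, and function name are hypothetical, and none of the reviewed studies is claimed to have used this scheme.

```python
def leave_one_center_out(records):
    """Yield (held_out_center, training_records, test_records) splits.

    Each split tests on every record from one center and trains on the
    records from all other centers, so the test data never come from an
    institution seen during training.
    """
    centers = sorted({r["center"] for r in records})
    for held_out in centers:
        train = [r for r in records if r["center"] != held_out]
        test = [r for r in records if r["center"] == held_out]
        yield held_out, train, test


# Hypothetical multi-institutional dataset
records = [
    {"center": "A", "patient_id": 1},
    {"center": "A", "patient_id": 2},
    {"center": "B", "patient_id": 3},
    {"center": "B", "patient_id": 4},
    {"center": "C", "patient_id": 5},
]

splits = list(leave_one_center_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Performance that holds up across every held-out center gives more support for generalizability than a random split within a single institution's data.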

Future directions

Future research should focus on the construction of larger medical databases and further development of AI techniques. Once developed, improved algorithms should not require large computer centers, but should run on mobile devices or through access to cloud services. Specialized AI-based software for image-guided, real-time, intraoperative decisions will require appropriate regulatory approvals to function with robotic platforms and expand to operating rooms worldwide [56]. Issues remain regarding the trustworthiness of a computer’s diagnosis and the assurance that programming biases do not interfere with diagnoses. Human intuition, experience, and common sense will continue to play a crucial role in future AI developments to ensure that these systems are operating as intended and to deal with undesired consequences in a timely fashion.

Conclusion

The predictive precision of ML and DL will continue to support and enhance personalized medicine as further data are included and models are retrained. The analysis of larger patient datasets and electronic medical records can be semi-automated to provide instant predictive analytics that can be used to obtain insights into a variety of disease processes. Predictive accuracy, however, is highly dependent on the efficient integration of data obtained from different sources so that results can be generalized. Although shared decision-making will not be replaced by these models, they may complement the information patients obtain from traditional methods. While this is only the beginning and further validation is required, the potential future applications of artificial intelligence in the field of urology are numerous.