Introduction

In urological diagnostics and surgical planning, a thorough understanding of the anatomy of the upper urinary tract (UUT), including the ureter and renal pelvis, is crucial. Many imaging techniques, such as intravenous urography (IVU), computed tomography urography (CTU), and magnetic resonance urography (MRU), can reveal the anatomical structure of the UUT. However, each of these examinations has drawbacks. IVU and CTU require intravenous injection of contrast media and carry a risk of phlebitis, allergic reactions, and contrast-induced nephropathy1,2,3. In addition, CTU often cannot achieve complete visualization of the UUT4,5,6. MRU is noninvasive, but its image quality relies on the water signal, so it performs rather poorly for normal kidneys and ureters without hydronephrosis7.

Non-enhanced CT (NECT) is noninvasive, does not require contrast media, and can be applied to almost all patients except pregnant women. It can also reveal the anatomical structure of the UUT8. In fact, the outline of the renal pelvis and ureter can be distinguished on a thin-layer NECT scan most of the time, so three-dimensional (3D) reconstruction of the UUT from NECT scans is possible in principle. However, at present, NECT cannot replace invasive techniques such as IVU and CTU. First, distinguishing the ureter from adjacent structures on NECT scans is very difficult without contrast media. Second, although experienced radiologists can identify the ureter on a NECT scan, doing so is time-consuming and labor-intensive. Consequently, radiologists and urologists currently cannot obtain 3D structures of the UUT from a NECT scan in practice; they still need invasive examinations such as CTU and IVU9.

In recent years, deep learning (DL) models based on artificial neural networks have been extensively employed in medical imaging, with promising results. DL models can perform different tasks such as lesion detection, organ segmentation, and diagnosis10,11,12,13,14. We believe that DL models can also perform urinary tract segmentation on NECT scans with proper design and adequate training. However, the UUT occupies a tiny volume (usually less than 0.1% of all voxels in a CT scan), and the renal pelvis and ureter differ markedly in structure. This low ground-truth share and these structural differences pose serious obstacles for existing DL models in accurately segmenting the UUT15,16.

In this paper, we propose a DL-based method to tackle these issues. By using image cropping and sectional training, the proposed method alleviates the influence of the low ground-truth share and the structural differences. Extensive experiments demonstrated that the method could perform end-to-end UUT segmentation and achieved accuracy comparable to that of radiologists in many cases. Therefore, the method studied in this paper is a promising segmentation approach that could substantially reduce the workload of urologists and spare many patients invasive examinations17.

Materials and Methods

Dataset

All experimental protocols of this study were approved by the ethical committee of the Second Hospital of Shandong University (No. KYLL-2023-088, March 6, 2023). All methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from all subjects. The flowchart of case selection for the training, validation and test sets is shown in Fig. 1. We searched the picture archiving and communication system (PACS) database of our hospital for NECT scans of the abdomen and pelvis from January 2022 to December 2022 and acquired 2334 entries. The inclusion criteria were as follows: (1) thin-layer NECT scans (1.25-mm slices) that covered the kidneys, ureter, and bladder; (2) the left renal pelvis and the entire length of the left ureter could be clearly distinguished; and (3) normal anatomical structure of the ureter and pelvis. The exclusion criteria were as follows: (1) the ureter and renal pelvis could not be fully and clearly distinguished; (2) disease affecting the UUT, such as renal mass, renal cyst, hydronephrosis, or lithiasis; and (3) anatomical variations, such as ureteral stenosis, duplicated ureters, or ampullary renal pelvis. After initial selection, 150 scans were selected for further study. A radiologist with more than 10 years of experience marked the structure of the left ureter and renal pelvis on each CT scan and saved the markings as mask files. All private information was removed from the CT scans and their corresponding masks. The scans were randomly divided into two datasets: a training set (130 cases) and a validation set (20 cases). The demographic data of the training and test sets did not differ significantly (Table 1).

Fig. 1 Flowchart of case selection of the training, validation and test data. NECT: non-enhanced CT; UUT: upper urinary tract; CTU: CT urography.

Table 1 Demographic data of each group.

To test the accuracy of our models, we searched the PACS database of our hospital for CTU scans from July 2022 to October 2022 and acquired 120 entries. The inclusion and exclusion criteria were the same as for the training set, with one addition: each case had to have thin-layer scans of the kidney, ureter, and bladder in both the non-enhanced and delayed phases. A total of 29 cases were selected for further study. For each case, the non-enhanced scan was used to test the trained models. The same radiologist performed left pelvis and ureter reconstruction using the delayed-phase scans and marked the structure of the left ureter and renal pelvis on the non-enhanced scans. These masked non-enhanced scans were assigned to the test set. All UUT masks were reviewed by a professional urologist with extensive experience in urologic imaging. Any reading disagreement was resolved by consensus reading. Two radiologists with more than 10 years of experience independently reviewed the results of the different DL models on the test set and calculated the recall and false-positive rate of each model. Any reading disagreement was resolved by consensus reading with senior radiologists and urologists. The corresponding CTU reconstructions were used for comparison. All UUT masks were smoothed with a Gaussian filter before training or testing. The inputs for all models were raw data without windowing or normalization. The training data used in this study are available from the corresponding author upon reasonable request.

All NECT and CTU scans in this study were performed on a single General Electric Discovery CT 750 HD scanner. All NECT and CTU scans used the same protocols. CTU scans used intravenous injection with 2 mL/kg iohexol and a 10-min delayed scan. The tube voltage was 120 kV, and the tube current ranged between 300 and 400 mA. All CT scans had a horizontal resolution of 512 × 512. Each series had 260–553 axial slices.

Models

To improve accuracy and reduce training costs, an adaptive framework was proposed. As Fig. 2 shows, before the raw NECT scan was input into the model, the image was cropped to a 240 × 112 × 112 array containing the left UUT, according to the left kidney position detected by a pre-trained DL model18. On this basis, two segmentation frameworks were designed: (A) entire: a single model was trained that took the entire 3D array as input (Fig. 2a); and (B) sectional: the 3D array was separated into two sections, with the upper one-third containing the renal pelvis and the upper part of the ureter and the lower two-thirds containing the lower part of the ureter; two independent models were then trained to process the upper part (80 × 112 × 112) and the lower part (160 × 112 × 112), respectively (Fig. 2b).
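The cropping and sectional split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `origin` argument (a hypothetical crop corner derived from the detected kidney position, whose exact derivation the text does not specify), and the assumption that axis 0 runs from the kidney toward the bladder are all introduced here. The upper section is taken as one-third of the 240 × 112 × 112 crop, i.e. 80 × 112 × 112.

```python
import numpy as np

CROP_SHAPE = (240, 112, 112)      # crop size given in the text
UPPER_DEPTH = CROP_SHAPE[0] // 3  # upper one-third: 80 slices


def crop_uut_block(volume, origin, shape=CROP_SHAPE):
    """Cut a fixed-size block out of `volume`, starting near `origin`.

    `origin` is a hypothetical (slice, row, col) corner derived from the
    detected kidney position; it is clamped so the block stays in bounds.
    """
    slices = []
    for axis, size in enumerate(shape):
        if volume.shape[axis] < size:
            raise ValueError("volume is smaller than the crop shape")
        start = max(0, min(int(origin[axis]), volume.shape[axis] - size))
        slices.append(slice(start, start + size))
    return volume[tuple(slices)]


def split_sections(block):
    """Split along axis 0 into the upper third and the lower two-thirds."""
    return block[:UPPER_DEPTH], block[UPPER_DEPTH:]


def merge_sections(upper_mask, lower_mask):
    """Stack the two sectional outputs back into one full-size mask."""
    return np.concatenate([upper_mask, lower_mask], axis=0)


def foreground_share(mask):
    """Fraction of voxels labelled as UUT."""
    return np.count_nonzero(mask) / mask.size


# Synthetic stand-in for a scan mask (not real data).
mask = np.zeros((260, 128, 128), dtype=bool)
mask[20:250, 60:64, 60:64] = True  # a thin tube as a fake "ureter"
block = crop_uut_block(mask, origin=(10, 8, 8))
upper, lower = split_sections(block)

print(block.shape, upper.shape, lower.shape)
print(foreground_share(mask) < foreground_share(block))
```

In this toy example the crop keeps every labelled voxel while discarding background, so the ground-truth share rises, which is the effect the cropping step is meant to achieve.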

Fig. 2 Different frameworks in this study. (a) Entire: one model took the entire 3D array containing the UUT as input; (b) sectional: two models processed the upper 1/3 and lower 2/3 of the input, respectively, and their outputs were combined to form the final output. For better visual depiction, the 3D images in this figure were reconstructed from a CTU scan.

To evaluate the segmentation performance of different DL models and construct an effective segmentation method, three deep learning models were employed in the aforementioned frameworks: (A) a basic UNet model based on 3D convolution (Fig. 3a); (B) a UNet3+ model, also based on 3D convolution but with modified skip connections19 (Fig. 3b); and (C) a ViT-UNet model combining convolution and a vision transformer20,21 (Fig. 3c). These models were trained with the training set, and the training process was monitored with the validation set. The models with the highest F1 score \(\left( \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \right)\) were saved and tested with the test set. All models were coded with Python 3.9.0 and PyTorch 1.13.1. The training workstation had 64 GB of RAM and a GeForce RTX 4090 GPU with 24 GB of memory.
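As an illustration of the F1-based model selection, the sketch below computes voxel-wise precision, recall and F1 for a pair of binary masks and picks the epoch with the highest validation F1. The function names and the example F1 values are hypothetical, and the zero-division handling is an assumption; the study's actual training loop is not described in the text. For binary masks, this F1 value is numerically identical to the Dice coefficient.

```python
import numpy as np


def voxel_scores(pred, truth):
    """Voxel-wise precision, recall and F1 for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)
    fp = np.count_nonzero(pred & ~truth)
    fn = np.count_nonzero(~pred & truth)
    # Assumption: undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_epoch(val_f1_per_epoch):
    """Index of the epoch with the highest validation F1 (first on ties)."""
    return int(np.argmax(val_f1_per_epoch))


truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True  # 8 ground-truth voxels
pred = np.zeros_like(truth)
pred[1:3, 1:3, 1:4] = True   # 12 predicted voxels, 8 of them correct

precision, recall, f1 = voxel_scores(pred, truth)
print(round(precision, 3), recall, round(f1, 3))
print(best_epoch([0.41, 0.63, 0.58]))  # hypothetical validation F1 values
```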

Fig. 3 Schematic diagram of the DL models used in this study. (a) Basic UNet model; (b) UNet3+ model; (c) ViT block. The ViT-UNet model in this study had the same structure as the basic UNet model (a), but with E5 replaced by 12 ViT blocks of the type shown in (c). E: encoders; D: decoders; ViT: vision transformer; MSA: multi-head self-attention; MLP: multi-layer perceptron; arrows: data flow; dotted lines: skip connections.

Statistical analysis

After training, both frameworks with the different models were tested with the test set. We compared the precision, recall, and Dice coefficient of each method. The method with the best performance was chosen for comparison with the CTU scans. For this comparison, the axial recall of the ureter was calculated from the number of horizontal planes in which the DL model (or CTU) successfully identified the ureter or renal pelvis. Additionally, the axial false-positive rate was calculated from the number of horizontal planes in which the DL model mistook other structures for the ureter or renal pelvis. Statistical analysis was carried out with IBM SPSS 26.0. The statistical methods used in this study included Student's t test and the chi-square test.
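The plane-wise metrics can be approximated automatically as in the sketch below. In the study these judgements were made by radiologists reading each output, so the rules used here are assumptions introduced for illustration: a plane counts as a hit if the prediction overlaps the ground truth anywhere in that plane, a plane counts as a false positive if it contains any predicted voxel outside the ground truth, and the false-positive rate is taken over all planes in the volume. The function name `axial_metrics` is hypothetical.

```python
import numpy as np


def axial_metrics(pred, truth):
    """Return (axial recall, axial false-positive rate) for binary masks.

    Axis 0 is assumed to index the axial (horizontal) planes.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    truth_planes = truth.any(axis=(1, 2))         # planes containing UUT
    hit_planes = (pred & truth).any(axis=(1, 2))  # planes where UUT was found
    fp_planes = (pred & ~truth).any(axis=(1, 2))  # planes with stray voxels
    n_truth = np.count_nonzero(truth_planes)
    if n_truth == 0:
        raise ValueError("ground truth contains no UUT voxels")
    axial_recall = np.count_nonzero(hit_planes) / n_truth
    axial_fp_rate = np.count_nonzero(fp_planes) / pred.shape[0]
    return axial_recall, axial_fp_rate


# Toy example: 10 planes, UUT present in planes 2..7.
truth = np.zeros((10, 8, 8), dtype=bool)
truth[2:8, 3, 3] = True
pred = np.zeros_like(truth)
pred[2:7, 3, 3] = True  # plane 7 is missed
pred[4, 6, 6] = True    # stray voxel in a plane that also has a hit
pred[9, 0, 0] = True    # stray voxel in a plane without UUT

axial_recall, axial_fp_rate = axial_metrics(pred, truth)
print(round(axial_recall, 3), axial_fp_rate)
```

A strict voxel-overlap rule treats small boundary disagreements as false positives, so a practical version would likely need a tolerance that a human reader applies implicitly.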

Results

The overall precision and recall of different methods are depicted in Table 2. The sectional framework with basic UNet model exhibited the highest precision and recall among all methods tested, with an overall precision of 85.5% and recall of 71.9%. Additionally, all models performed better in the sectional framework than in the entire framework.

Table 2 Test results of the three trained models.

In the comparison between the best DL method and the CTU scans in the test set (Table 3), the overall axial recall of the DL model and the CTU scan was 82.2% and 69.1%, respectively; the difference was statistically significant (P < 0.01). The DL model achieved more than 85% axial recall in 13 cases, whereas CTU achieved more than 85% axial recall in 9 cases. In three cases, the DL model had an axial recall lower than 60%. In 18 cases, the DL model exhibited a false-positive rate greater than 10%. To exclude the impact of false positives on axial recall, we introduced a novel metric called 'recall without ambiguity'. Under this metric, only axial planes without false-positive dots were considered valid. The DL model exhibited a 'recall without ambiguity' of 69.4%, which did not differ significantly from the axial recall of CTU (69.1%, P = 0.27).
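One possible reading of 'recall without ambiguity' is sketched below: a plane only counts as recalled if the prediction overlaps the ground truth in that plane and the same plane contains no predicted voxel outside the ground truth. The overlap rule, the choice of denominator (planes containing ground-truth UUT) and the function name are assumptions made for illustration; in the study the planes were assessed by radiologists.

```python
import numpy as np


def recall_without_ambiguity(pred, truth):
    """Share of ground-truth planes that are hit and free of false positives."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    truth_planes = truth.any(axis=(1, 2))
    hit_planes = (pred & truth).any(axis=(1, 2))
    fp_planes = (pred & ~truth).any(axis=(1, 2))
    n_truth = np.count_nonzero(truth_planes)
    if n_truth == 0:
        raise ValueError("ground truth contains no UUT voxels")
    valid_planes = hit_planes & ~fp_planes  # hits with no stray voxels
    return np.count_nonzero(valid_planes) / n_truth


# Toy example: UUT in planes 2..7; planes 2..6 are hit, plane 4 is ambiguous.
truth = np.zeros((10, 8, 8), dtype=bool)
truth[2:8, 3, 3] = True
pred = np.zeros_like(truth)
pred[2:7, 3, 3] = True
pred[4, 6, 6] = True  # false-positive dot in a plane that was otherwise a hit

rwa = recall_without_ambiguity(pred, truth)
print(round(rwa, 3))
```

In this toy case the plain axial recall would be 5/6, and removing the one ambiguous plane lowers it to 4/6, mirroring the direction of the drop from 82.2% to 69.4% reported above.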

Table 3 Comparison of DL model and CTU scan in the test set.

To test the robustness of our model, we tested the basic UNet model on scans with different morbidities. Small renal or ureteral stones seemed to have minimal impact on the model, as did small renal cysts and renal tumors. The model also had no difficulty recognizing the ureter and renal pelvis in cases of slight hydronephrosis. However, its recall of the ureter could be drastically decreased by larger ureteral stones (diameter > 1 cm) and severe hydronephrosis. The model's recall of the upper UUT could also be affected by deformation of the kidney caused by large renal cysts or renal tumors.

Discussion

In this study, we used NECT scans with manually labeled left renal pelvis and ureter as the training set and trained different DL models to perform UUT segmentation on NECT scans. The test results were promising: the best DL model achieved 85.5% precision and 71.9% recall on NECT scans with a training set of 130 cases. The accuracy was not ideal, but it is crucial to consider that identifying the UUT on NECT scans is a challenging task. First, the ureter often adheres to other structures, such as the psoas major muscle, iliac artery, and uterus, all of which have a density similar to that of the ureter (Fig. 4a,b). Sometimes, even the most experienced radiologists cannot accurately distinguish the outline of the ureter. Second, on NECT scans, the density of the ureter is only slightly higher than that of the background (surrounding fat and connective tissue) (Fig. 4c). Third, the ureter always runs parallel with the seminal vessels (ovarian vessels in women), which have nearly identical density and appearance on NECT scans (Fig. 4d)22. Considering all these difficulties, we believe that the current performance of our DL models is acceptable and the false-positive rate is within tolerance23. The results of this study demonstrate that it is possible and practical to use DL models to perform automatic UUT reconstruction with NECT scans24. This approach could spare many patients invasive examinations and their complications, effectively reducing healthcare costs. Although the current models still have some limitations, this method has great potential for improvement in the future25. The DL models reported in this study were trained only on UUTs of normal appearance and, as expected, they performed rather poorly in the robustness test with abnormal samples. We expect that, with adequate training and a larger dataset, these models could also recognize the right UUT or UUTs with morbidity.

Fig. 4 The difficulty of recognizing the ureter on NECT scans. (a) Barely visible ureter (arrow) adhering to the psoas major muscle; (b) barely visible ureter (arrow) adhering to the iliac artery; (c) very thin ureter (arrow) blending into surrounding fat; (d) ureter (yellow arrow) running parallel with the ovarian vein (white arrow), which has an almost identical appearance.

As the segmentation results above show, the simplest model, the basic UNet, achieved higher accuracy than the more complex models. We speculate that the low fine-grained precision of the basic UNet actually made it more suitable for the UUT segmentation task because the precise boundary of the UUT on NECT scans is not clear. That could be why the UNet3+ model, which emphasizes fine-grained precision, performed worse26,27. Theoretically, the combination of convolution and a vision transformer in the ViT-UNet model should give a better overall view of the entire input image28. However, in this study, the UUT had a tiny volume and low correlation with other pixels in the same slice, so it can be regarded as a semantic high-frequency signal, on which ViT models have limited performance29. As the experimental results showed, all models performed better under the sectional framework. This was partially because the structures of the renal pelvis (thin-slice renal pelvis with multiple horn-shaped calices) and the ureter (a thin, curvy tube) were significantly different. In this case, training different models for more specific tasks could improve efficiency30. The other reason could be the limitation of our training platform31,32. With the same amount of memory, training in two parts with smaller inputs could allow DL models with more convolution channels and attention heads to capture more input features33.
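The memory argument can be made concrete with a rough, first-order calculation. It assumes that the activation memory of a convolution layer that preserves spatial size scales with (number of voxels) × (number of channels), and that the two sectional models are trained separately so each has the full memory budget. The baseline of 32 channels is a hypothetical value, not one reported in the study; the shapes are the entire 240 × 112 × 112 crop and its upper one-third and lower two-thirds.

```python
from math import prod

ENTIRE = (240, 112, 112)
UPPER = (80, 112, 112)   # upper one-third of the crop
LOWER = (160, 112, 112)  # lower two-thirds of the crop


def channels_within_budget(base_channels, base_shape, new_shape):
    """Largest channel count keeping voxels x channels within the baseline."""
    return base_channels * prod(base_shape) // prod(new_shape)


BASE_CHANNELS = 32  # hypothetical first-layer width for the entire framework
for name, shape in (("upper", UPPER), ("lower", LOWER)):
    print(name, prod(shape), channels_within_budget(BASE_CHANNELS, ENTIRE, shape))
```

Under these assumptions, the upper model could afford about three times and the lower model about 1.5 times the channel count of a model fed the entire crop. Real memory use also depends on network depth, batch size and optimizer state, so this is only an order-of-magnitude guide.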

The comparison with CTU scans showed that the DL model could achieve a higher recall of the UUT than CTU. The smooth muscle of the ureter exhibits constant peristalsis, so there is a high chance that the contrast medium cannot fill the entire UUT34 (Fig. 5a1). In addition, the timing of the pyelographic-phase scan is hard to control35. Too early a scan leaves a large portion of the ureter non-opacified (Fig. 5b1). DL models do not rely on contrast but distinguish the UUT by image features21. This study included many cases in which CTU could not visualize the distal ureter but the DL model recognized the ureter segments that CTU failed to visualize (Fig. 5a,b). Our DL model demonstrated a higher recall but occasionally suffered from a high false-positive rate. Hence, the DL model could serve as a valuable complement to CTU.

Fig. 5 Comparison of CTU reconstruction (1), model output (2) and ground truth on NECT (3). (a) Lower part of the ureter missing in CTU (a1) and full-length visualization using the DL model (a2). (b) Large portion of the ureter missing on CTU due to bad timing of the scan (b1) and higher recall of the ureter using the DL model (b2), though with a few false-positive dots and relatively poor pelvis visualization. (c) Full-length ureter visualization using CTU (c1) and poor DL model output with multiple false-positive points and missing segments (c2).

Limitations

Our test results showed that the current DL models still had some limitations (Fig. 6). (A) The output of the sectional framework sometimes had break points at the junction of the two sections (Fig. 6a1). (B) The models occasionally mistook adjacent seminal vessels (ovarian vessels in women) for the ureter (Fig. 6b). (C) They had difficulty recognizing the ureter at the site where it crossed the iliac artery (Fig. 6c). (D) In female patients, the DL models had difficulty recognizing the ureter where it ran near the uterus (Fig. 6d). (E) The DL models had poor recall and a high false-positive rate in patients with too much intestinal gas or inflammatory secretions (Fig. 5c and Fig. 6e). (F) The models' recall could be affected by morbidities such as large renal cysts, renal tumors and stones. For (A), (B) and (C), we believe that adjusting the model structure could be helpful21. Increasing the number of training cases could also help ameliorate these problems. For (D) and (E), no effective solution exists, as even the most experienced radiologists cannot distinguish the ureter on NECT under these circumstances. This is one reason why the DL models cannot replace CTU. For (F), we believe that further training with a greater variety of samples could solve this problem, as a DL model can only recognize what it has already 'seen'.

Fig. 6 Limitations of the current DL segmentation model. Examples are shown in the 3D view (1) and CT images (2). Red marks are UUT masks generated by the DL model. (a) Missing ureter segment at the joining part of the two models (circle). (b) Misidentification of ovarian vessels as the ureter (arrows in 3D view). Left ureter (arrow in coronal view) and ovarian vein on its left are shown in the coronal view. (c) Missing ureter segment and false-positive dots near the iliac artery (circle). The missed ureter segment at the model-joining part was also seen in the 3D view. (d) Missing ureter segment in female patients (circle). The ureter beside the uterus is barely visible (arrow). Misidentification of the ureter at the model-joining part was also seen in the 3D view. (e) Missed ureter segment (circle in 3D view) caused by intestinal filling (arrows in coronal view).

Conclusions

In summary, the proposed DL-based method is feasible and achieved promising performance in recognizing the ureter and renal pelvis on NECT scans. These segmentation models could spare many patients invasive examinations and serve as a valuable complement to CTU. They could also reduce the workload and improve the efficiency of radiologists. Although the current models have many limitations, we believe that they hold significant potential for improvement through additional training with more diverse samples. In addition, this work extends the technical framework of DL in clinical medical research.