Introduction

Prostate cancer (PCa) is the second most frequent cancer and the fifth leading cause of cancer death among men worldwide [1]. Management of PCa includes active surveillance, radical prostatectomy, androgen deprivation therapy, and radiation therapy; the choice is individualized according to patient factors such as expected survival, prostate-specific antigen (PSA) level, Gleason score (GS), and the presence or absence of metastasis [2, 3].

GS is a significant parameter in the risk stratification of PCa and reflects a cancer’s aggressiveness. GS ≥ 7 is considered to indicate clinically significant cancer (CSC). Recent studies have addressed the selection of optimal treatment modalities for low-risk PCa (GS < 7), a context in which active surveillance is increasingly used [4, 5]. Therefore, it is crucial to use non-invasive diagnostic modalities to discriminate patients with clinically insignificant cancer, who are candidates for active surveillance, from those with CSC, who are candidates for radical prostatectomy.

Since the Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) was established, there has been much effort to detect CSC using magnetic resonance imaging (MRI). This system has shown a maximum accuracy of 82% at a threshold of PI-RADSv2 score ≥ 4 [6]. More recently, bi-parametric MRI consisting of T2-weighted images (T2WI) and diffusion-weighted images (DWI) has emerged as an alternative to multi-parametric MRI for evaluation of PCa, and has shown diagnostic performance comparable to that of multi-parametric MRI for detection of CSC [7, 8].

Texture analysis is a mathematical model that extracts, from an image, features that are not perceptible to the human eye [9,10,11]. Several studies have employed texture analysis on MRI as an imaging biomarker to assess the aggressiveness of PCa, and have shown promising results [12,13,14,15,16,17,18,19]. Those studies have focused on either T2WI [12, 14, 16] or the apparent diffusion coefficient (ADC) map [15, 18]. However, only a few studies on bi-parametric MRI using T2WI and ADC map have been conducted. Moreover, to the best of our knowledge, there are only three studies that have distinguished CSC from clinically insignificant cancer using texture parameters based on bi-parametric MRI [13, 17, 19].

Therefore, the aim of this study was to investigate the association between texture parameters based on bi-parametric MRI and GS in patients with PCa, and to evaluate the diagnostic performance of any significant parameter for discriminating CSC from non-CSC.

Methods

This retrospective study was approved by the relevant institutional review board, and informed consent was waived.

Patients and selection criteria

For the training dataset, a total of 267 patients were histologically confirmed as having prostate adenocarcinoma by radical prostatectomy or non-targeted systematic biopsy between January 2017 and February 2019. Among them, 111 patients fulfilled the inclusion criteria: (a) preoperative or post-biopsy 3T-MRI and (b) available data on GS. Of these 111 patients, 46 were excluded for the following reasons: (a) previous treatment (hormone therapy, radiation therapy or radical prostatectomy) for PCa (n = 20), (b) marked metallic artifact on MR images (n = 2), (c) transurethral resection of the prostate (n = 8), and (d) no identified focal lesion on T2WI and DWI (n = 16). Finally, 65 patients (mean age: 69 years, range 51–88 years) were enrolled in the training dataset (Fig. 1).

Fig. 1
Flowchart showing case accrual process in the training dataset

For the validation dataset, 51 consecutive patients (mean age: 69 years, range 47–82 years) who were confirmed as having prostate carcinoma by radical prostatectomy between March 2019 and April 2020 were recruited.

MR imaging

All prostate MRI scans were performed on a 3.0-T MR scanner (Achieva; Philips Medical imaging, Best, Netherlands) with a parallel-array body coil (SENSE Torso/cardiac coil; USA Instruments, Gainesville, FL, USA).

The imaging protocol included axial, coronal and sagittal T2-weighted turbo spin-echo sequences [repetition time (TR)/echo time (TE), 2322/90 ms; echo train length (ETL), 15; slice thickness, 3 mm; slice gap, 0.3 mm; matrix size, 316 × 255; number of excitations (NEX), 1; field of view (FOV), 220 × 220]. Diffusion-weighted single-shot echo planar imaging (b = 0 and 1000 s/mm²; TR/TE, 5725/78 ms; ETL, 73; slice thickness, 3 mm; slice gap, 0.3 mm; matrix size, 120 × 118; NEX, 1; FOV, 220 × 220) was performed in the axial plane parallel to the axial T2WI. An ADC map was automatically calculated from b values of 0 and 1000 s/mm².
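A two-point ADC map is commonly derived from a mono-exponential signal model. The following minimal sketch illustrates that relationship for a single voxel; it is not the scanner's reconstruction, which is not described here. The function name `adc_from_two_b_values` and the handling of non-positive signals are assumptions made for illustration.

```python
import math

def adc_from_two_b_values(s_b0, s_b1000, b_low=0.0, b_high=1000.0):
    """Voxel-wise ADC (mm^2/s) from two diffusion weightings (hypothetical helper)."""
    if s_b0 <= 0 or s_b1000 <= 0:
        return 0.0  # assumption: background or noise voxels are set to zero
    # S(b) = S(0) * exp(-b * ADC)  =>  ADC = ln(S(b_low) / S(b_high)) / (b_high - b_low)
    return math.log(s_b0 / s_b1000) / (b_high - b_low)

# Synthetic voxel: the signal falls from 1000 to 1000 * exp(-1) between b = 0 and 1000 s/mm^2.
adc = adc_from_two_b_values(1000.0, 1000.0 * math.exp(-1.0))
```

With b values expressed in s/mm², the result is in mm²/s, so the synthetic voxel above corresponds to an ADC of about 1.0 × 10⁻³ mm²/s.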

MR texture analysis

First, two radiologists with 15 and 3 years of experience in reading prostate MRI reviewed the T2WI, ADC map, and DWI and determined the location and border of each tumor and its PI-RADSv2 score. These decisions were made by consensus with reference to the topographic map of the resected specimen in cases of radical prostatectomy or, if no topographic map was available, to the systematic biopsy results in cases without surgery. The radiologists had been informed of the presence of PCa but were blinded to the GS of each tumor. When multiple lesions were found in a single patient, the largest one was regarded as the index tumor.

The MR DICOM images of the ADC map and axial T2WI were transferred from a picture archiving and communication system (PACS) workstation (m-view; INFINITT healthcare, Seoul, Korea) to a workstation equipped with in-house software (Medical Imaging Solution for Segmentation and Texture Analysis, MISSTA, Seoul, Korea), which performs fully automated quantification of texture features and is written in C++ with Microsoft Foundation Classes (Microsoft, Redmond, WA, USA) [20,21,22,23,24]. For tumor segmentation, the radiologist with 3 years of experience who had participated in the tumor localization manually drew the regions of interest (ROIs) along the tumor border on each section of the ADC map to cover the entire tumor volume using the software’s 3-dimensional measurement tool. Definite areas of fat, vessel, bowel, seminal vesicle, or urinary bladder were excluded from the ROIs; however, areas of necrosis were included in the ROIs to reflect the heterogeneity of the tumor [25]. Similarly, the ROIs on axial T2WI were drawn with reference to the ADC map. The same radiologist repeated the measurement to assess intra-observer agreement. To evaluate inter-observer agreement, another radiologist with 4 years of experience performed the task in the same manner. Both radiologists were blinded to each other’s results.

After the tumor segmentation, we obtained a total of 53 texture features that the software automatically provided. The texture features consisted of first-order statistics (mean, variance, standard deviation, skewness, kurtosis, entropy, and homogeneity) and second-order statistics based on the gray-level co-occurrence matrix (GLCM) (moments, angular second moment [ASM], inverse difference moment [IDM], contrast and entropy), the gray-level run-length matrix (GLRLM), and wavelet transformation features.
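To make the GLCM entropy feature concrete, the sketch below computes the Shannon entropy of a symmetric, normalized co-occurrence matrix for a single 2-dimensional pixel offset. It is an illustration under stated assumptions, not the in-house software's implementation: the gray-level quantization, the offsets, the 3-dimensional handling, and the logarithm base used by that software are not given in the text, so absolute values from this sketch are not comparable with those reported in this study. The function name `glcm_entropy` and the toy images are hypothetical.

```python
import math
from collections import Counter

def glcm_entropy(image, mask=None, offset=(0, 1)):
    """Shannon entropy (bits) of a symmetric, normalized GLCM for one pixel offset.

    `image` is a 2D list of already-quantized integer gray levels;
    `mask` is an optional 2D list of booleans marking the ROI.
    """
    rows, cols = len(image), len(image[0])
    d_row, d_col = offset
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue
            # Only count pairs whose two pixels both lie inside the ROI.
            if mask is not None and not (mask[r][c] and mask[r2][c2]):
                continue
            a, b = image[r][c], image[r2][c2]
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # symmetric GLCM: count the pair in both directions
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy images: a uniform patch and an alternating (more complex) patch.
uniform = [[1, 1, 1], [1, 1, 1]]
checker = [[0, 1, 0], [1, 0, 1]]
```

In this toy case the uniform patch has zero entropy because only one gray-level pair occurs, whereas the alternating patch yields a higher value, which is the behavior the feature is meant to capture.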

Histopathologic analysis

All biopsies followed a non-targeted systematic scheme in which 12 cores were obtained. All of the radical prostatectomy procedures were carried out by dedicated urologists. After the radical prostatectomy, the excised prostate gland containing tumor tissue was fixed in formalin and serially sectioned from the apex to the base. After paraffin embedding, slides were stained with hematoxylin and eosin (H&E). Each slide was assessed by a dedicated pathologist in accordance with the Gleason scoring system [25]. The total score is calculated from the microscopic appearance of the cells: the first component is the grade of the dominant cell pattern, and the second is the grade of the highest-grade non-dominant pattern. The pathologist recorded the GS of the largest lesion as the index tumor. A topographic map was drawn by connecting the tumor borders from each slide.
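The scoring rule and the CSC threshold used in this study (GS ≥ 7) can be written as a short sketch. The function names `gleason_score` and `is_csc` are hypothetical and serve only as illustration.

```python
def gleason_score(primary, secondary):
    """Sum of the dominant (primary) and highest-grade non-dominant (secondary) patterns."""
    for grade in (primary, secondary):
        if grade not in range(1, 6):
            raise ValueError("Gleason patterns are graded 1 to 5")
    return primary + secondary

def is_csc(score, threshold=7):
    """Clinically significant cancer as defined in this study: GS >= 7."""
    return score >= threshold

# The two representative cases shown in the figures: 3 + 3 and 4 + 4.
low_grade = gleason_score(3, 3)
high_grade = gleason_score(4, 4)
```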

Statistical analysis

All statistical analyses were performed using MedCalc software for Windows (MedCalc Software version 12.7.1.0, Mariakerke, Belgium). A P value less than 0.05 was considered significant. First, correlation coefficients were calculated to evaluate the association between GS and the texture parameters derived from the T2WI and the ADC map, respectively. Multiple regression analyses were then performed with the significant parameters (correlation coefficient ≥ 0.3 or ≤ −0.3) from the T2WI and ADC map to determine the most significant predictor among them. To assess differences in texture parameters between the CSC and non-CSC groups, the Mann–Whitney U test or t test was performed. A receiver operating characteristic (ROC) curve analysis was performed to extract an optimal cutoff value for prediction of CSC, and the area under the curve (AUC) was calculated to evaluate the diagnostic performance of the texture parameters for predicting CSC. After the optimal cutoff value for maximum accuracy was extracted, the corresponding sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were also estimated. The optimal cutoff value derived from the training dataset was then verified using the validation dataset. To evaluate feature robustness to segmentation variability, inter- and intra-observer agreement was tested by calculating the intraclass correlation coefficient (ICC) in the validation dataset. ICC values were classified as poor (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80), and excellent (0.81–1.00).
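The ROC-based steps can be sketched in a few lines. This is a minimal illustration, not a reproduction of the MedCalc procedure: it computes the AUC through its Mann–Whitney interpretation, chooses the observed value that maximizes accuracy, and assumes that values at or above the cutoff are called CSC. All function names and the sample values are hypothetical; the values are invented and are not study data.

```python
def auc_mann_whitney(pos, neg):
    """AUC as the probability that a positive case scores higher than a negative one."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_at(cutoff, pos, neg):
    """Confusion-matrix metrics when values >= cutoff are called positive."""
    tp = sum(v >= cutoff for v in pos)
    fn = len(pos) - tp
    fp = sum(v >= cutoff for v in neg)
    tn = len(neg) - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "accuracy": (tp + tn) / (len(pos) + len(neg)),
    }

def best_cutoff(pos, neg):
    """Observed value that maximizes accuracy (ties resolved toward the lower cutoff)."""
    candidates = sorted(set(pos) | set(neg))
    return max(candidates, key=lambda c: metrics_at(c, pos, neg)["accuracy"])

# Synthetic entropy values, invented purely for illustration.
csc = [3.1, 3.4, 2.8, 4.0, 3.6]
non_csc = [2.5, 2.9, 2.7]

cutoff = best_cutoff(csc, non_csc)
auc = auc_mann_whitney(csc, non_csc)
training_metrics = metrics_at(cutoff, csc, non_csc)
```

Verifying a cutoff on a second dataset then amounts to calling `metrics_at` with the training cutoff and the validation values, without re-estimating the cutoff.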

Results

Training dataset

Among 65 patients, 46 patients were confirmed by radical prostatectomy and 19 patients by systematic biopsy. The study population consisted of patients having GS 6 (n = 16), GS 7 (n = 18), GS 8 (n = 20), and GS 9 (n = 11). Table 1 summarizes the demographics of the study population. The detailed results on the associations between the texture parameters and GS are provided in Tables 2 and 3.

Table 1 Demographics of study population
Table 2 Association between individual texture parameters on ADC map and GS
Table 3 Association between individual texture parameters on T2WI and GS

After multiple regression analyses on the significant texture parameters from the combined T2WI and ADC map, GLCM entropy on ADC map (R² = 0.4227; adjusted R² = 0.4043; partial r = 0.3583; P = 0.0034) was the only significant indicator for GS. Representative images are shown in Figs. 2 and 3. The AUC of GLCM entropy for discriminating CSC from non-CSC was 0.825 (95% CI 0.711–0.907) with a maximum accuracy of 82%, a sensitivity of 86%, a specificity of 71%, a PPV of 89%, and an NPV of 63% (Fig. 4).

Fig. 2
67-year-old man having histopathologically confirmed prostate adenocarcinoma with Gleason score (GS) 3 + 3 = 6. a Axial T2-weighted image (T2WI) shows ovoid low signal intensity (SI) lesion (arrow) in left middle peripheral zone. b Axial diffusion-weighted image (DWI, b = 1000 s/mm²) shows high-SI lesion (arrow) in corresponding location. c Axial apparent diffusion coefficient (ADC) map generated from (b) shows homogeneous low-SI lesion (arrow) in same area. d Dedicated texture analysis software with 3-dimensional analysis automatically calculated texture features of region of interest (ROI, green color) manually drawn along tumor border on axial T2WI or ADC map. The gray-level co-occurrence matrix (GLCM) entropy of the whole tumor was 2.920 on ADC map

Fig. 3
80-year-old man having histopathologically confirmed prostate adenocarcinoma with GS 4 + 4 = 8. a Axial T2-weighted image (T2WI) shows large low-signal intensity (SI) mass (arrow) replacing whole prostate gland and extending to rectum through mesorectal fat tissue. b Axial diffusion-weighted image (DWI, b = 1000 s/mm²) shows heterogeneously high-SI mass (arrow) in corresponding area. c Axial apparent diffusion coefficient (ADC) map generated from (b) shows reciprocal heterogeneous low-SI mass (arrow) in same area. d Dedicated texture analysis software with 3-dimensional analysis automatically calculated texture features of region of interest (ROI, green color) manually drawn along tumor border on axial T2WI or ADC map. The gray-level co-occurrence matrix (GLCM) entropy of the whole tumor was 4.992 on ADC map

Fig. 4
Receiver operating characteristic (ROC) curve of gray-level co-occurrence matrix (GLCM) entropy on apparent diffusion coefficient (ADC) map for prediction of clinically significant cancer (CSC). The area under the ROC curve (AUC) of the GLCM entropy on ADC map was 0.825 (95% CI 0.711–0.907). For an optimal cut-off value of 2.923, the estimated maximum accuracy was 82%, with a sensitivity of 86% and a specificity of 71%

Validation dataset

The study population comprised patients with GS 6 (n = 10), GS 7 (n = 28), GS 8 (n = 8), and GS 9 (n = 5). The validation dataset thus consisted of a CSC group (n = 41) and a non-CSC group (n = 10). The AUC of GLCM entropy was 0.920 (95% CI 0.808–0.977). When the cutoff value of 2.92 was applied to the validation dataset, GLCM entropy showed a sensitivity of 98%, a specificity of 70%, and an accuracy of 92%.
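As a consistency check on these figures, the short sketch below recomputes the percentages from the group sizes. The counts of correctly classified patients are not reported in the text; the values used here (40 of 41 and 7 of 10) are assumptions, back-calculated as the only integer counts that round to the reported sensitivity and specificity.

```python
# Group sizes reported for the validation dataset.
n_csc, n_non_csc = 41, 10
# Assumed counts, back-calculated from the reported percentages (not given in the text).
true_pos, true_neg = 40, 7

sensitivity = true_pos / n_csc      # 40 / 41
specificity = true_neg / n_non_csc  # 7 / 10
accuracy = (true_pos + true_neg) / (n_csc + n_non_csc)  # 47 / 51
```

Under these assumed counts the three values round to 98%, 70%, and 92%, which agrees with the reported results.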

As for inter- and intra-observer agreement, all features except skewness and kurtosis on T2WI showed good or excellent agreement on both ADC map and T2WI. The detailed ICC results are provided in Tables 4 and 5.

Table 4 Intraclass Correlation Coefficient on ADC map
Table 5 Intraclass Correlation Coefficient on T2WI

Discussion

Although dynamic contrast-enhanced imaging is one of the main sequences in multi-parametric MRI, its role in PI-RADSv2 is minor [26, 27]. Also, several studies have revealed that gadolinium-based contrast media can accumulate in deep cerebral structures [28, 29]. Bi-parametric MRI, meanwhile, has shown a comparable diagnostic performance to that of multi-parametric MRI for detection of CSC [7, 8]. In this context, we used bi-parametric MRI for the texture analyses and found that GLCM entropy on ADC map was associated with GS in patients with PCa. GLCM, defined as “a two-dimensional histogram of gray levels for a pair of pixels that are separated by a fixed spatial relationship,” is the most common second-order statistical feature of texture analysis [10, 30]. Given that GLCM entropy, one of the various GLCM parameters, reflects the disorder or complexity of an image and that complex textures tend to have high entropy [30], it is reasonable that a more heterogeneous tumor, which reflects a higher GS in PCa, shows a higher GLCM entropy.

Our results revealed that GLCM entropy showed an accuracy of 82% for discriminating CSC from non-CSC in the training dataset, and 92% in the validation dataset. In our opinion, the increased accuracy in the validation dataset might be attributed to the segmentation process, in which the topographic map from the radical prostatectomy specimen served as the reference for the entire validation population. Two studies similar to the present one have been conducted to evaluate the aggressiveness of PCa using texture analyses on bi-parametric MRI [13, 17]. Wibmer et al. determined that GS was associated with higher entropy (GS 6, 7.4 ± 1.04; GS > 7, 8.23 ± 0.8; P = 0.0069) and lower energy (GS 6, 0.009 ± 0.005; GS > 7, 0.005 ± 0.003; P = 0.0039) on ADC map; however, none of the other texture features showed a significant association with GS on T2WI [13]. Niu et al. observed that GLCM entropy and inertia on ADC map were positively associated with GS (r = 0.614 and r = 0.663, respectively), and found that the AUC for texture analysis based on logistic regression models was 0.89 (95% CI 0.82–0.94) with a sensitivity of 87% and a specificity of 89% for detection of high-grade PCa (GS ≥ 7) [17]. The difference between that study and ours lies in the statistical approach. Niu et al. ran logistic regression models that combined texture parameters and PI-RADSv2 scores, whereas we ran only a multiple regression model for texture parameters, because the PI-RADSv2 score might differ between observers.

Rozenberg et al., by contrast, performed a mono-parametric texture analysis on ADC map among multi-parametric sequences, and reported that texture features (skewness, kurtosis, entropy, run-length non-uniformity) on ADC map were not significantly different between GS 3 + 4 = 7 and GS 4 + 3 = 7 tumors. However, their logistic regression models generated by incorporating all four texture features yielded an AUC of 0.77 with a sensitivity of 71% and a specificity of 78% [15]. This observation might reflect a lack of ability of a single texture feature to distinguish significant differences in a heterogeneous tumor environment between tumors with the same GS.

There are several limitations to our study. First, the pathologic reference standards for GS were derived not only from radical prostatectomy but also from systematic biopsy. Because patients with higher GS may be treated with hormone therapy rather than radical prostatectomy [31], patients with higher GS were included after systematic biopsy without radical prostatectomy. However, the result from biopsy is not always consistent with that from radical prostatectomy and may change after radical prostatectomy [15, 18]. Second, the zone-specific dominant sequences were not applied. According to PI-RADSv2, DWI is the primary determining sequence for the peripheral zone, while T2WI is the primary determining sequence for the transitional zone [27]. However, the tumors in this study had various zonal distributions, including tumors spanning all zones. Moreover, T2WI alone was not sufficient for tumor identification, even after consensus review, particularly for GS 6 tumors. Therefore, it was necessary to combine DWI and ADC map with T2WI for tumor localization and segmentation, which is another reason why the texture analyses were based on bi-parametric MRI. Third, we could not incorporate PSA data to evaluate any improvement in diagnostic performance, because the PSA data were suboptimal owing to an interval of more than 1 month between the PSA test and surgery or systematic biopsy. Last, test-retest repeatability between different scans could not be evaluated because of the retrospective study design. A recent study observed that many combinations of radiomics features and preprocessing showed high repeatability (ICC > 0.85), but that overall repeatability was highly sensitive to the processing parameters. That study therefore recommended paying close attention to the processing configuration when interpreting radiomics features [32].

In conclusion, GLCM entropy on ADC map was associated with GS in patients with PCa; in addition, its estimated accuracy for discriminating CSC from non-CSC was 82%.