1 Introduction

Globally, prostate cancer is the second most common cancer among men, the sixth leading cause of cancer death in males, and the fourth most common cancer overall. An estimated 1,414,259 new cases and 375,304 deaths occurred in 2020 [1]. Population growth and increasing life spans are expected to add 2.3 million cases by 2040. When the body requires new cells, existing cells grow and divide to produce them; when cells become damaged or old, they die and are replaced by new ones. Cancer develops when the body's cells begin to proliferate out of control, and any cell in the human body can become cancerous and spread to other body regions [2]. Tumors arise from fast-growing cells and may be malignant or benign: a malignant tumor is cancerous because it can grow and spread to other parts of the body, whereas a benign tumor can grow but does not spread.

The prostate gland is found only in men. It lies between the bladder and the urethra and produces part of the fluid that makes up semen, which keeps sperm healthy for proper fertilization. The urethra, the tube that carries urine and semen out of the body through the penis, passes through the center of the prostate. In a young man the prostate is about the size of a walnut; it may enlarge with age. Prostate cancer first appears as growths around the borders of the prostate and then gradually spreads through the prostate's lining and across its margins. Around 1 in 7 males worldwide is diagnosed with prostate cancer at some point in his life, and if the disease is not identified and treated at an early prognostic stage, it typically has a deadly outcome [3]. Once the cancer spreads outside the prostate, for example to the bones, lungs, or lymph nodes, it is no longer curable, but medical advances make it possible to manage the disease and prolong the patient's life by many years; people with advanced prostate cancer have sometimes survived for many years before dying of an unrelated cause. Most men with prostate cancer are elderly [4]: fewer than 1% of diagnoses occur in men under 50, and more than 80% of men with the disease are over 65. A family history of prostate cancer raises one's risk of developing it. The precise cause of the cancer has not yet been determined, although diets high in red meat and fat have been associated with prostate cancer, and compounds created by cooking beef at high temperatures are harmful to the prostate. Prostate cancer rates vary across nations depending on diet [5]: the disease is more prevalent in countries with heavy meat and dairy intake than in countries where vegetables and rice are dietary staples. Hormones are another factor; higher fat intake raises testosterone levels, which promotes the development of prostate cancer. Lack of exercise and certain occupational exposures have also been identified as risk factors [6]. Digital rectal examination, the prostate-specific antigen (PSA) test, and transrectal ultrasound (TRUS) are the three main assessment methods currently used to identify prostate cancer; among them, PSA remains the best-known screening and detection method.
The PSA test can identify malignancy by assessing the amount of PSA present in the patient's blood. However, considerable obstacles still affect the reliability of PSA testing; one is the high incidence of false positives caused by inflammation or prostatic hyperplasia, conditions that often elevate the PSA level and may produce an inaccurate prognostic report. Transrectal ultrasound-guided biopsy is the most precise method of diagnosing prostate cancer, but since the procedure is expensive and invasive, a biopsy is only recommended as a last resort. Because TRUS-guided biopsy is essentially an unguided sampling procedure, research suggests that around 30 percent of early (Stage I and Stage II) lesions go undetected during the first stage of prognostication. A comprehensive and accurate diagnosis can lower the death rate of prostate cancer: studies show that mortality falls considerably when the disease is identified at an earlier stage [7]. Once a biopsy confirms the presence of malignancy, the degree of aggressiveness of the cancer cells (the grade) must be determined. A pathologist examines the tumor sample to see how different the cancerous cells are from normal cells. The Gleason score is the most widely used metric for grading prostate cancer cells; it is a two-number scale ranging from 2 to 10 (non-aggressive to aggressive cancer), although the lower end of the spectrum is rarely used [8]. In a dynamic healthcare environment, the medical profession has long depended on specialists who learn through experience and self-study. Thanks to sophisticated technologies that provide quantitative and qualitative measurements of many characteristics, the growth in knowledge and understanding of illnesses is strongly tied to the growth of data and information. Machine learning (ML) and deep learning (DL) technologies are well suited to such a data-rich field: their use to gather data from many sources and support the decision-making of highly skilled clinicians is expanding rapidly in the medical sector, and machine learning has enormous potential to change diagnosis and prognosis, drug discovery, and epidemiology [9]. Previously used strategies failed to deliver better performance because they could not extract hidden features, and some earlier approaches lacked suitable methods for detecting cancer. Deep learning and machine learning offer strong predictive capabilities for accurately classifying and identifying diseases. Feature extraction methods proposed by different researchers yield decent results, but progress is still needed because conventional ML algorithms cannot be fine-tuned to produce better outcomes; deep learning techniques may provide a solution. Because training a CNN from scratch demands excessive time and processing resources, transfer learning is employed in this research. Deep learning can extract high-level characteristics from images, making high detection rates feasible without manually engineered features, since features are learned during training. Deep learning approaches have also become very popular in recent years for Gleason grading and prostate cancer diagnosis thanks to massively parallel computing on GPUs.
Our study aimed to produce several significant aspects, including the following:

  • We detected prostate cancer using a variety of transfer learning models based on five convolutional neural networks (CNNs): MobileNet V2, InceptionResNet V2, DenseNet 169, ResNet101 V2, and NASNetMobile.

  • The five transfer learning methods were evaluated on a variety of performance measures, including accuracy, loss, and RMSE, and tested on their ability to predict prostate cancer.

  • The InceptionResNet V2 model improves the classification results: experiments showed that it delivered the best average results among the evaluated models.

The remainder of the paper is organized as follows. Section 2 presents a summary of the literature review, including an in-depth analysis of previously published work. Section 3 describes the proposed framework along with the data analysis and transfer learning models. Section 4 details the implementation of the various deep learning models and the results obtained with the different performance evaluation approaches. Finally, Sects. 5 and 6 summarize the key findings and suggest directions for future work.

2 Literature Review

With the limited data offered by the PROSTATEx challenge, Chen et al. [10] showed that state-of-the-art DNN models can be swiftly retrained using transfer learning. Only 78 of the 330 lesions in the training data were clinically significant, so the training process included efforts to balance the data. Their pre-trained ImageNet InceptionV3 and VGG-16 models ranked fourth overall on the PROSTATEx test data with AUC values of 0.81 and 0.83, respectively, and they found that the sensitivity of models developed for different prostate zones varied. A new multiparametric magnetic resonance transfer learning (MPTL) technique was created by Yuan et al. [11] to stage prostate cancer automatically. The researchers first created a deep convolutional neural network with three branch topologies to compute features from multiparametric MRI (mp-MRI) images; the information in the mp-MRI sequences is then represented by concatenating the learned features. The model classified prostate cancer with excellent accuracy (86.92 percent) and outperformed both other deep learning algorithms and hand-crafted feature-based methods. Serbanescu et al. [12] used two pre-trained deep learning networks, built with transfer learning, to classify 6000 Gleason2019 Challenge images. The average agreement of the two networks with six pathologists was modest for GoogleNet and high for AlexNet; measured against the majority decision of the six pathologists, the agreement was perfect for AlexNet and modest for GoogleNet. Contrary to expectation, the average level of inter-pathologist agreement was only modest, whereas agreement between the two networks was relatively high. With the majority vote as the ground truth, the accuracy of AlexNet and GoogleNet was 85.51 percent and 74.75 percent, respectively. BS et al. [13] constructed a patch-based deep convolutional neural network to distinguish prostate cancer (PCa) from non-cancerous tissue (NC) using mp-MRI data. The DCNN showed excellent diagnostic accuracy in discriminating PCa from NC, with an AUC of 0.944, 87.0% sensitivity, 90.6% specificity, 87.0% PPV, and 90.6% NPV on the test data. Abbassi et al. [14] used CNNs and transfer learning to train GoogleNet together with ML classifiers on a cancer MRI database. Various features were retrieved, including morphological, entropy-based, texture, SIFT, and elliptic Fourier descriptors; the CNN model (GoogleNet), employing the transfer learning strategy, produced the best results.

Kim et al. [15] examined how well a network performs when there is enough data in the target database to train it with direct transfer learning. The classification accuracy rates for X-ray, MRI, and CT were 92.2 percent, 73.6 percent, and 93.3 percent, respectively, showing that the suggested modality-bridge transfer learning method can still provide good classification performance even when only a small portion of the target medical data is used. Motamed et al. [16] suggested altering the commonly utilized Dice similarity coefficient (DSC) to enhance performance in small cohorts. With 115 patient cases from the target domain's fine-tuning data, mean DSCs of 0.85 and 0.84 were obtained for segmenting the whole gland (WG) and transition zone (TZ) in the target task, compared with a DSC of 0.71 without transfer learning. Sharifi-Noghabi et al. [17] reused the parameters of a first denoising auto-encoder (DAE) to train a large DAE on a small dataset; filtering the transferred and learned effects after DAE training yields high-weight genes, and the resulting DGS outperforms existing gene signatures in five validation datasets. For precise PCa grading in histopathology images, Karimi et al. [18] suggested a DL-based classification algorithm together with data augmentation techniques. The study used three independent convolutional neural networks operating on different patch sizes and merged their predictions; the suggested approach distinguishes between malignant and benign patches with a classification accuracy of 92 percent. Abraham et al. [19] developed an approach for identifying prostate cancer using a CNN together with a LADTree classifier; the technique produced a weighted AUC of 0.74 and an unweighted kappa value of 0.3993. An automated Gleason grading of prostate cancer using H&E whole-slide images was suggested by Xu et al. [20]. First, the technique divides the whole-slide image (WSI) into small tiles and selects cancer tiles with a high nuclei concentration. Gleason patterns are then characterized by CSLBP texture features extracted from the selected tiles. Finally, a multi-class support vector machine (SVM) uses the CSLBP texture features from all tiles to assign Gleason scores such as 6, 7, or 8. Reda et al. [21] established a CAD system to diagnose and localize prostate cancer using DWI at five b-values. The first step of the proposed system uses nonnegative matrix factorization (NMF) to aggregate prostate voxel intensity, spatial characteristics, and shape prior data. Second, ADC maps with segmented prostatic sections are created to differentiate cancerous from benign instances. Finally, the ADC maps are used in the last phase of the CAD system to train a convolutional neural network to identify malignancies; 50% of the ADC maps are used to train the CNN model and 50% to test its accuracy. The suggested CAD system achieved an AUC of 0.93 at five b-values. Tolkach et al. [22] used a large training dataset and a cutting-edge, fully convolutional architecture (NASNetLarge) to construct DL-based models for recognizing malignant prostatic tissue in whole-slide images. Their method for tumor identification achieved 97.3% in a native version and more than 98% using DL-based techniques in two validation cohorts. In a subsequent phase, the researchers proposed a novel DL-based Gleason grading method for prostatic adenocarcinomas with good performance in patient prognostic categorization (Table 1).

Table 1 Comparative analysis

3 Methodology

Figure 1 explains the research methodology proposed for the prediction of prostate cancer. The proposed architecture aims to improve prostate cancer prediction outcomes using a variety of transfer learning approaches. The method requires pre-processing, exploratory data analysis (EDA), and feature extraction before the dataset is used; the data are also augmented and split for validation. The following transfer learning models have been used to classify prostate cancer: MobileNet V2, InceptionResNet V2, DenseNet 169, ResNet101 V2, and NASNetMobile. The pre-trained models used in the study are detailed in Sect. 4. Additionally, the models are tested and evaluated on their predictive performance using a variety of assessment metrics (see Sect. 5).

Fig. 1 Proposed framework

3.1 Dataset Description

The dataset used in this study is accessible at https://panda.grand-challenge.org/home/. PANDA is a freely accessible dataset developed as part of a Kaggle challenge by Tampere University, Karolinska Institute, and Radboud University Medical Centre; sample images from the dataset are shown in Fig. 2. Of the data, 75% is used for model training and 25% for model testing. The training dataset uses a 0–5 scale to indicate cancer severity. Various processing steps are applied to the training and test images; some training images contain stray pen markings, which are absent from the test images. Image segmentation masks indicate the regions that determine the image's ISUP (International Society of Urological Pathology) grade. The label masks on some training images may include false positives or false negatives. These masks are useful for choosing the most significant sub-samples from the images, and the mask values are determined by the data provider, as shown in Fig. 3.
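
As a rough illustration, the snippet below loads the PANDA metadata and opens a single biopsy slide at its coarsest resolution. The file names (`train.csv`, `train_images/`) follow the layout of the public Kaggle release, and `openslide` is only one of several libraries able to read the multi-level TIFF slides; this is a minimal sketch rather than the pipeline used in the study.

```python
import pandas as pd
import openslide  # reads the multi-resolution TIFF slides of the PANDA release

# Metadata columns: image_id, data_provider, isup_grade, gleason_score
train_df = pd.read_csv("train.csv")
print(train_df.shape)
print(train_df["isup_grade"].value_counts().sort_index())

# Open one biopsy at the lowest-resolution pyramid level for quick inspection
image_id = train_df.loc[0, "image_id"]
slide = openslide.OpenSlide(f"train_images/{image_id}.tiff")
level = slide.level_count - 1                        # coarsest level
thumb = slide.read_region((0, 0), level, slide.level_dimensions[level])
thumb = thumb.convert("RGB")                         # drop the alpha channel
slide.close()
```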

Fig. 2 Sample of dataset images

Fig. 3 Description of dataset

3.2 Gleason Grading Method

In the traditional Gleason scoring method, tissue cells/glands that are homogeneous and small are said to be well differentiated and fall under Gleason pattern 1. Compared with Gleason 1 growth patterns, Gleason 2 patterns include more stroma and larger gaps between the glands. In Gleason 3, the glands at the tumor margin take on a distinct appearance. Gleason 4 shows few glands and aberrant cell masses, and Gleason 5 resembles a sheet of cells with absent or irregular glands. During Gleason grading, cancer tissue is located and categorized into Gleason patterns based on the architectural growth patterns of the tumor. A biopsy result is converted into the equivalent Gleason score, which in turn maps to an ISUP (International Society of Urological Pathology) grade on a scale of 1 to 5 [24]. The Gleason grading system is distinctive because it considers only the architectural aspects of the tumor, emphasizing growth patterns rather than cytological traits, and the Gleason score (GS) takes into account the two most typical patterns rather than only the worst one. This method is one of the strongest prognostic indicators for prostate cancer, and the ISUP grading system has helped standardize cancer grading for patients worldwide. Accurate Gleason grading is crucial for pathologists to make the best decisions. For heterogeneous tumors, the grading rules have been revised since 2014: the heterogeneous grading method uses the majority and minority growth patterns from the biopsy, and the two patterns are added together to give the Gleason score. A score of 3 + 3 corresponds to ISUP grade 1, 3 + 4 to grade 2, 4 + 3 to grade 3, and 4 + 4, 3 + 5, and 5 + 3 to grade 4. Finally, ISUP grade 5 corresponds to scores of 4 + 5, 5 + 4, and 5 + 5 [25].
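
The Gleason-to-ISUP mapping described above can be expressed as a small lookup table. The sketch below is illustrative only (benign biopsies, reported as 0 + 0 in the PANDA metadata, are mapped to grade 0 here); it is not code from the original study.

```python
# Gleason score (primary + secondary pattern) -> ISUP grade group
GLEASON_TO_ISUP = {
    (3, 3): 1,
    (3, 4): 2,
    (4, 3): 3,
    (4, 4): 4, (3, 5): 4, (5, 3): 4,
    (4, 5): 5, (5, 4): 5, (5, 5): 5,
}

def isup_grade(primary: int, secondary: int) -> int:
    """Return the ISUP grade group for a Gleason primary+secondary pattern.

    Benign biopsies (no Gleason pattern, i.e. 0 + 0) are reported as grade 0.
    """
    if primary == 0 and secondary == 0:
        return 0
    return GLEASON_TO_ISUP[(primary, secondary)]

assert isup_grade(3, 4) == 2
assert isup_grade(5, 4) == 5
```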

3.3 Pre-processing

Once the dataset has been collected, pre-processing is required. High-resolution histopathological image analysis is time consuming, and a complex background and other unwanted elements can slow down processing further. Image pre-processing algorithms help avoid this and are essential for obtaining the desired image qualities; the best conditions under which the image-processing pipeline performs its initial task must therefore be specified. Image processing systems are affected by lighting variations and noise [26], and performance improves when these unfavourable elements are removed. In histopathological image processing, commonly used methods include image-quality enhancement, noise removal, sharpening, tiling, standardisation, resolution reduction, stain normalization, ROI identification, and feature extraction. Pre-processing affects the image's brightness, contrast, and noise, which makes it easier for classification methods to work even when brightness and contrast fluctuate. The metadata accompanying the images (data provider, isup_grade, and gleason_score) contains a few missing values denoted by NaN. Rather than discarding these records, a more efficient approach is to impute or estimate the missing values from the existing data; we imputed missing data using scikit-learn's SimpleImputer class. Missing values must be handled before the dataset is fed to the models, as they cannot be passed directly to the learning algorithms.
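
A minimal sketch of the imputation step is shown below, assuming the missing values occur in the label columns of `train.csv`. The most-frequent strategy is an assumption on our part; the study only states that scikit-learn's imputer was used.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

train_df = pd.read_csv("train.csv")

# Columns with occasional NaN entries in the metadata
label_cols = ["data_provider", "isup_grade", "gleason_score"]

# Replace each NaN with the most frequent value observed in that column
imputer = SimpleImputer(strategy="most_frequent")
train_df[label_cols] = imputer.fit_transform(train_df[label_cols])

assert not train_df[label_cols].isna().any().any()
```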

3.4 Exploratory Data Analysis

The graphic representation of data is known as data visualization and is used to understand its connections and trends. Exploratory Data Analysis (EDA), which improves understanding of the data, makes the creation of machine learning models substantially more feasible. In this study, data visualization strategies have been examined using Plots, a Matplotlib-based data visualization tool. The EDA makes it easier to find hidden traits while collecting data. In addition, it has an advanced user interface that allows you to create captivating and instructive statistical graphics [27]. For the EDA, many libraries were loaded, including NumPy, Matplotlib, Seaborn, and Pandas. Figure 4 displays the traits & distributions of the data-set from the two separate sources. As shown in Fig. 5, the ISUP grade of 0 or 1 may be found in more than 50% of the train's data samples. For the remaining data samples, the ISUP grades range from 2 to 5, for each grade contributing 11 to 12%.
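
As an example of the kind of plot summarised in Fig. 5, the short sketch below draws the ISUP grade distribution with the seaborn/matplotlib stack mentioned above; column names follow the PANDA metadata.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

train_df = pd.read_csv("train.csv")

# Bar plot of how many biopsies fall into each ISUP grade (0-5)
ax = sns.countplot(x="isup_grade", data=train_df)
ax.set_xlabel("ISUP grade")
ax.set_ylabel("Number of biopsies")
plt.title("ISUP grade distribution in the training set")
plt.tight_layout()
plt.show()
```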

Fig. 4 Traits and distributions of the data-set

Fig. 5 Grade distribution of ISUP

Figure 6 demonstrates that (i) the Gleason score distribution is not uniform; (ii) only a few Gleason scores, such as 3 + 3 and 0 + 0, are prevalent in the lowest ISUP grades; (iii) all samples with an ISUP grade of 2 have a Gleason score of 3 + 4; (iv) samples with an ISUP grade of 3 have a Gleason score of 4 + 3; and (v) samples with an ISUP grade of 4 have Gleason scores of 4 + 4 (the majority), 3 + 5, or 5 + 3. Figures 7 and 8 depict the relative count plots of the Gleason score and ISUP grade with respect to the data provider. Figures 9 and 10 and Tables 2 and 3 show the funnel distributions of the ISUP grade and Gleason score.
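
These Gleason-score/ISUP-grade relationships can be checked directly with a cross-tabulation; this is an illustrative one-liner on the metadata, not part of the published pipeline.

```python
import pandas as pd

train_df = pd.read_csv("train.csv")

# Rows: Gleason score, columns: ISUP grade, cells: number of biopsies
print(pd.crosstab(train_df["gleason_score"], train_df["isup_grade"]))

# The same counts broken down by data provider (cf. Figs. 7 and 8)
print(pd.crosstab(train_df["gleason_score"], train_df["data_provider"]))
```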

Fig. 6 Percentage plot of Gleason score

Fig. 7 Distribution of Gleason score

Fig. 8 Distribution of Gleason score

Fig. 9 Funnel chart distribution (ISUP_grade)

Fig. 10 Funnel chart distribution of Gleason_score

Table 2 ISUP_grade with image_id
Table 3 Gleason_score with image_id

3.5 Feature Extraction

The images used in this approach have varying dimensions. The original images are rescaled to 224 × 224 pixels to obtain uniform image dimensions [28] and to match the input size of the MobileNet V2, InceptionResNet V2, DenseNet 169, ResNet101 V2, and NASNetMobile pre-trained models. The size of each pyramid level varies depending on the original image's dimensions. Biopsies may be rotated in a variety of ways; this rotation depends on how the biopsy was prepared in the laboratory and has no clinical relevance. There are notable colour differences between samples, a common occurrence in pathology caused by different laboratory staining techniques. Tables 4 and 5 list the labels for each ISUP grade image by data source (Radboud and Karolinska). With Matplotlib and a customized colour map, the various cancerous regions can be quickly identified; Fig. 11 presents a few examples of image masks.
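
A minimal sketch of the resizing step is given below, assuming the biopsy patches have already been exported as ordinary image files in a hypothetical `patches/` directory; PIL is used here simply because it is a common choice, not because the original pipeline used it.

```python
from pathlib import Path

import numpy as np
from PIL import Image

TARGET_SIZE = (224, 224)  # input size shared by all five pre-trained backbones

def load_and_resize(path: Path) -> np.ndarray:
    """Load one biopsy patch, resize it to 224x224 and scale pixels to [0, 1]."""
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE, Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0

patches = [load_and_resize(p) for p in sorted(Path("patches").glob("*.png"))]
batch = np.stack(patches)          # shape: (n_images, 224, 224, 3)
```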

Table 4 Prostate glands individually labelled (Radboud)
Table 5 Regions labelled in prostate glands (Karolinska)
Fig. 11 Sample of image_masks

Gleason grading is an effective way to evaluate prostate cancer aggressiveness. Gleason scores, which correlate with the severity of the disease, describe the growth patterns of prostate cancer. Based on the patterns of gland differentiation, this technique classifies prostatic tumours into five groups on a scale from 1 (good prognosis) to 5 (poor prognosis). Deep learning technology can make a substantial contribution to the automated diagnosis of prostate cancer in tissue and to the prognostication of cancer stage severity. Segmentation masks display the areas of the image that contributed to the ISUP score [29]. Label masks are not always present in training images and may include false positives or false negatives for a number of reasons; they are provided to aid the development of methods for selecting the most noteworthy image subsamples, and the mask values are determined by the data provider. Figures 12, 13 and 14 illustrate the images and their respective segmentation masks for the various ISUP grade scores. Because the masks are the same size as the slides, they can be overlaid on the tissue to clearly identify the malignant regions, and the overlay can be used to examine different growth patterns. To do this, we load the mask and the biopsy with PIL and blend them; Fig. 15 depicts the combination of the images with their masks.
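
A sketch of such an overlay is shown below, assuming a Radboud-style mask whose label values (0–5) are stored in the first channel; the colours chosen for the colormap are arbitrary and the helper is illustrative rather than the authors' implementation.

```python
import numpy as np
from matplotlib.colors import ListedColormap
from PIL import Image

# One colour per mask label (0 = background ... 5 = Gleason 5); colours are arbitrary
cmap = ListedColormap(["white", "lightgray", "green", "yellow", "orange", "red"])

def overlay_mask(image_path: str, mask_path: str, alpha: float = 0.4) -> Image.Image:
    """Blend a biopsy image with its segmentation mask for visual inspection."""
    tissue = Image.open(image_path).convert("RGB")

    mask_arr = np.asarray(Image.open(mask_path))
    labels = mask_arr[..., 0] if mask_arr.ndim == 3 else mask_arr   # label values 0-5

    # Map labels to RGB colours, then resize the coloured mask to the tissue size
    colored = (cmap(labels / 5.0)[:, :, :3] * 255).astype(np.uint8)
    mask_img = Image.fromarray(colored).resize(tissue.size)

    return Image.blend(tissue, mask_img, alpha)
```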

Fig. 12 ISUP grade score 0 and 1 for Karolinska image and Radboud image

Fig. 13 ISUP grade score 2 and 3 for Karolinska image and Radboud image

Fig. 14 ISUP grade score 4 and 5 for Karolinska image and Radboud image

Fig. 15 Images combined with their masks

3.6 Train Test Split

One portion of the data is used to develop the prediction model, while the other is used to evaluate its performance [30]. Figure 16 depicts the division criterion employed in our work. The dataset was divided into a training set (\({T}_{r}\)) and a testing set (\({T}_{s}\)), used to train and evaluate the models, respectively. The transfer learning models were trained on the \({T}_{r}\) set to locate and learn relationships in the data and then evaluated on the \({T}_{s}\) set to determine their predictive accuracy.
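
A minimal sketch of the 75/25 split described in Sect. 3.1 is given below; stratifying on the ISUP grade and the fixed random seed are assumptions on our part, since the paper only states the split ratio.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

train_df = pd.read_csv("train.csv")

# 75% of the slides for training (T_r), 25% held out for testing (T_s),
# keeping the ISUP grade proportions similar in both subsets
T_r, T_s = train_test_split(
    train_df,
    test_size=0.25,
    stratify=train_df["isup_grade"],
    random_state=42,
)
print(len(T_r), len(T_s))
```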

Fig. 16 Dataset segregation process

4 Deep Learning Models

Deep learning, especially CNNs, is widely used in various fields, including bioinformatics and computational medicine. CNNs are a type of neural network well suited to image processing tasks such as detection and classification. CNN designs are characterised by stacked convolutional layers and pooling layers: the feature maps become spatially smaller as the network gets deeper, but their number increases [33]. In this section, the functions of the various components of the CNN architecture are briefly explained.

  • Convolutional Layer: The convolution layer is the fundamental component of a CNN. It contains several kernels, and each neuron acts as a kernel. A kernel applies a specific set of weights to the image by multiplying the image elements by the corresponding elements of its receptive field. Different types of convolution arise from the choice of strides, filters, and padding. When the feature maps are too large, the pooling layer lowers the number of parameters and reduces the risk of overfitting.

  • Pooling Layer: CNNs employ a variety of pooling formulations, including maximum, sum, average, L2-norm, overlapping, and spatial pyramid pooling.

  • Activation Mechanism: The activation function, sometimes referred to as an activation layer, is the node at the end of each neural unit that determines how complicated patterns are learned. Selecting a strong activation function helps accelerate the learning process; ReLU, the most prevalent activation function in use today, is included in almost all CNN architectures.

  • Batch Normalization: Batch normalization is a standard procedure when building neural network models and helps the model converge. Most notably, it alleviates the gradient dispersion problem, i.e. unstable gradient variation, in the network. Normalization is essential because it guarantees that the data each layer receives during training has a uniform distribution.

  • Fully Connected Layer: Fully connected layers are feed-forward neural networks that form the last few levels of the network. The output of the final pooling/convolutional layer is flattened and passed into these layers. The flattened vector is connected to a few fully connected layers, analogous to an artificial neural network (ANN), which carry out the corresponding mathematical operations [31].

  • Transfer Learning: Transfer learning is an effective approach when a neural network has limited data for a new domain and a significant portion of previously acquired knowledge can be transferred to the new task. Transfer learning is often used with histopathology images: it learns the characteristics of a new whole-slide image (WSI) by analysing the characteristics of previously seen WSIs, and its primary function is to identify the similarities between the existing and new WSIs. In transfer learning terminology, the model's current empirical knowledge is referred to as the source domain and the new features to be learned as the target domain [32]. A minimal code sketch of this setup follows this list.
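
As a concrete illustration of the transfer learning setup used throughout this section, the sketch below builds one of the five backbones with ImageNet weights and a small classification head. The frozen base, the sigmoid head, the dropout rate, and the optimiser settings are assumptions of this sketch, not details reported by the authors.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(input_shape=(224, 224, 3)) -> tf.keras.Model:
    """InceptionResNet V2 backbone with ImageNet weights and a binary head."""
    base = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights="imagenet", input_shape=input_shape
    )
    base.trainable = False                       # reuse the pre-trained features as-is

    inputs = layers.Input(shape=input_shape)
    # Scale pixels from [0, 255] to [-1, 1], matching the backbone's preprocessing
    x = layers.Rescaling(scale=1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)   # cancerous vs. benign patch

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_transfer_model()
model.summary()
```

Swapping the backbone for MobileNetV2, DenseNet169, ResNet101V2, or NASNetMobile only changes the `tf.keras.applications` constructor; the head stays the same.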

4.1 MobileNet V2

Standard convolutional layers are computationally expensive; for efficiency, the MobileNet V2 architecture incorporates depth-wise separable convolutions [34]. By using depth-wise separable convolutions, as seen in Fig. 17, MobileNet aims to lower the network's computational cost, so that its convolutional layers are relatively inexpensive compared with typical convolutions. Inverted residual blocks with bottleneck layers connect the stages of MobileNet V2. Table 6 lists the layer and parameter specifications of the MobileNet V2 architecture. Smaller models of this kind perform well within a deep-learning pipeline. The essential concept is to replace a full convolution operator with a factorized form that separates the convolution into two steps: the first layer applies a single convolutional filter per input channel to perform lightweight filtering, and a second, pointwise convolution builds new features by combining the channels.
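
The factorization just described can be sketched in Keras as a per-channel 3×3 filter followed by a 1×1 pointwise convolution. This is an illustrative block under those assumptions, not code taken from the MobileNet V2 source.

```python
import tensorflow as tf
from tensorflow.keras import layers

def depthwise_separable_block(x: tf.Tensor, filters: int, stride: int = 1) -> tf.Tensor:
    """3x3 depth-wise filtering per channel, then a 1x1 pointwise convolution."""
    x = layers.DepthwiseConv2D(kernel_size=3, strides=stride, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, kernel_size=1, padding="same")(x)  # pointwise mixing
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

inputs = layers.Input(shape=(224, 224, 3))
outputs = depthwise_separable_block(inputs, filters=32, stride=2)
```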

Fig. 17 Architecture of MobileNet V2

Table 6 Specifications of the MobileNet V2 architecture’s layers and parameters

4.2 InceptionResNet V2

InceptionResNet V2 is a convolutional neural architecture that adds residual connections to the Inception design. Residual connections and the Inception architecture, two established deep-learning ideas, are combined to create the InceptionResNet V2 architecture [35]. This hybrid model combines the benefits of a residual network with the multi-branch convolutional core of the Inception network. The network receives a 299-by-299 image as input and generates a list of predicted class probabilities as output. Utilizing residual connections reduces training time and avoids the degradation problem caused by very deep structures. Figure 18 depicts the fundamental network architecture of InceptionResNet V2, and Table 7 lists the details of the model's layers and parameters.

Fig. 18 Architecture of Inception-ResNet-V2

Table 7 Specifications of InceptionResNet V2 architecture’s layers and parameters

4.3 DenseNet 169

The DenseNet architecture is designed to maximize information flow between the network's layers by constructing short paths between its early and later levels; consequently, DenseNet connects all of its layers. Each layer receives additional inputs from all preceding layers and passes its own feature maps to all subsequent layers [36]. In the densely connected CNN (DenseNet), each layer thus has direct access to the gradients of the loss function. There are three common variants: DenseNet-201, DenseNet-169, and DenseNet-121. Figure 19 illustrates how DenseNet increases network capacity by reusing features learned earlier. While the residual network's basic notion is skip connections, densely connected convolutional networks take this idea further: inputs may move freely across levels, preventing information loss between layers and gradient vanishing. DenseNet achieves a high level of accuracy with modest processing and memory requirements, and because it reuses previously learned characteristics in subsequent layers, it needs fewer parameters. The specifics of the model's layers and parameters are shown in Table 8. The problem of vanishing gradients is therefore better addressed by this approach.
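
The dense-connectivity idea can be sketched as follows: each layer's output is concatenated with all previous feature maps before being passed on. This illustrates the mechanism only; DenseNet-169 itself also interleaves transition layers that are omitted here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x: tf.Tensor, num_layers: int = 4, growth_rate: int = 32) -> tf.Tensor:
    """Each layer sees the concatenation of every preceding feature map."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(y)
        x = layers.Concatenate()([x, y])   # reuse earlier features in later layers
    return x

inputs = layers.Input(shape=(56, 56, 64))
outputs = dense_block(inputs)              # channels: 64 + 4 * 32 = 192
```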

Fig. 19 Architecture of DenseNet169

Table 8 Specifications of DenseNet169 architecture’s layers and parameters

4.4 ResNet-101 V2

ResNet makes extensive use of batch normalisation, and its innovative use of skip connections enables the model to learn an identity function. The identity function ensures that the output of a higher layer is at least as good as that of the lower layer. Even with very deep convolutional networks (up to 152 layers), ResNet's generalisation capability is maintained [37]. The specifics of the model's layers and parameters are shown in Table 9.

Table 9 Specifications of ResNet101 V2 architecture’s layers and parameters

ResNet-101 uses the same arrangement of layers and filters to produce the same output features. Residual connections are used by ResNet-101, and gradients propagate through them via the chain rule. Each unit of a residual network, which is constructed from a number of carefully selected units, may be characterised as follows:

$$y_{{\text{i}}} \, = \,h\left( {a_{{\text{i}}} } \right) + F \left( {a_{{\text{i}}} ,W_{{\text{i}}} } \right)$$
(1)
$$a_{{\text{i}} + 1} = F\left( {y_{{\text{i}}} } \right)$$
(2)

Here \({a}_{i}\) is the input to the \(i\)-th unit, and \({a}_{i+1}\) depends on \({y}_{i}\) through the residual function \(F({a}_{i},{W}_{i})\). The residual mapping is \(F={W}_{i}\,\delta ({W}_{i}a)\), where \(\delta\) denotes the ReLU activation. A linear projection \({W}_{m}\) is applied to match dimensions when the dimensions of \({a}_{i}\) and \({y}_{i}\) are not equal or when the number of input and output channels changes.

$$y_{{\text{i}}} \, = \,F \left( {a_{{\text{i}}} ,W_{{\text{i}}} } \right) + W_{{\text{m}}} a_{{\text{i}}}$$
(3)

The projection \(W_{m}\) is used only when dimensions need to be matched; otherwise the identity shortcut is retained. The residual function \(F \left({a}_{i},{W}_{i}\right)\) here represents multiple convolution layers. ResNet-101 can be written recursively as in Eq. (4).

$$a_{I}=a_{i}+\sum_{n=i}^{I-1} F \left({a}_{n},{W}_{n}\right)$$
(4)
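
Equations (1)–(4) translate into the familiar residual-block pattern sketched below, with a 1×1 projection playing the role of \(W_{m}\) when dimensions differ. This is a generic illustration, not the exact ResNet-101 V2 block.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_unit(a_i: tf.Tensor, filters: int, stride: int = 1) -> tf.Tensor:
    """y_i = h(a_i) + F(a_i, W_i), followed by a_{i+1} = ReLU(y_i)."""
    # Residual function F(a_i, W_i): two convolutions with batch norm and ReLU
    f = layers.Conv2D(filters, 3, strides=stride, padding="same")(a_i)
    f = layers.BatchNormalization()(f)
    f = layers.ReLU()(f)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.BatchNormalization()(f)

    # Identity shortcut h(a_i); use the projection W_m when dimensions differ
    shortcut = a_i
    if stride != 1 or a_i.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride)(a_i)

    y_i = layers.Add()([shortcut, f])
    return layers.ReLU()(y_i)              # a_{i+1}

inputs = layers.Input(shape=(56, 56, 64))
outputs = residual_unit(inputs, filters=128, stride=2)
```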

4.5 NASNetMobile

The goal of the NASNet study was to use reinforcement learning to find the best CNN architecture [22]. The Google Brain Team created the Neural Architecture Search Network (NASNet), which employs two primary building blocks: (1) the normal cell and (2) the reduction cell; NASNetMobile is the scaled-down version. The original NASNet architecture, shown in Fig. 20, stacks normal and reduction cells, and the number of cells is not pre-determined. The specifics of the model's layers and parameters are shown in Table 10. The normal cells preserve the dimensionality of the feature map, while the reduction cells return a feature map whose height and width are reduced by a factor of two. NASNet's controller, built on a recurrent neural network (RNN), predicts the network's entire structure from two initial hidden states.

Fig. 20 Architecture of NasNetMobile

Table 10 Specifications of NasNet Mobile architecture’s layers and parameters

4.6 Parameters for Evaluation

Different performance indicators were used to assess the classification performance of the deep learning pre-trained networks. The basic measures used for the statistical analysis in this study are accuracy, RMSE, and the loss function [40,41,42,43,44,45,46]. Other measures, such as precision, recall, and F1 score, are not considered, since the per-grade classification task is binary and these values were not computed. The metrics used to evaluate model performance are described below.

  • Accuracy: A model’s accuracy may be determined by finding the connection and patterns between different characteristics in a given training dataset [38]. It is determined using Eq. (5).

    $$Accuracy= \frac{TP+TN}{TP+TN+FP+FN}$$
    (5)
  • Loss: This parameter measures how poorly the model forecasts the data. It is determined using Eq. (6).

    $$Loss= \frac{\sum_{i=1}^{N}{(AV_{i}-PV_{i})}^{2}}{N}$$
    (6)
  • RMSE: The root mean square error (RMSE) is the standard deviation of the prediction errors over a dataset and is computed using Eq. (7).

    $${\text{RMSE}}\, = \,\sqrt {\mathop \sum \limits_{i = 1}^{n} \frac{{\left( {\hat{y}_{i} - y_{i} } \right)^{2} }}{n}}$$
    (7)

The RMSE is calculated by averaging the squared differences between the predicted and observed values and taking the square root of the result [39].
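
The three metrics can be computed directly from the model outputs, as in the short sketch below; the variable names and the 0.5 decision threshold are illustrative.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """Return accuracy (Eq. 5), mean squared loss (Eq. 6) and RMSE (Eq. 7)."""
    y_pred = (y_prob >= threshold).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    loss = np.mean((y_true - y_prob) ** 2)      # squared error averaged over N
    rmse = np.sqrt(loss)
    return accuracy, loss, rmse

# Toy usage with made-up predictions
acc, loss, rmse = evaluate(np.array([0, 1, 1, 0]), np.array([0.1, 0.8, 0.6, 0.4]))
```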

5 Results

Digitized prostate cancer microscopy images with ISUP grades 0 to 5 were used in this investigation, and we have shown that this medical classification problem can be addressed with a DL technique. In tertiary medical institutions without direct access to a specialist pathologist, or in settings that rely on telemedicine for faster decision-making, such a computerized method could support the adoption of medical-grade technology designed either to supplement the judgement of a medical expert or to provide an intermediate diagnosis. We do not, however, consider it prudent to exclude a human expert from the process, given that the algorithms themselves have inherent limitations. The training and validation accuracy of MobileNet V2, InceptionResNet V2, DenseNet 169, ResNet101 V2, and NASNetMobile on the different ISUP grades (0–5) is shown in Table 11. DenseNet 169 excelled in the majority of grades, with a validation accuracy of 89.76% for ISUP grade 0, a training accuracy of 95.63% for ISUP grade 1, a validation accuracy of 96.98% for ISUP grade 2, and a validation accuracy of 91.98% for ISUP grade 3, whereas MobileNet V2 has the lowest accuracy across the different ISUP grades (Table 12).

Table 11 Training and Validation results of various Models (ISUP_grade wise)
Table 12 Prerequisites for model training

5.1 Extensions

This section summarises the findings from the five deep transfer learning models. We examined several configurations to identify the most efficient settings across all CNN architectures. This section also reports the average results (training and validation) as well as line graphs showing the learning curves over the training and validation datasets for each epoch. InceptionResNet V2 obtained the best average training results, with an accuracy of 87.62%, a loss of 0.276, and an RMSE of 0.525357, while NASNetMobile obtained the worst, with an accuracy of 81.66%, a loss of 0.388, and an RMSE of 0.622896, as shown in Table 13. As seen in Table 14, InceptionResNet V2 again obtained the best average validation results, with an accuracy of 84.99% and a loss of 0.399, while NASNetMobile obtained the worst, with an accuracy of 75.13%, a loss of 0.510, and an RMSE of 0.714143. Table 12 displays the prerequisites for model training, including the necessary tools and libraries.

Table 13 Average Results (Training)
Table 14 Average Results (Validation)

Figure 21 depicts the learning curves for the accuracy of the different deep learning architectures as line graphs. A model is considered a good fit if it can generalize and learn properties from the training data without overfitting or underfitting, which indicates that it can generalize the learned traits and perform well even on previously unseen data. Unfortunately, as seen in Fig. 21, NASNetMobile cannot generalize the features: while the model works well on the training data, with an accuracy of more than 80%, it is incapable of producing equivalent results on the test data, as seen in the fluctuating validation accuracy curve, and achieved only 75.13% validation accuracy. The loss function is used to assess how far the model's predicted value deviates from the actual value; the lower the loss, the better the model's performance. Consequently, the loss values obtained by the different models differ. Figure 22 exhibits line graphs of the loss incurred by the different transfer learning models on the training and testing data during each epoch; NASNetMobile again shows the poorest loss values (Fig. 22).
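
Curves of the kind shown in Figs. 21 and 22 can be reproduced from a Keras History object along the following lines; the metric key names assume TF 2.x naming and the helper is illustrative.

```python
import matplotlib.pyplot as plt

def plot_learning_curves(history, model_name: str) -> None:
    """Plot training vs. validation accuracy and loss for one model."""
    epochs = range(1, len(history.history["accuracy"]) + 1)

    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(epochs, history.history["accuracy"], label="train")
    ax_acc.plot(epochs, history.history["val_accuracy"], label="validation")
    ax_acc.set_title(f"{model_name} accuracy")
    ax_acc.set_xlabel("Epoch")
    ax_acc.legend()

    ax_loss.plot(epochs, history.history["loss"], label="train")
    ax_loss.plot(epochs, history.history["val_loss"], label="validation")
    ax_loss.set_title(f"{model_name} loss")
    ax_loss.set_xlabel("Epoch")
    ax_loss.legend()

    plt.tight_layout()
    plt.show()

# history = model.fit(...); plot_learning_curves(history, "InceptionResNet V2")
```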

Fig. 21 Accuracy curves of five models

Fig. 22 Loss curves of five models

Furthermore, Fig. 23 shows the average RMSE scores: DenseNet 169 achieved the highest, whereas NASNetMobile again obtained the lowest RMSE score.

Fig. 23 Average score of RMSE values

6 Conclusion

It has been noted that, in most instances, conventional techniques for comparing cancer samples with normal samples are not sufficiently discriminative. The use of convolutional architectures for processing complex images has, however, risen sharply during the last ten years, and computational pathology now helps diagnose, predict, and treat disease; many pathologists and medical professionals have acknowledged the sophisticated use of ML techniques. This research demonstrates effective classification of prostate cancer data using deep learning (DL) techniques. According to the results of the several transfer learning models, DenseNet169 is the best DL network for classifying the cancer data. The most important advantage of an automated mechanism is that it can handle large image datasets in a systematic way, without the bias introduced by the fatigue that pathologists typically experience. This may substantially reduce the heavy clinical burden of pathology prognosis performed with traditional microscopes. Even though the results are positive, further clinical validation studies must be conducted before the models can be used in clinical practice, in order to assess the models' resilience in a realistic clinical context. The promising findings show that edge-computing platforms and customized CNN designs may play crucial roles in AI-based medical image processing, assisting pathologists with their time-consuming tasks. Such models have the potential to transform precision oncology and healthcare if they withstand rigorous clinical validation. In the future, we will evaluate more approaches, develop a more intricate DL model, and compare it with other transfer learning methods.