1 Introduction

According to global cancer statistics for 2020, lung cancer has been the most common and most lethal oncological illness in the world for many years. According to clinical research, if late-stage patients had been diagnosed and treated earlier, their five-year survival rate would climb to 52% from the current range of 10% to 16%. Lung nodules are a key indicator of early-stage lung cancer; they appear on CT scans as localized, roughly spherical shadows in the lung, no larger than 3 cm in diameter [1]. However, because lung nodules are small and their morphology, brightness, and other features are close to those of blood vessels and other tissues in the pulmonary parenchyma, physicians must carefully examine and screen each nodule individually. This procedure is cumbersome and tiring, which increases the chance of diagnostic errors. Therefore, it is vital to build an automatic detection method to assist physicians in improving the performance and accuracy of lung nodule diagnosis [2]. For these reasons, medicine is moving toward Computer-Aided Diagnosis (CAD) systems, which perform quantitative analysis of CT lung images and can improve lung CT image understanding, disease diagnosis, detection of small malignant nodules (which are difficult for a clinician to notice), and diagnostic time [3]. The latest generation of CAD systems also helps in the screening process by detecting lung nodules and differentiating between benign and malignant ones. CAD uses Artificial Intelligence (AI) algorithms, in particular deep learning methods that perform object detection efficiently. Deep learning is a robust machine learning technique in which the object detector automatically learns the image characteristics required for computer vision tasks.
There are several available algorithms for object detection using deep learning, including Faster R-CNN, You Only Look Once (YOLO), and single shot detection (SSD). Mask R-CNN is an improved version of Faster R-CNN [4] that simultaneously generates a high-quality segmentation mask for each detected object in an image. It follows the two-stage object detection approach: in the first stage, RoIs are proposed by the Region Proposal Network (RPN); in the second stage, the class and box offset values are predicted in parallel with a segmentation mask for each RoI.

Any CNN model's performance is affected by various factors, including the size of the dataset, the number of classes, the model's weights, the hyper-parameters, the optimizer, and many others. Optimizing the hyper-parameters plays a vital role in training convolutional neural networks [5]. How well a convolutional neural network performs depends on its architecture and the values of its hyper-parameters. A CNN includes many hyper-parameters related to its structure and training, such as the number of convolution layers, the number of filters, the size of each filter, batch size, learning rate, and momentum [6, 7]. Unlike model parameters, hyper-parameters are not learned during training; they can be tuned manually, but this is a tedious and time-consuming process. To overcome this, automated tuning methods can be used to optimize the hyper-parameters [8]. Modern optimization techniques, namely heuristic and metaheuristic algorithms, are now applied to optimize objective functions. However, heuristic methods are numerically inefficient in the search process, for example on high-dimensional problems, which adds complexity to the model [9]. To address this, metaheuristic and Swarm Intelligence (SI) methodologies and their variants have been proposed to handle a variety of real-world optimization tasks and to address complex, large-scale optimization problems [10].

The key benefit of SI optimization algorithms over deterministic approaches is the randomization introduced throughout the search phase, which helps the search avoid getting stuck at solutions that are not globally optimal. Obtaining the global best solution is therefore of practical significance in SI [11]. The Particle Swarm Optimization (PSO) algorithm, proposed by James Kennedy, is one of the variants of SI. It is used in many real-world applications, such as optimization of nonlinear functions and training of neural networks [12]. The idea of flying potential solutions through hyperspace while accelerating toward "better" solutions is specific to particle swarm optimization. Because PSO steers particles toward the global best found so far, it can become stuck in local optima [13]; it suffers from premature convergence and lacks diversity. To avoid this, we use a trajectory-based technique called snake swarm optimization. Snake swarm optimization is based on the behavior of snakes, and it can be incorporated with MRCNN to attain a certain level of accuracy with a significant speedup in computing [14, 15]. The hybrid PS2OA optimization technique can therefore better seek the global optimal solution and prevent particles from remaining at a local optimum [16]. The main contributions of the proposed approach are listed below:

  • First, the lung CT scan images are collected from the dataset, and an adaptive median filter is applied to remove the noise present in the input image. CLAHE is then applied to enhance the image quality.

  • After pre-processing, the image is given to the optimized Mask RCNN classifier to detect malignant and benign nodules.

  • To enhance the performance of the Mask RCNN classifier, the hyper-parameters are optimally selected using the hybrid particle snake swarm optimization algorithm (PS2OA). The proposed PS2OA is a hybridization of particle swarm optimization (PSO) and snake swarm optimization (SSO).

  • The performance of the proposed approach is analyzed using different metrics, and its effectiveness is compared with state-of-the-art works.

The rest of the paper is structured as follows: Sect. 2 discusses related work; Sect. 3 discusses the proposed model in detail; Sect. 4 discusses the experimental results and computational performance metrics, and Sect. 5 discusses the proposed work's conclusion and future directions.

2 Related work

In recent years, algorithms that use deep learning techniques to detect lung nodules have been widely used in medical research. Sunyi Zheng et al. [17] developed a deep learning model that locates lung nodules by taking into account the sagittal, coronal, and axial slices of the CT images of the lung region. The system is made up of two parts. First, a supervised encoder-decoder is trained to find nodule candidates by combining these slices. Second, multi-scale contextual information is extracted using a 3D dense CNN to remove false-positive candidates. Reza Majidpourkhoei et al. [18] proposed a novel deep-learning framework based on CNN to detect lung nodules. The framework uses a lightweight CNN based on the LeNet-5 model, and the lung nodule images are processed on a patch basis. The model takes six hours to train, which is time-consuming. Ying Su et al. designed a framework for lung nodule detection using Faster R-CNN [19] and stated that optimizing training parameters such as learning rate and batch size improves detection accuracy. The dataset is also enlarged by including medium-sized lung nodules and by taking into account the upper and lower nodules of the larger nodule identified on the CT slice. In this design, the parameters have to be tuned manually.

Menglu Liu et al. [20] proposed segmentation of lung nodules using Mask R-CNN, which employs instance segmentation. The model is compared with U-Net, and Mask R-CNN performs better in segmentation. Linqin Cai et al. [21] demonstrated pulmonary nodule detection based on Mask R-CNN combined with a ray-casting volume rendering algorithm, where Mask R-CNN helps to detect the pulmonary nodule by multiplying the mask matrices with the sequences of raw medical images, and the ray-casting technique aids in visualizing the nodule as a 3D model. Here, the accuracy of detecting small nodules still has to be improved. These models are evaluated using the LIDC-IDRI dataset, an open-source dataset of lung images that is widely used for research purposes.

Deep learning models use the CNN architecture for automatic feature extraction and diagnosis of lung images. The CNN architecture has several hyper-parameters that must be optimized to improve performance, and different metaheuristic optimization techniques have recently been proposed for this purpose. Wei-Chang Yeh et al. worked on Simplified Swarm Optimization (SSO) [22] combined with LeNet-5, where the authors proposed the sequential Dynamic Variable Range (SDVR): in contrast to typical SSO, the feasible range of the next variable is determined by the present variable's value. The LeNet-SSO architecture improved the quality of the solution by tuning parameters and items. This system is evaluated using the MNIST, Fashion MNIST, and CIFAR-10 datasets, and it outperforms other metaheuristic algorithms combined with LeNet. Singh et al. proposed a hybrid MPSO algorithm, which uses multiple swarms at two levels to give a better solution for the objective function [23]. The architecture of the CNN and the hyper-parameters are optimized at level 1 and level 2, respectively. To modify the exploration and exploitation characteristics of the particles and prevent the PSO algorithm from prematurely converging to a local optimum, this technique employs a sigmoid-like inertia weight. This system is evaluated using benchmark datasets such as CIFAR-10, CIFAR-100, MNIST, Convex Sets, and MDRBI, and it outperforms randomly generated CNNs.

Vijh et al. designed a hybrid bio-inspired algorithm [24] for automatic lung nodule detection. A novel variant of whale optimization and adaptive PSO (WOA-APSO) is used to optimize feature selection, and the selected features are grouped by linear discriminant analysis, which reduces the dimensionality. The lung images are enhanced using a Wiener filter, and the RoI of the lung region is obtained by employing different segmentation techniques. This system is evaluated using the LIDC dataset; the accuracy, sensitivity, and specificity are 97.18, 97, and 98.66, respectively. Despite being well suited for non-linear complex problems, PSO has the downside of easily falling into local optima in high-dimensional space and of converging slowly in iterative processes. To avoid premature convergence and being stuck in neighborhood searches, we adopt another metaheuristic approach, snake swarm optimization, for local search. Gunjan et al. analyzed different metaheuristic algorithms, namely simulated annealing (SA) [25], the Tree-of-Parzen estimator, and random search, for optimizing the CNN structure hyper-parameters to classify small pulmonary nodules. The system uses the LIDC dataset, and the results show that SA performs well compared to the other metaheuristic algorithms. Sollini et al. [36] explained lung lesion classification using a deep learning algorithm. The two main modules are the detection of lung nodules on CT scans and the classification of each nodule as benign or malignant. The Computer-Aided Detection (CADe) and Computer-Aided Diagnosis (CADx) modules rely on deep learning techniques such as Retina U-Net and convolutional neural networks.

3 Proposed lung nodule detection methodology

The main objective of the proposed methodology is to effectively detect pulmonary nodules in CT lung images. To achieve this objective, we propose an optimized MRCNN, which is enhanced by a hybrid particle snake swarm optimization algorithm (PS2OA). The hybrid PS2OA algorithm is used to tune the hyper-parameters. The proposed approach consists of two main stages, namely pre-processing and detection. Pre-processing is done by an adaptive median filter and CLAHE. After pre-processing, the image is given to the optimized MRCNN, in which the nodule is detected. The structure of the proposed methodology is given in Fig. 1.

Fig. 1 Workflow of proposed methodology

3.1 Pre-processing

The original lung CT images are pre-processed to remove noise and enhance image contrast. For this, we apply the Adaptive Median Filter (AMF) and Contrast Limited Adaptive Histogram Equalization (CLAHE) methods.

3.1.1 Adaptive median filter (AMF)

The purpose of using the adaptive median filter is to reduce distortion while preserving image edge details [28]. Its advantage over a standard median filter is that the kernel size adapts to the area around a distorted pixel, which gives better output; in contrast to the median filter, it also does not replace every pixel value with the median value. The algorithm works on two levels.

Level: 1

The first level involves determining the kernel's median value.

P1 = Zmed - Zmin

P2 = Zmed - Zmax

If P1 > 0 and P2 < 0, go to Level 2

Else increase the kernel size

If kernel size ≤ Smax, repeat Level 1

Else output Zxy

Sxy: the local region of the gray level image at (x, y).

Zmin, Zmax: minimum and maximum gray level values in Sxy.

Zmed: median gray level value in Sxy.

Zxy: gray level at coordinates (x, y).

Smax: the maximum allowed size of the region Sxy.

Level: 2

In Level 2, the algorithm determines whether the current pixel value is an impulse (salt-and-pepper noise) or not. If the pixel is corrupted, it is replaced by the median; otherwise its gray level value is kept.

Q1 = Zxy - Zmin

Q2 = Zxy - Zmax

If Q1 > 0 and Q2 < 0, output Zxy

Else output Zmed

The original CT lung image of size 224 × 224 is shown in Fig. 2 (a), and the maximum window size Smax is set to 11. First, the image is converted into a grayscale image, as depicted in Fig. 2 (b). Then the AMF method is applied to that image for denoising, with the result shown in Fig. 2 (c).
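The two-level test can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the standard form of the filter, in which Level 1 checks that the median lies strictly between Zmin and Zmax and Level 2 applies the same check to the pixel itself. The function name, the list-of-lists image format, and the border handling (windows are clipped at the image edge) are assumptions made for the example.

```python
def adaptive_median_filter(img, s_max=11):
    """Adaptive median filter on a 2D list of gray levels (returns a new image)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            z_xy = img[y][x]
            size = 3  # initial window size
            while True:
                r = size // 2
                # Window S_xy, clipped at the image border (an assumption).
                window = sorted(
                    img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))
                )
                z_min, z_max = window[0], window[-1]
                z_med = window[len(window) // 2]
                if z_min < z_med < z_max:
                    # Level 2: keep the pixel unless it is itself an impulse.
                    out[y][x] = z_xy if z_min < z_xy < z_max else z_med
                    break
                size += 2  # Level 1 failed: enlarge the window
                if size > s_max:
                    out[y][x] = z_xy  # fallback once Smax is exceeded
                    break
    return out


# Tiny demo: a 5 x 5 textured patch with one salt pixel in the middle.
patch = [[100 + (x + y) % 3 for x in range(5)] for y in range(5)]
patch[2][2] = 255
cleaned = adaptive_median_filter(patch)
print(cleaned[2][2])
```

In this demo the impulse at the center is replaced by the local median, while a pixel whose value lies strictly between the local minimum and maximum is left untouched.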

Fig. 2 Pre-processing output

3.1.2 Contrast Limited Adaptive Histogram Equalization (CLAHE)

Histogram equalization is an image processing method that modifies the intensity distribution of the histogram to change the contrast of an image. CLAHE is a variant of Adaptive Histogram Equalization (AHE). It reduces noise amplification by limiting contrast amplification, which it does by evenly redistributing the portion of the histogram that exceeds the clip limit across all histogram bins.

Histogram equalization (HE) is often employed in image enhancement; however, HE raises contrast globally [29], whereas AHE improves contrast in local areas. Unfortunately, AHE can over-amplify contrast, and with it noise, in nearly uniform regions. CLAHE handles this by providing a clip limit, which specifies the maximum height of a histogram, and a region (tile) size. Here the denoised images are enhanced with a clip limit of 0.02 and a tile size of 8 × 8. The tile size gives the contextual region of CLAHE [30]. The Rayleigh distribution is used here to enhance the intensity values of every pixel. Bilinear interpolation is used to remove the artifacts near the tile boundaries. HE is based on a transformation function that combines a probability distribution function (PDF) and a cumulative distribution function (CDF). The general histogram stretching is given by Eq. 1,

$${\text{P}}_{\text{out}}=\left({\text{P}}_{\text{in}}-{\text{I}}_{\text{min}}\right)\left(\frac{{\text{O}}_{\text{max}}-{\text{O}}_{\text{min}}}{{\text{I}}_{\text{max}}-{\text{I}}_{\text{min}}}\right)+{\text{O}}_{\text{min}}$$
(1)

where Pout and Pin are the pixel values of the output and input images, and Imin, Imax, Omin, and Omax are the minimum and maximum intensity levels of the input and output images, respectively.
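As a small worked example of the stretching in Eq. 1, the sketch below maps a list of input intensities linearly onto an output range. It assumes the usual reading of the equation, in which the difference between the pixel and the input minimum is scaled by the ratio of the output range to the input range. The function name and the handling of a flat image are illustrative choices.

```python
def stretch(pixels, o_min=0.0, o_max=255.0):
    """Linear histogram stretching of a flat list of intensities (Eq. 1)."""
    i_min, i_max = min(pixels), max(pixels)
    if i_max == i_min:
        # Flat image: no contrast to stretch (assumption: map to o_min).
        return [o_min for _ in pixels]
    scale = (o_max - o_min) / (i_max - i_min)
    return [(p - i_min) * scale + o_min for p in pixels]


# Intensities 0..128 are stretched to the full 0..255 range.
print(stretch([0, 64, 128]))
```

The smallest input value maps to Omin, the largest to Omax, and values in between keep their relative position.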

$${\mathrm{PDF}}_{\mathrm{Rayleigh}}(x)=\frac{x}{\alpha^{2}}\,e^{-\frac{x^{2}}{2\alpha^{2}}}\quad \mathrm{for}\;x\ge 0,\;\alpha>0$$
(2)

where x is the intensity value of the input image and α is the Rayleigh distribution parameter. Figure 3 depicts the input CT lung image and Fig. 4 shows the image after applying the CLAHE technique. Figures 5 and 6 show the plots of histogram equalization and CLAHE.
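For reference, the Rayleigh density of Eq. 2 can be evaluated as below. This sketch assumes the standard form of the distribution, with a single negative exponent and a strictly positive parameter α; the function name is illustrative.

```python
import math


def rayleigh_pdf(x, alpha):
    """Rayleigh probability density for intensity x and parameter alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if x < 0:
        return 0.0  # the density is zero for negative x
    return (x / alpha**2) * math.exp(-(x**2) / (2 * alpha**2))


# The density peaks at x = alpha, where it equals exp(-0.5) / alpha.
print(rayleigh_pdf(2.0, 2.0))
```

A larger α moves the peak toward higher intensities, which is why α acts as the tuning knob of the target distribution.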

Fig. 3 Input CT Lung Image

Fig. 4 CLAHE Image

Fig. 5 HE plots of Input Lung CT

Fig. 6 Plot of CLAHE

3.2 Lung nodule detection and classification using optimized Mask RCNN

After pre-processing, the images are sent to an optimized Mask RCNN classifier, which classifies each nodule as malignant or benign. Mask R-CNN is a deep learning-based approach that is mainly used for object detection and image segmentation, and it has been extensively used in many computer vision tasks, including object detection, instance segmentation, and pose estimation. Mask RCNN generates the bounding box, the segmentation mask, and the corresponding class name. It is built on a Feature Pyramid Network (FPN) and a ResNet101 backbone. To enhance the performance of Mask RCNN, the hyper-parameters of the ResNet are optimally selected. For the parameter selection process, a hybrid optimization algorithm is presented, which combines particle swarm optimization and snake swarm optimization. The structure of Mask RCNN is presented in Fig. 7.

Fig. 7 Architecture of mask R-CNN

The Mask R-CNN model consists of three primary components: the backbone, the Region Proposal Network (RPN), and RoIAlign. The backbone is a Feature Pyramid Network-style deep neural network that extracts multi-level image features. ResNet forms the backbone of the Mask R-CNN model; the CNN used here is ResNet 101, which requires \(3.8 \times 10^{9}\) floating point operations. The RPN uses a sliding window to scan the input image and detects the infected regions in this study. RoIAlign then examines the RoIs obtained from the RPN and extracts the corresponding features from the backbone feature maps at various locations. RoIAlign is responsible for forming precise segmentation masks on the images; it replaces the RoIPooling of Faster R-CNN to give more precise and accurate segmentation.

3.2.1 ResNet 101 + FPN-based feature map generation

ResNet-FPN is the backbone architecture used for feature extraction in Mask R-CNN. ResNet is a deep convolutional neural network that is very effective for image classification tasks, while FPN builds an in-network feature pyramid from a single-scale input using a top-down architecture with lateral connections. In ResNet-FPN, the FPN architecture is added on top of the ResNet backbone to create a more effective feature extractor. The FPN component allows multi-scale feature maps to be generated from the input image, which can improve object detection accuracy [14].

ResNet stacks a series of convolution, pooling, activation, and fully connected (FC) layers one after the other. Many ResNet architectures are available; in this paper, ResNet 101, which consists of 101 layers, is utilized. ResNet 101 has lower complexity than the VGG16 and VGG19 nets [11]. ResNet has three versions, namely ResNet Version 1, ResNet Version 2, and ResNeXt, each with different characteristics.

As shown in Fig. 8, ResNet-101 has a bottom-up path, which reduces the resolution of the feature maps. In contrast, FPN increases the resolution of the feature maps from the top down. Lateral links combine features with the same resolution from ResNet-101 and FPN to create new features in the FPN path [10]. The ResNet-101 backbone with FPN is used to train models for lung CT images. The following parameters of ResNet-101 are optimized:

  • ResNet version: ResNet 101 is available in several versions, and the version affects the quality of the output. We therefore select the optimal version for the segmentation process.

  • Batch size: Any batch size can be selected for processing, and the choice affects performance. We therefore choose the optimal batch size.

  • Pooling type: Different types of pooling are available, so we choose the optimal one.

  • Learning rate: The learning rate is another crucial factor for maximizing the final accuracy, and it can be difficult to determine the proper value.

  • Optimizer: The optimizer is used in the fully connected layer.

Fig. 8 ResNet-101 + FPN model

The five parameters mentioned above are optimally selected using the PS2OA. The range of hyper-parameter configurations used for the PS2OA algorithm is shown in Table 1.

Table 1 Hyper-parameter range

Step 1: Solution encoding: Solution initialization is an important factor in the PS2OA, as it defines the problem. In this paper, random initialization is used. The parameters of ResNet101, namely the ResNet version, batch size, pooling type, learning rate, and optimizer, are optimally selected by the PS2OA algorithm. Initially, these parameters are randomly initialized. In the PS2OA, the solutions are called swarms, and the parameters are called particles. The initial solution format is given in Eq. (3).

$$S=\left\{{S}_{1},{S}_{2},\dots ,{S}_{n}\right\}$$
(3)

where \({S}_{n}\) represents the \({n}^{th}\) swarm.
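The random initialization of Step 1 could look like the sketch below. The values of Table 1 are not reproduced in this text, so every range in `SEARCH_SPACE` is a placeholder chosen only for illustration; the dictionary keys and function names are likewise hypothetical.

```python
import random

# Placeholder search space: the real ranges are those of Table 1.
SEARCH_SPACE = {
    "resnet_version": ["v1", "v2", "resnext"],
    "batch_size": [2, 4, 8, 16],
    "pooling": ["max", "average"],
    "learning_rate": (1e-5, 1e-2),  # continuous range (low, high)
    "optimizer": ["sgd", "adam", "rmsprop"],
}


def random_solution(rng):
    """Draw one candidate solution (one value per hyper-parameter)."""
    solution = {}
    for name, choices in SEARCH_SPACE.items():
        if isinstance(choices, tuple):
            low, high = choices
            solution[name] = rng.uniform(low, high)
        else:
            solution[name] = rng.choice(choices)
    return solution


def init_population(n, seed=0):
    """Create the initial set S = {S_1, ..., S_n} of random solutions."""
    rng = random.Random(seed)
    return [random_solution(rng) for _ in range(n)]


population = init_population(6)
print(population[0])
```

Each entry of `population` plays the role of one S_i in Eq. (3). Categorical parameters would need an integer or one-hot encoding before a continuous update rule can be applied to them; that encoding is not specified here.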

Step 2: Fitness calculation: After the random creation, the fitness function is calculated for each swarm. In this paper, maximum accuracy is considered as the fitness function. The fitness function is given in Eq. (4).

$$Fitness=Max\left(Accuracy\right)$$
(4)

Step 3: Update the solution using hybrid Particle Snake Swarm Optimization: After the fitness calculation, the solution is updated using the PS2OA algorithm, which is a combination of PSO and SSO.

3.2.2 Particle Swarm Optimization

PSO is a swarm intelligence method inspired by fish schools and bird flocks in nature. Each particle in PSO is defined by a position vector and a velocity vector. Each particle traverses the search space guided by its own best solution, and every particle also knows the best location found so far by the whole swarm. The velocity and position updates are formulated as follows,

$${v}_{i}^{k+1}={v}_{i}^{k}+{c}_{1}{r}_{1}\left({pbest}_{i}^{k}-{x}_{i}^{k}\right)+{c}_{2}{r}_{2}\left({g}_{best}-{x}_{i}^{k}\right)$$
(5)
$${x}_{i}^{k+1}={x}_{i}^{k}+{v}_{i}^{k+1}$$
(6)
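A single update of Eqs. (5) and (6) for one particle can be sketched as follows. The symbols are read in the usual PSO sense, which is an assumption since they are not defined in the text: \(c_1\) and \(c_2\) are acceleration coefficients, and \(r_1\) and \(r_2\) are random numbers drawn uniformly from [0, 1]. The coefficient values are placeholders.

```python
import random


def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0, rng=random):
    """One PSO update of a particle given as lists of coordinates."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Eq. (5): pull toward the personal best and the global best.
        vi_next = vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vi_next)
        # Eq. (6): move the particle with the new velocity.
        new_x.append(xi + vi_next)
    return new_x, new_v


# A particle already sitting on both bests just keeps its velocity.
pos, vel = pso_step([1.0, 2.0], [0.5, -0.5], [1.0, 2.0], [1.0, 2.0])
print(pos, vel)
```

When the particle coincides with both its personal best and the global best, the two attraction terms vanish and only the previous velocity moves it.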

3.2.3 Snake Swarm Optimization

Snake swarm optimization was developed in 2022 and mimics the mating behavior of snakes [35]. Mating takes place when food is available and the temperature is low.

Stage 1: Initialization Phase:

The random initial population of the SSOA is presented as follows,

$${N}_{male}\approx \frac{n}{2}$$
(7)
$${N}_{f}=n-{N}_{male}$$
(8)

Here, \({N}_{f}\) is the number of female individuals, \(n\) is the total number of individuals and \({N}_{male}\) is the number of male individuals. The initial population of the SSOA is divided into two groups, male and female. In every iteration, the best individual of each group (the best male and the best female) is determined. The temperature and food quantity are described as follows,

$$Temp=exp\left(\frac{-g}{t}\right)$$
(9)
$$fq={c}_{1}\,exp\left(\frac{g-t}{t}\right)$$
(10)

Here, \(Temp\) is the temperature, \(fq\) is the food quantity, \({c}_{1}\) is a constant set to 0.5, \(t\) is the total number of iterations, and \(g\) is the current iteration. When \(fq<threshold\), the snakes search for food by choosing a random position and then update their position.
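The temperature and food quantity of Eqs. (9) and (10) depend only on the iteration counter, so their schedule can be tabulated directly. The sketch below is illustrative; the function names are not from the paper.

```python
import math


def temperature(g, total):
    """Temperature of Eq. (9): starts at 1 and decays with the iteration g."""
    return math.exp(-g / total)


def food_quantity(g, total, c1=0.5):
    """Food quantity of Eq. (10): grows toward c1 as g approaches total."""
    return c1 * math.exp((g - total) / total)


# Schedule over a hypothetical run of 100 iterations.
for g in (0, 50, 100):
    print(g, round(temperature(g, 100), 3), round(food_quantity(g, 100), 3))
```

The temperature falls while the food quantity rises, so early iterations favor random search (exploration) and later iterations favor movement toward good solutions.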

Stage 2: Male snake formulation:

The exploration behavior of the male snakes is modeled as follows,

$${x}_{i,j}\left(g+1\right)={x}_{\left(rand\in \left[1,\frac{n}{2}\right],j\right)}\left(g\right)\pm {c}_{2}\times {a}_{i,male}\left(ub-lb\right)\times {rand}_{\in U\left(0,1\right)}+lb,\quad \text{where}\;{a}_{i,male}=exp\left(-\frac{{f}_{rand,male}}{{f}_{i,male}}\right)$$
(11)

Here, \(\pm\) is a flag direction operator, \({f}_{i,male}\) is the fitness of the \(i\)-th male in the group, \({f}_{rand,male}\) is the fitness of the randomly chosen male snake, \({a}_{i,male}\) is the ability of the male to find food, \(rand\) is a random number between 0 and 1, \(ub\) and \(lb\) are the upper and lower bounds of the search space, \({x}_{\left(rand\in \left[1,\frac{n}{2}\right],j\right)}\) is the position of a random male snake and \({x}_{i,j}\) is the male snake position.

Stage 3: Female snake formulation

The female snake formulation is given as follows,

$${x}_{i,j}\left(g+1\right)={x}_{\left(rand\in \left[1,\frac{n}{2}\right],j\right)}\left(g\right)\pm {c}_{2}\times {a}_{i,female}\left(ub-lb\right)\times {rand}_{\in U\left(0,1\right)}+lb,\quad \text{where}\;{a}_{i,female}=exp\left(-\frac{{f}_{rand,female}}{{f}_{i,female}}\right)$$
(12)

Here, \(\pm\) is a flag direction operator, \({f}_{i,female}\) is the fitness of the \(i\)-th female in the group, \({f}_{rand,female}\) is the fitness of the randomly chosen female snake, \({a}_{i,female}\) is the ability of the female to find food, \(rand\) is a random number between 0 and 1, \({x}_{\left(rand\in \left[1,\frac{n}{2}\right],j\right)}\) is the position of a random female snake and \({x}_{i,j}\) is the female snake position.
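The exploration updates of Eqs. (11) and (12) have the same shape for both groups, so one routine can serve males and females. The sketch below rests on several assumptions that the text does not settle: the random step is read as \(c_2 \times a_i \times ((ub - lb) \times rand + lb)\), the value of \(c_2\) is a placeholder, fitness values are strictly positive (as with accuracy), and new positions are clipped to the bounds.

```python
import math
import random


def explore(group, fitness, lb, ub, c2=0.05, rng=random):
    """One exploration update for a single group (male or female).

    group   : list of positions, each a list of coordinates
    fitness : list of positive fitness values, one per snake
    lb, ub  : per-dimension lower and upper bounds
    """
    new_group = []
    for i, snake in enumerate(group):
        k = rng.randrange(len(group))           # random snake of the same group
        a = math.exp(-fitness[k] / fitness[i])  # ability to find food, a_i
        sign = rng.choice((-1.0, 1.0))          # flag direction operator
        new_pos = []
        for j in range(len(snake)):
            step = c2 * a * ((ub[j] - lb[j]) * rng.random() + lb[j])
            value = group[k][j] + sign * step
            new_pos.append(min(max(value, lb[j]), ub[j]))  # clip to bounds
        new_group.append(new_pos)
    return new_group


males = [[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]]
moved = explore(males, [0.7, 0.8, 0.9], lb=[0.0, 0.0], ub=[1.0, 1.0],
                rng=random.Random(1))
print(moved)
```

Each snake restarts from the position of a randomly chosen group member and takes a random step, which keeps the population spread out while food is scarce.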

Stage 4: Exploitation phase:

In the exploitation phase, two scenarios are considered for computing optimal solutions. The scenario is selected based on a threshold parameter.

Condition 1:

If \(fq<threshold\), the position is updated by the equation below,

$${x}_{i,j}\left(g+1\right)={x}_{food}\pm {c}_{3}\times Temp\times rand\times \left({x}_{food}-{x}_{i,j}\left(g\right)\right)$$
(13)

Here, \({c}_{3}\) is equal to 2, \(Temp\) is the temperature of Eq. (9), \({x}_{food}\) is the position of the best individual and \({x}_{i,j}\) is the position of the individual.

Condition 2:

If \(fq>threshold\), the position is updated based on the fighting and mating processes.

Fighting process

The fighting capability of a female snake is formulated as follows,

$${x}_{i,j}\left(g+1\right)={x}_{i,j}\left(g\right)\pm {c}_{3}\times {f}_{i,female}\times rand\times \left({x}_{best,male}-{x}_{i,j}\left(g\right)\right),\quad \text{where}\;{f}_{i,female}=exp\left(\frac{-{f}_{best,male}}{{f}_{i}}\right)$$
(14)

Here, \({f}_{i,female}\) is the fighting capability of the female snake, \({f}_{best,male}\) is the fitness of the best individual in the male group, \({f}_{i}\) is the fitness of the \(i\)-th individual, \({x}_{best,male}\) is the position of the best individual in the male group and \({x}_{i,j}\) is the female position. The fighting capability of the male snake is formulated as follows,

$${x}_{i,j}\left(g+1\right)={x}_{i,j}\left(g\right)\pm {c}_{3}\times {f}_{i,male}\times rand\times \left({x}_{best,female}-{x}_{i,j}\left(g\right)\right),\quad \text{where}\;{f}_{i,male}=exp\left(\frac{-{f}_{best,female}}{{f}_{i}}\right)$$
(15)

Here, \({f}_{i,male}\) is the fighting capability of the male snake, \({f}_{best,female}\) is the fitness of the best individual in the female group, \({x}_{best,female}\) is the position of the best individual in the female group and \({x}_{i,j}\) is the male position.

Mating process

In this phase, the female and male groups update their positions as follows,

$${x}_{i,f}\left(g+1\right)={x}_{i,f}\left(g\right)\pm {MM}_{i,female}\times rand\times \left(q\times {x}_{i,m}\left(g\right)-{x}_{i,f}\left(g\right)\right),\quad \text{where}\;{MM}_{i,female}=exp\left(\frac{-{f}_{i,male}}{{f}_{i,female}}\right)$$
(16)
$${x}_{i,m}\left(g+1\right)={x}_{i,m}\left(g\right)\pm {MM}_{i,male}\times rand\times \left(q\times {x}_{i,f}\left(g\right)-{x}_{i,m}\left(g\right)\right),\quad \text{where}\;{MM}_{i,male}=exp\left(\frac{-{f}_{i,female}}{{f}_{i,male}}\right)$$
(17)

Here, \({MM}_{i,female}\) is the mating capability of the female, \({MM}_{i,male}\) is the mating capability of the male, \(q\) denotes the food quantity of Eq. (10), \({x}_{i,m}\left(g\right)\) is the position of the male and \({x}_{i,f}\left(g\right)\) is the position of the female agents.
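The mating update of Eqs. (16) and (17) for one female/male pair can be sketched as below. The sketch assumes that both partners are updated from their positions at iteration \(g\), that \(q\) is the food quantity, that one direction flag is drawn per snake and one random number per coordinate, and that fitness values are strictly positive. All names are illustrative.

```python
import math
import random


def mate(x_female, x_male, f_female, f_male, q, rng=random):
    """Mating update for one female/male pair; returns the two new positions."""
    mm_female = math.exp(-f_male / f_female)  # mating capability of the female
    mm_male = math.exp(-f_female / f_male)    # mating capability of the male
    sign_f = rng.choice((-1.0, 1.0))          # flag direction operators
    sign_m = rng.choice((-1.0, 1.0))
    new_female = [
        xf + sign_f * mm_female * rng.random() * (q * xm - xf)
        for xf, xm in zip(x_female, x_male)
    ]
    new_male = [
        xm + sign_m * mm_male * rng.random() * (q * xf - xm)
        for xf, xm in zip(x_female, x_male)
    ]
    return new_female, new_male


# With q = 1 and identical positions the pair has nothing to exchange.
nf, nm = mate([0.3, 0.4], [0.3, 0.4], 0.8, 0.9, q=1.0, rng=random.Random(0))
print(nf, nm)
```

The step size shrinks as the partners approach each other, so mating acts as a local refinement around the pair.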

Proposed PS2OA

The main motive of the hybrid algorithm is to combine the exploitation ability of PSO with the exploration ability of SSOA and so obtain the optimization strength of both. In the hybrid algorithm, the exploration and exploitation of the SSOA are managed by the inertia constant. Compared with the conventional computations, the location of the primary agent in the hunting location is optimally updated. This is presented as follows,

$$\text{Female (mating)}={x}_{i,f}\left(g\right)\pm {MM}_{i,female}\times rand\times \left(q\times {x}_{i,m}\left(g\right)-{x}_{i,f}\left(g\right)\right),\quad \text{where}\;{MM}_{i,female}=exp\left(\frac{-{f}_{i,male}}{{f}_{i,female}}\right)$$
(18)
$$\text{Male (mating)}={x}_{i,m}\left(g\right)\pm {MM}_{i,male}\times rand\times \left(q\times {x}_{i,f}\left(g\right)-{x}_{i,m}\left(g\right)\right),\quad \text{where}\;{MM}_{i,male}=exp\left(\frac{-{f}_{i,female}}{{f}_{i,male}}\right)$$
(19)

The location and velocity are adjusted to combine the PSO and SSO variations and are presented as follows,

$${v}_{i}^{k+1}=w\cdot \left({v}_{i}^{k}+{c}_{1}{r}_{1}\left({x}_{i,female}-{x}_{i}^{k}\right)\right)+{c}_{2}{r}_{2}\left({x}_{i,male}-{x}_{i}^{k}\right)$$
(20)
$${x}_{i}^{k+1}={x}_{i}^{k}+{v}_{i}^{k+1}$$
(21)
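The hybrid update of Eqs. (20) and (21) replaces the personal and global bests of PSO with the female and male positions. The sketch below follows the grouping of Eq. (20) as printed, with the inertia weight \(w\) applied to the velocity and the female attraction term. It assumes that \(x_{i,female}\) and \(x_{i,male}\) are the mating-updated positions of Eqs. (18) and (19); the values of \(w\), \(c_1\), and \(c_2\) are placeholders.

```python
import random


def ps2oa_step(x, v, x_female, x_male, w=0.7, c1=2.0, c2=2.0, rng=random):
    """One hybrid velocity and position update for a single particle."""
    new_x, new_v = [], []
    for xi, vi, xf, xm in zip(x, v, x_female, x_male):
        r1, r2 = rng.random(), rng.random()
        # Eq. (20): inertia-weighted velocity plus attraction to both guides.
        vi_next = w * (vi + c1 * r1 * (xf - xi)) + c2 * r2 * (xm - xi)
        new_v.append(vi_next)
        # Eq. (21): move the particle.
        new_x.append(xi + vi_next)
    return new_x, new_v


# If the particle coincides with both guides, only inertia moves it.
hx, hv = ps2oa_step([1.0], [0.5], [1.0], [1.0], w=0.5)
print(hx, hv)
```

A smaller \(w\) damps the previous velocity and the female attraction together, which is how the inertia constant shifts the balance from exploration to exploitation.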

Step 4: Termination criteria

The procedure is continued until the best hyper-parameter values are selected. The selected value is given to the lung cancer detection process.

Algorithm 1 Pseudocode of the proposed hybrid algorithm

Region proposal network (RPN)

The RPN network utilizes the features extracted by ResNet101+FPN as input to generate Regions of Interest (ROIs). In scenarios where the aspect ratios of objects differ, RPN can predict both the foreground and background of an image. To efficiently generate candidate regions, the image box is positioned on the network, delineating the border box in the expected feature image. A 3×3 convolutional layer scans the image, generating anchors distributed across the image in different sizes. These anchors serve as starting points for proposing potential regions of interest, facilitating subsequent object detection processes.

The network adapts its scale based on input images and utilizes a predefined set of anchor boxes [15]. Each anchor corresponds to a unique bounding box and ground-truth class, allowing for the recognition of defects of diverse sizes and shapes. Default bounding boxes encompass a range of sizes and aspect ratios to accommodate various object characteristics. With overlapping bounding boxes, determining the highest confidence score for detecting multiple Regions of Interest (ROIs) becomes more straightforward [20]. This assessment is facilitated by the Intersection over Union factor (IoU), calculated using equation (22), aiding in the accurate identification of RoIs.

$$IoU=\frac{\text{Area of Overlap}}{\text{Area of Union}}$$
(22)
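Equation (22) can be computed for two axis-aligned boxes as in the sketch below. The corner format (x1, y1, x2, y2) and the function name are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2 x 2 boxes sharing a 1 x 1 corner: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1 means the anchor and the ground-truth box coincide, and 0 means they do not overlap at all.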

9 ROI align model

RoIAlign processes a set of rectangular region proposals, extracting features from the feature map corresponding to each proposal. In RCNN networks, pixel accuracy and the ability to distinguish individual branches within the same pixel target are crucial for mask branch detection. After pooling and convolution of the original image, the image size undergoes changes, followed by segmentation. Direct pixel-level segmentation techniques often fail to produce accurate segmentation output. Hence, this paper proposes Mask RCNN, an enhanced version of Faster RCNN. Additionally, CNN's pooling layer is replaced with RoIAlign, which utilizes linear interpolation to preserve spatial details in the feature map. RoIAlign serves as a neural network layer employed in object detection and instance segmentation algorithms, such as Mask R-CNN.

In Fig. 9, the green dotted lines represent the 5×5 feature map obtained after the convolution layers, and the solid-line box marks the smaller region of that feature map corresponding to the RoI. RoIAlign keeps the RoI boundary as floating-point coordinates without quantization. The RoI is first divided into 2×2 bins (the bin boundaries are likewise not quantized), and each bin is then divided into four smaller cells whose center points serve as sampling points, shown as blue dots in the figure. The values at these four sampling points are computed by bilinear interpolation, and max pooling or average pooling over them then produces a 2×2 feature map.

Fig. 9 Schematic diagram of the RoIAlign algorithm
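The procedure illustrated in Fig. 9 can be sketched in a few lines of NumPy. This is a simplified single-channel illustration under stated assumptions, not the implementation used in the experiments: pixel centers are assumed to sit at integer coordinates, and the names `bilinear` and `roi_align` are hypothetical.

```python
import numpy as np

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at the real-valued point (y, x)."""
    h, w = feature.shape
    # Clamp the sampling point to the valid area of the feature map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature[y0, x0] * (1 - dx) + feature[y0, x1] * dx
    bottom = feature[y1, x0] * (1 - dx) + feature[y1, x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size=2, samples=2, reduce=np.mean):
    """Pool a floating-point RoI (x1, y1, x2, y2) into an out_size x out_size map.

    The RoI and bin boundaries are never rounded. Each bin is sampled at
    samples x samples regularly spaced points, which are then reduced with
    `reduce` (np.mean for average pooling, np.max for max pooling).
    """
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            values = []
            for a in range(samples):
                for b in range(samples):
                    # Centre of sub-cell (a, b) inside bin (i, j).
                    sy = y1 + (i + (a + 0.5) / samples) * bin_h
                    sx = x1 + (j + (b + 0.5) / samples) * bin_w
                    values.append(bilinear(feature, sy, sx))
            out[i, j] = reduce(values)
    return out

# 5x5 feature map whose value at (y, x) is x + 10*y, and a non-integer RoI.
feature = np.arange(5)[None, :] + 10.0 * np.arange(5)[:, None]
pooled = roi_align(feature, roi=(0.5, 0.5, 3.5, 3.5))
```

Because the RoI coordinates stay real-valued throughout, small shifts of the proposal change the pooled features smoothly, which is the property the mask branch relies on.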

10 Loss function

The loss function in Mask R-CNN, a popular instance segmentation model, is a composite function that combines classification, bounding box regression, and mask segmentation losses. It serves to optimize the model parameters by minimizing the discrepancy between predicted and ground-truth values for object classification, localization, and pixel-wise segmentation simultaneously. This loss function plays a pivotal role in training Mask R-CNN to accurately identify object instances and their corresponding masks in images. The loss function of the proposed model is given in Eq. (23).

$$L\left(OMRCNN\right)=Loss_{Class}+Loss_{Box}+Loss_{Mask}$$
(23)

where \(Loss_{Class}\) is the prediction loss of the class label, \(Loss_{Box}\) is the bounding-box regression loss, and \(Loss_{Mask}\) is the segmentation mask loss. The classification loss is computed over the normal and affected images, and its mathematical expression is given in Eq. (24).

$$Loss_{class}\left({A}_{i},{A}_{i}^{*}\right)=-{\text{log}}\left[{A}_{i}{A}_{i}^{*}+\left(1-{A}_{i}\right)\left(1-{A}_{i}^{*}\right)\right]$$
(24)

where \({A}_{i}\) is the predicted probability that candidate anchor \(i\) is a target, i.e. has the disease, and \({A}_{i}^{*}\) is the ground-truth label, which is 1 for a positive anchor and 0 otherwise. The regression loss of the bounding box is given by:

$$Loss_{Box}\left({B}_{i},{B}_{i}^{*}\right)=\sum_{i\in \left\{x,y,w,h\right\}}{Smooth}_{{L}_{1}}\left({B}_{i}-{B}_{i}^{*}\right)$$
(25)

where;

$${Smooth}_{{L}_{1}}\left(x\right)=\left\{\begin{array}{ll}0.5{x}^{2},& \text{if }\left|x\right|<1\\ \left|x\right|-0.5,& \text{otherwise}\end{array}\right.$$
(26)

Here, \({B}_{i}\) denotes the predicted bounding box and \({B}_{i}^{*}\) the ground-truth box associated with a positive anchor, with \(i\) indexing the box parameters \(x, y, w, h\). The mask loss is calculated as follows:

$$Loss_{Mask}=-\frac{1}{{n}^{2}}\sum_{1\le i,j\le n}\left[{X}_{ij}\,{\text{log}}\,{O}_{ij}^{t}+\left(1-{X}_{ij}\right){\text{log}}\left(1-{O}_{ij}^{t}\right)\right]$$
(27)

where \({X}_{ij}\) is the value of pixel \(\left(i,j\right)\) in a ground-truth mask of size \(n\times n\), and \({O}_{ij}^{t}\) is the predicted value of the same pixel in the mask learned for class \(t\) (\(t=1\) for malignant and \(t=0\) for benign).
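Eqs. (23) to (27) can be checked numerically with the short NumPy sketch below. It evaluates the losses for a single anchor and a single mask; the function names, the clipping constant `EPS`, and the toy inputs are assumptions introduced for illustration and are not taken from the experiments.

```python
import numpy as np

EPS = 1e-12  # numerical guard against log(0); not part of Eqs. (24) or (27)

def class_loss(a, a_star):
    """Eq. (24): -log[A A* + (1 - A)(1 - A*)] for one anchor."""
    a = np.clip(a, EPS, 1.0 - EPS)
    return float(-np.log(a * a_star + (1.0 - a) * (1.0 - a_star)))

def smooth_l1(x):
    """Eq. (26), applied element-wise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5)

def box_loss(b, b_star):
    """Eq. (25): sum of Smooth-L1 terms over the (x, y, w, h) parameters."""
    diff = np.asarray(b, dtype=float) - np.asarray(b_star, dtype=float)
    return float(smooth_l1(diff).sum())

def mask_loss(x, o):
    """Eq. (27): binary cross-entropy averaged over an n x n mask."""
    x = np.asarray(x, dtype=float)
    o = np.clip(np.asarray(o, dtype=float), EPS, 1.0 - EPS)
    return float(-np.mean(x * np.log(o) + (1.0 - x) * np.log(1.0 - o)))

def total_loss(a, a_star, b, b_star, x, o):
    """Eq. (23): unweighted sum of the three terms."""
    return class_loss(a, a_star) + box_loss(b, b_star) + mask_loss(x, o)

# Toy inputs (illustrative values only).
a, a_star = 0.9, 1                       # predicted probability, positive anchor
b = [0.5, 0.0, 2.0, -3.0]                # predicted (x, y, w, h)
b_star = [0.0, 0.0, 0.0, 0.0]            # ground-truth (x, y, w, h)
x = [[1, 0], [0, 1]]                     # 2x2 ground-truth mask
o = [[0.5, 0.5], [0.5, 0.5]]             # 2x2 predicted mask
loss = total_loss(a, a_star, b, b_star, x, o)
```

For a binary label, Eq. (24) coincides with the usual binary cross-entropy, since only one of the two products inside the logarithm is non-zero.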

11 Results and discussion

The experimental results obtained with the proposed approach are presented in this section. The method was implemented in TensorFlow and run on Google Colab using Keras 2.3.1 on a TensorFlow 1 release. The experiments were conducted on a Windows 10 system with 8 GB of Random-Access Memory (RAM) and Graphics Processing Units (GPUs). The performance of the proposed approach is analyzed using four metrics: accuracy, precision, recall, and F-measure.
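For reference, the four metrics can be computed from confusion-matrix counts as in the sketch below. The standard definitions are assumed, and the function name and the counts in the example are hypothetical values, not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts, used only to exercise the function.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

Precision penalizes false detections and recall penalizes missed nodules, so the F-measure is useful when both kinds of error matter.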

11.1 Dataset description

The Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) is a public dataset created through the collaboration of seven research institutions and eight private medical imaging companies [26]. The database contains 1018 computed tomography (CT) scans, with slice thicknesses ranging from 0.6 mm to 5.0 mm. Four radiologists read these scans in two separate reading sessions. First, the radiologists identified potentially malignant lesions and divided them into three categories based on size (nodules ≥ 3 mm, nodules < 3 mm, and non-nodules). The results from all four radiologists were then compiled, and each radiologist rechecked every annotation in an unblinded session. In clinical practice, detecting lung nodules requires thin-slice scans, so scans with a slice thickness greater than 2 mm were not considered. After also excluding scans with inconsistent slice spacing, a total of 888 scans were included in the analysis [27]. According to the NLST screening criteria, nodules larger than 3 mm were judged to be significant lesions.

12 Experimental results

This section presents the experimental results of the proposed approach visually.

Table 2 shows the detection output visually. Fig. 10 plots accuracy against the number of epochs, and Fig. 11 plots loss against the number of epochs. Fig. 11 shows that the loss value decreases as the number of epochs increases.

Table 2 Visual representation of the detection output: column (a) shows the input image, (b) the gray image, (c) the adaptive median filtered image, and (d) the detected output

Fig. 10 Epoch vs accuracy

Fig. 11 Epoch vs loss

12.1 Comparative analysis results

In this section, we compare the performance of the proposed work with different detection models, namely Faster RCNN (FRCNN), Single Shot MultiBox Detector (SSD), YOLO, and SVM-based lung nodule detection.

Fig. 12 compares the approaches in terms of accuracy. The proposed method attained the highest accuracy, 97.67%, whereas ANN-based lung nodule detection attained 89%. Among the existing classifiers, SVM-based classification produced the worst results. The proposed method outperforms the existing techniques because of the hyper-parameter optimization in MRCNN. Fig. 13 compares the approaches in terms of precision; a good classifier should have a high precision value. The proposed method attained the highest precision, 95.7%, which is 2.6% better than FRCNN-based, 4.5% better than SSD-based, 6.2% better than YOLO-based, and 8.2% better than SVM-based lung nodule classification. The recall of the presented technique is given in Fig. 14, which shows that OMRCNN-based lung nodule classification attained the highest recall among the compared techniques. Similarly, the proposed method attained the highest F-score, as shown in Fig. 15. Overall, the results show that the proposed approach outperformed the existing techniques on all four metrics.

Fig. 12 Performance analysis based on accuracy

Fig. 13 Performance analysis based on precision

Fig. 14 Performance analysis based on recall

Fig. 15 Performance analysis based on F-score

12.2 Comparative analysis with published work

To demonstrate the efficiency of the proposed approach, we compare it with previously published research. Four works are considered for the comparative analysis: 3DCNN [31], CNN [32], texture CNN [33], and BCNN [34]. Each of these works addresses lung nodule classification in depth, which makes them suitable for comparison with our research.

The comparative analysis results are presented in Table 3. In this paper, MRCNN is optimized for lung nodule classification: to improve the performance of the MRCNN classifier, its hyper-parameters are selected using a hybrid PS2OA algorithm. To demonstrate its efficiency, we compare our work with [31,32,33,34]. As Table 3 shows, the proposed approach attained the highest accuracy, 97.67%, compared with 90.6% for [31], 87.26% for [32], 90.91% for [33], and 91.46% for [34]. Owing to MRCNN and hyper-parameter optimization, our method produces superior classification outcomes.

Table 3 Comparative analysis results

13 Conclusion

This paper proposed an optimized MRCNN for effectively detecting pulmonary nodules in CT lung images. The MRCNN is enhanced with a hybrid PS2OA algorithm, which is used to tune its hyper-parameters. The proposed approach consists of two main stages, pre-processing and detection. Pre-processing is performed with an adaptive median filter and CLAHE, after which the image is passed to the optimized MRCNN, where the nodules are detected. The performance of the proposed approach was analyzed using different metrics and compared with state-of-the-art works. The proposed method achieved an accuracy of 97.67%, a recall of 99%, a precision of 95.7%, and an F-score of 95.67%. Overall, this integrated approach holds significant promise in enhancing the efficiency and accuracy of lung cancer detection, thereby contributing to improved patient outcomes and advancing the field of computer-aided diagnosis systems.