Introduction

Cancer is the uncontrolled growth of cells in the affected part of the body. Lung cancer is one of the world's most prevalent and dangerous malignancies; however, detecting lung cancer at an early stage improves the survival rate. Tobacco use and smoking are thought to be responsible for 85% of lung cancer cases in men and 75% of cases in women [1]. According to reports, lung cancer caused 2.1 million new cases and 1.8 million deaths in 2018, and it accounts for about 20% of all cancer deaths. Medical reports indicate that early detection of tumors can reduce the risk of death from lung cancer by about 20%, and the five-year survival rate for tumors detected at an early stage is 56%. Lung cancer is a condition in which abnormal cells proliferate and form a tumor in the lungs [1, 2]. It frequently spreads to the middle of the chest. Metastasis occurs when cancer cells leave the site of origin and travel through the bloodstream to a lymph node or another part of the body [3]. Carcinomas, including adenocarcinoma and squamous cell carcinoma, are the most frequent kind of lung cancer. A chest radiograph or a computed tomography (CT) scan may be used to identify lung cancer. On a CT scan, a lung nodule appears as a rounded, irregular dense mass with a diameter of up to 30 mm. Each scan contains hundreds of images that must be evaluated by a radiologist, which is a tiring process. For this reason, computer-aided diagnosis (CAD) methods are employed [4].

Most researchers have used machine learning approaches, and the accuracy of these approaches is not 100%. In medical diagnosis, however, diagnostic accuracy is a major concern. The proposed work therefore presents a lung cancer detection methodology whose algorithm is implemented with hard-coded (rule-based) logic. This paper is divided into five sections: introduction, literature review, methods, result analysis, and conclusion.

Literature Review

In 2011, Disha Sharma and Gagandeep Jindal developed a computer-aided diagnosis technique that detects lung cancer cells in CT scans in several steps [1]. The proposed system includes image processing operations such as erosion, spatial filtering, dilation, outlining, and border extraction using high-pass filtering. This preprocessing was useful for isolating the diseased region in the CT images. Radiologists and surgeons rate the accuracy of detecting a cancerous mass in the 2.0–7.0 mm range on CT imaging at 80%. Instead of thresholding for grayscale-to-binary conversion, a bit plane slicing method is used [2].

In 2012, Shamala Terdale and K. Kulhalli devised a CAD method for lung cancer identification using an artificial neural network (ANN). Conventional nodule detection methods can detect lung nodules larger than 2 mm in diameter. The proposed CAD system has five basic steps: lung region extraction, segmentation of the extracted lung, nodule detection, feature extraction, and classification with a neural network. The ANN is used to distinguish between true and false candidate nodules. The network has three layers: an input layer, a hidden layer, and an output layer. The performance of the CAD system is evaluated in terms of specificity, accuracy, and sensitivity. The main strength of this work is that it provides good segmentation in a short amount of time [5].

In 2013, Harsha Bodhey and her collaborators devised a system for automatic and reliable segmentation of lung CT images. The method assists radiologists in detecting lung disease from chest CT images. The CT images used have a resolution of 512 × 512 pixels. The method segments the lung lobes effectively and is beneficial in challenging cases. A preprocessing technique was used to reduce noise and distortion in the image. The abnormalities investigated are found in the lung parenchyma, vessels, and bronchi. Lung segmentation is a mandatory step in such procedures, and this method enhances it. An advantage of this strategy is that it does not require prior information about the image [4].

In 2013, Naveed and his collaborators carried out lung cancer detection using an improved contour detection approach. They devised contour detection with background removal, which uses a density-based algorithm to identify nodules in the lung and to determine whether a candidate is a nodule or a non-nodule. The first task of a CAD system is detection, which involves locating lung nodules in the screening image. The second task is classification, which has two stages: the first distinguishes nodules from non-nodules, and the second determines whether a nodule is benign or malignant. Contour detection is a novel approach for detecting lung cancer nodules in CT scans. Conventional contour detection does not capture very small nodules; this flaw is overcome by modifying the structure of the contour detection [6].

In 2015, Anuradha Deshpande and her students presented an image processing approach for lung cancer screening using MRI and CT images. In this work, the CAD system detects cancer stages using a support vector machine (SVM). Thresholding is the key technique used for segmentation. The system is implemented in MATLAB with a graphical user interface (GUI). The method comprises four main steps: image acquisition, image preprocessing, segmentation, and tumor identification. Image enhancement and thresholding are among the techniques used, with the aim of improving the survival rate of lung cancer patients. The authors generated a fused image by combining the vital information of multiple images. This improves the quality of the composite image in terms of high PSNR, low MSE, and a high accuracy of about 90.9011 [7].

In 2015, Ruchika proposed an algorithm for diagnosing lung disease from CT images. The mean shift method is used for segmentation; its main purpose is to determine the exact location and shape of the diseased nodule. The segmentation method relies on edge detection to detect a lump in the lung. Spatial proximity and photometric similarity are taken into consideration during image filtering and segmentation. With this algorithm, the various regions of the CT scan remaining after segmentation can be detected. The method also helps doctors analyze the shape and size of the tumor region [8].

In 2016, Aniket Gaikwad and his collaborators screened CT scan images for lung cancer identification. The steps involved are image capture, image enhancement, segmentation, and feature extraction. Histogram equalization is used to pre-process the images before they are classified [9]. The CT scan image is used as the input. The authors used watershed segmentation to locate the boundaries in the image and calculated parameters such as area, perimeter, roundness, and eccentricity to recognize the regions and their features.

In 2016, S. Logesh Kumar and his collaborators proposed a novel image segmentation technique for identifying lung cancer cells. Thresholding, which converts a grayscale image to a binary image, is a powerful tool for image segmentation. The segmentation technique employed outperforms existing segmentation algorithms for lung tumor detection. The preprocessing stage reduces noise and distortion in the input image, and a low-pass filter is used to smooth the images. Morphological processing is used to extract nucleus features from the lung images. Then, the mean intensity is calculated from the extracted, morphologically processed image [10].

In 2016, Vaishali Patil and her collaborators proposed a novel lung cancer detection method using computed tomography images. They used a CAD system for the diagnosis. Instead of a traditional image processing approach, they developed classifiers based on machine learning; the SVM, ANN, and kNN algorithms were used for this classification task. This approach was found to improve efficiency and reduce the error rate [11]. Noriyasu Homma predicted that CAD technology will help identify lung nodules at their early stages. The filter employed is used to locate isolated pulmonary nodules. A limitation of this method is that it detects only isolated nodules; non-isolated nodules attached to the chest wall are not detected [12].

Methodology

Image processing applies mathematical operations to an image to extract useful information from it. These operations include image preprocessing (also called image cleaning or image enhancement), which removes noise from the image, followed by image segmentation and feature extraction. In general, if the input image is in color, it is first converted to grayscale [13]. A binary image is then obtained from the grayscale image by thresholding. Several algorithms are available for each of these operations, and the appropriate one should be chosen to optimize performance [2]. The various CAD techniques used for lung cancer detection were reviewed in order to design the experiment [14]. All image processing in this work was performed with the MATLAB Image Processing Toolbox.

Implementation of Image Processing in MATLAB

The proposed work is implemented in MATLAB using the LIDC dataset. A total of 200 images from this dataset were used, covering non-cancerous images as well as images of first-, second-, or third-stage cancer. The images are provided in DICOM format and were converted to JPG format. This work uses a traditional programming approach instead of machine learning, and 100% accuracy in lung cancer detection was obtained. For ease of visualization, a GUI was created with buttons for the various implementation steps. Table 1 summarizes the functions used in the proposed work.

Table 1 Summary of functions used in the proposed work

Raw CT images are generally grayscale, but to handle any colored pixels that may arise from scanning errors, the input image is converted to grayscale using the rgb2gray function. Because binary images are simpler to process than grayscale images, the grayscale images are then binarized using the im2bw function. Finally, the preprocessing step denoises the image by median filtering with the medfilt2 function.
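The grayscale-conversion and binarization steps can be approximated in plain Python to show what each one does to the pixel values. This is a minimal sketch, not the authors' MATLAB code: the helper names `rgb_to_gray` and `binarize` are hypothetical, pixel values are assumed to be scaled to [0, 1], the weights are the luminance coefficients documented for MATLAB's rgb2gray, and the default level of 0.5 mirrors im2bw's default. Median filtering is left out here.

```python
# Minimal sketch of the grayscale-conversion and binarization steps.
# Images are nested lists (rows of pixels); RGB pixels are (r, g, b) tuples in [0, 1].


def rgb_to_gray(rgb_image):
    """Weighted luminance sum, using the coefficients rgb2gray documents."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, level=0.5):
    """Pixels brighter than `level` become 1, all others 0 (im2bw-style rule)."""
    return [[1 if value > level else 0 for value in row] for row in gray_image]


if __name__ == "__main__":
    rgb = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
           [(0.9, 0.8, 0.7), (0.2, 0.1, 0.3)]]
    print(binarize(rgb_to_gray(rgb)))  # [[1, 0], [1, 0]]
```

Because rgb2gray is a fixed weighted sum, a pixel that is already gray (r = g = b) keeps essentially the same intensity, which is why the conversion is harmless on genuinely grayscale CT slices.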

The proposed system works in five modules:

  A. Lung Region Extraction
  B. Lung Regions Segmentation
  C. Feature Extraction
  D. Formulation of Detection
  E. Testing and Evaluation

The following sections discuss the implementation modules in detail.

Lung Region Extraction

The first module of the CAD system is lung region extraction, which uses several basic image processing techniques: (1) bit plane slicing, (2) erosion, (3) median filtering, (4) dilation, (5) outlining, (6) lung border extraction, and (7) flood fill. Bit plane slicing separates each bit plane of the image so that the importance of each bit can be analyzed. It is used to determine whether a sufficient number of bits is used to quantize each pixel. Erosion is one of the two essential operations of morphological image processing. It shrinks the objects in an image with the aid of a structuring element; its main effect on a binary image is to erode the boundaries of the foreground regions. The basic rule of erosion is that the output pixel takes the minimum value of all pixels in the input pixel's neighborhood. Median filtering comes next.
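Bit plane slicing can be illustrated on a tiny 8-bit image. The sketch below is an illustration under assumptions, not the authors' implementation: the image is a nested list of integers in 0 to 255, and the helper names `bit_plane` and `rebuild` are hypothetical. Rebuilding the image from only the most significant planes shows how much of the intensity information those bits carry, which is the question bit plane slicing is used to answer.

```python
# Bit plane slicing for an 8-bit grayscale image stored as nested lists.


def bit_plane(image, k):
    """Return bit plane k (0 = least significant bit, 7 = most significant)."""
    if not 0 <= k <= 7:
        raise ValueError("k must be between 0 and 7 for an 8-bit image")
    return [[(pixel >> k) & 1 for pixel in row] for row in image]


def rebuild(planes):
    """Recombine a {k: plane} mapping by summing plane * 2**k per pixel."""
    first = next(iter(planes.values()))
    rows, cols = len(first), len(first[0])
    return [[sum(plane[r][c] << k for k, plane in planes.items())
             for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    image = [[200, 13], [128, 255]]
    print(bit_plane(image, 7))  # [[1, 0], [1, 1]]
    # Keeping only the four most significant planes shows how much detail they carry.
    top_four = {k: bit_plane(image, k) for k in range(4, 8)}
    print(rebuild(top_four))  # [[192, 0], [128, 240]]
```

Rebuilding from all eight planes returns the original image exactly; dropping the low planes loses only fine intensity detail (13 becomes 0, 255 becomes 240).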

The median filter is a nonlinear digital filtering technique used to remove salt-and-pepper noise. A 3×3 window is placed over the image, the nine pixel values it covers are sorted in ascending or descending order, and the middle (median) value is assigned to the corresponding pixel of the output image. The window is then shifted one pixel to the right, and the same operation is repeated across the whole image. Unlike linear smoothing filters, the median filter removes impulse noise while largely preserving edges, so it introduces little blurring.
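A direct translation of the 3×3 median filter into Python makes the sort-and-pick-the-middle step concrete. This is a minimal sketch with a hypothetical function name; border pixels are handled by zero padding, which is the default documented for MATLAB's medfilt2. That padding is why the corner values in the example drop to 0, while the isolated "salt" pixel in the center is replaced by the neighborhood median.

```python
# 3x3 median filter on a grayscale image stored as nested lists.
# Pixels outside the image are treated as 0, matching medfilt2's default zero padding.


def median_filter_3x3(image):
    rows, cols = len(image), len(image[0])
    output = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    window.append(image[rr][cc] if inside else 0)
            window.sort()               # order the nine values
            output[r][c] = window[4]    # the middle one is the median
    return output


if __name__ == "__main__":
    noisy = [[10, 10, 10],
             [10, 255, 10],  # a single "salt" pixel
             [10, 10, 10]]
    print(median_filter_3x3(noisy))  # [[0, 10, 0], [10, 10, 10], [0, 10, 0]]
```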

Dilation, the other essential operation in morphological image processing, is the dual of erosion. The output pixel takes the maximum value of all pixels in the input pixel's neighborhood; in a binary image, if any pixel in the neighborhood has the value 1, the output pixel is set to 1.

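The lung border is obtained by subtracting the eroded image from the original binary image, where A is the binary lung image, B is the structuring element, and ⊖ denotes erosion:

$${\text{Border}} = A - \left( A \ominus B \right)$$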
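Erosion, dilation, and the border formula can be checked together on a small binary image. The sketch below assumes a 3×3 square structuring element B and simply ignores neighbors that fall outside the image, so objects touching the image edge are not eroded from outside; the function names are hypothetical and the code is an illustration, not the authors' MATLAB implementation.

```python
# Binary erosion, dilation, and border extraction with a 3x3 square
# structuring element B. Neighbours outside the image are simply ignored.


def _neighbourhood(image, r, c):
    rows, cols = len(image), len(image[0])
    return [image[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, rows))
            for cc in range(max(c - 1, 0), min(c + 2, cols))]


def erode(image):
    """Each output pixel is the minimum over its 3x3 neighbourhood."""
    return [[min(_neighbourhood(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


def dilate(image):
    """Each output pixel is the maximum over its 3x3 neighbourhood."""
    return [[max(_neighbourhood(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


def border(image):
    """Border = A - (A eroded by B): the foreground pixels that erosion removes."""
    eroded = erode(image)
    return [[a - e for a, e in zip(row_a, row_e)]
            for row_a, row_e in zip(image, eroded)]


if __name__ == "__main__":
    square = [[0, 0, 0, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 0, 0]]
    for row in border(square):
        print(row)
```

For the 3×3 block of ones, erosion leaves only the center pixel, dilation grows the block to fill the whole 5×5 image, and the border is the one-pixel ring of eight pixels around the (now zero) center.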

The last procedure of lung region extraction is flood fill, also known as seed fill or boundary fill. This algorithm determines the region of a multi-dimensional array that is connected to a given node. It is used to fill connected, similarly colored areas with a different color, and in some games to determine which pieces are cleared. Finally, the lung region is extracted by combining all of the morphological methods above.
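The paper does not say exactly how flood fill is applied to the lung mask. One common use, assumed here, is hole filling: flood the background from the image edge and treat every background pixel the flood cannot reach as part of the object. The sketch uses a breadth-first, 4-connected flood fill; the names `flood_fill` and `fill_holes` and the marker value 2 are hypothetical choices for illustration.

```python
from collections import deque


def flood_fill(image, seed, new_value):
    """Return a copy in which the 4-connected region containing `seed` gets new_value."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    r0, c0 = seed
    old_value = out[r0][c0]
    if old_value == new_value:
        return out
    out[r0][c0] = new_value
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < rows and 0 <= cc < cols and out[rr][cc] == old_value:
                out[rr][cc] = new_value
                queue.append((rr, cc))
    return out


def fill_holes(binary):
    """Set to 1 every background pixel that cannot be reached from the image edge."""
    rows, cols = len(binary), len(binary[0])
    marked = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            on_edge = r in (0, rows - 1) or c in (0, cols - 1)
            if on_edge and marked[r][c] == 0:
                marked = flood_fill(marked, (r, c), 2)  # 2 marks outside background
    return [[0 if value == 2 else 1 for value in row] for row in marked]


if __name__ == "__main__":
    ring = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    for row in fill_holes(ring):
        print(row)
```

Applied to a one-pixel ring, the enclosed center pixel is never reached from the edge, so it is filled and the result is a solid 3×3 block.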

Lung Regions Segmentation

The next step is lung region segmentation, which seeks candidate nodules after the lung region has been extracted. This creates a region of interest (ROI) that aids in detecting the cancerous region. Segmentation is the division of an image into parts; the term refers to the process of identifying the objects and boundaries in an image (Fig. 1).

Fig. 1

Lung region extraction method

The raw CT scan image is segmented to retrieve the lung area. Image segmentation is a crucial task in medical imaging, because incorrect segmentation can significantly reduce the accuracy of algorithms for detecting lung nodules and diagnosing lung diseases. Many segmentation algorithms are available: some are semi-automated and require user participation, while others are fully automated and require only human verification. In the present study, global thresholding and a guided 3D watershed transform were used for lung lobe segmentation, which produced promising results.
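The paper names global thresholding but not how the threshold value is chosen. A common choice, and the method MATLAB's graythresh implements, is Otsu's method, which picks the single threshold that best separates the histogram into two classes; its use here is an assumption. The sketch below covers only the thresholding step (not the 3D watershed), and the function names are hypothetical.

```python
# Global threshold chosen by Otsu's method on an 8-bit histogram.


def otsu_threshold(pixels, levels=256):
    """Return t maximising the (unnormalised) between-class variance of {p <= t} and {p > t}."""
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1
    total = len(pixels)
    weighted_total = sum(i * n for i, n in enumerate(histogram))
    weight_low = 0      # number of pixels with p <= t
    weighted_low = 0    # sum of intensities of those pixels
    best_t, best_variance = 0, -1.0
    for t in range(levels):
        weight_low += histogram[t]
        weighted_low += t * histogram[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue
        if weight_high == 0:
            break
        mean_low = weighted_low / weight_low
        mean_high = (weighted_total - weighted_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


def threshold(image, t):
    """Global thresholding: 1 where the pixel is above t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    image = [[20, 25, 200], [22, 210, 205]]
    t = otsu_threshold([p for row in image for p in row])
    print(t, threshold(image, t))  # 25 [[0, 0, 1], [0, 1, 1]]
```

In chest CT the air-filled lung fields are dark, so the lung mask would correspond to the class at or below the threshold (the 0 pixels here).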

Feature Extraction

Feature extraction is a critical image processing step that locates the parts of an image that are of interest. Classification is based on the extracted features, which are used to create diagnostic rules for cancer nodule detection. The features used in this investigation are the number of affected circles, the tumor area size, and the total elapsed time in seconds. After segmentation, circles with varying boundary colors are obtained, each representing a cancer-affected area, so the number of circles indicates the number of regions affected by lung cancer. The tumor area size is the total area covered by the affected circles, that is, the total area of the affected nodules. The last feature, total time, is the time taken to complete the entire process.
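The paper does not describe how the circles are detected or how pixel counts are converted to an area. The sketch below assumes that each 8-connected foreground region of the segmented binary mask counts as one affected "circle", that the tumor area is the total foreground pixel count multiplied by a hypothetical calibration factor `mm2_per_pixel`, and that time is measured with Python's `time.perf_counter` around the feature step only, not the whole pipeline. The function name is also hypothetical.

```python
import time
from collections import deque


def extract_features(binary, mm2_per_pixel=1.0):
    """Count 8-connected foreground regions, their total area, and elapsed seconds.

    mm2_per_pixel is a hypothetical calibration factor (pixel spacing squared).
    """
    start = time.perf_counter()
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    region_areas = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                # Breadth-first search over one connected region.
                seen[r][c] = True
                queue = deque([(r, c)])
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                region_areas.append(size)
    return {
        "circles": len(region_areas),
        "tumor_area": sum(region_areas) * mm2_per_pixel,
        "elapsed_s": time.perf_counter() - start,
    }


if __name__ == "__main__":
    mask = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 1],
            [0, 0, 0, 1]]
    features = extract_features(mask)
    print(features["circles"], features["tumor_area"])  # 2 6.0
```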

Formulation of Detection

After the necessary features are extracted, the following detection rules are applied to the segmented lung region.

Rule 1

If the number of circles is less than or equal to 10 (i.e., number of circles ≤ 10) or the total affected area is between 500 and 1000 sq. mm, a small tumor is detected in the lung region. This means the cancer is at its initial stage, known as stage 1.

Rule 2

If the number of circles is greater than 10 and less than or equal to 15, or the total area affected by cancer is between 1001 and 5000 sq. mm, stage 2 lung cancer is detected.

Rule 3

If the number of circles is greater than 15 and less than or equal to 20, or the total area affected by cancer is between 5001 and 8000 sq. mm, stage 3 lung cancer is detected.

Rule 4

If the number of circles is greater than 20 or the total area affected by cancer is between 8001 and 10,000 sq. mm, stage 4 lung cancer is detected. At this stage, the cancer has spread from the chest to other parts of the body.
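Because each rule joins its two conditions with "or", a sample can satisfy more than one rule (for example, 3 circles with an area of 1175 sq. mm satisfies both Rule 1 and Rule 2), so the order of evaluation matters. The sketch below assumes that the highest matching stage wins and that zero affected circles means no tumor, as reported for the non-cancerous sample; the function name and the return value 0 for "no tumor" are hypothetical. Under these assumptions it returns stages 1, 2, and 4 for the feature values reported for samples 2, 3, and 5.

```python
def stage_from_features(circles, area):
    """Map (number of affected circles, total affected area in sq. mm) to a stage.

    Returns 0 when no affected circles are found (assumed to mean no tumour).
    Rules are checked from stage 4 down, so the highest matching stage wins.
    """
    if circles == 0:
        return 0
    if circles > 20 or 8001 <= area <= 10000:        # Rule 4
        return 4
    if 15 < circles <= 20 or 5001 <= area <= 8000:   # Rule 3
        return 3
    if 10 < circles <= 15 or 1001 <= area <= 5000:   # Rule 2
        return 2
    return 1  # only 1 <= circles <= 10 remains, which satisfies Rule 1


if __name__ == "__main__":
    for circles, area in [(8, 700), (1, 1175), (26, 9005)]:
        print(circles, area, "-> stage", stage_from_features(circles, area))
```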

Testing and Evaluation

This step evaluates the performance of the proposed lung cancer identification methodology; the detailed results are presented in the following section.

Result Analysis

For the experimental work, the publicly available LIDC dataset was selected. This dataset contains images in several categories, such as non-cancerous lungs and first-, second-, third-, and fourth-stage cancer. As shown in Table 2, the samples selected for the experimental work include each of these categories. Maintaining a dynamic, real-time database would help in studying changes in lung cancer cases over time.

Table 2 Sample details of lung cancer

Figure 2 shows the processing results for sample 1, a CT scan of a non-cancerous lung. The results of this analysis are given in Table 3: no affected circles and no tumors are detected, which confirms that lung cancer is not present. This result was verified on five additional non-cancerous images, each of which the proposed algorithm identified correctly.

Fig. 2

Results of sample no. 1. a Selected image, b noise removed from selected image, c binary image of the original image, d image after opening, e inverted image, f object removal, g border detection, h detected part

Table 3 Parameters of sample 1

Figure 3 shows the processing results for sample 2, a CT image of first-stage lung cancer. As Table 4 shows, there are 8 affected circles and the tumor area size is 700. Hence, early-stage (stage 1) cancer is detected in the lung.

Fig. 3

Results of sample no. 2. a Selected image, b noise removed from selected image, c binary image of the original image, d image after opening, e inverted image, f object removal, g border detection, h detected part

Table 4 Parameters of sample 2

Figure 4 shows the processing results for sample 3, a CT image of second-stage lung cancer. As Table 5 shows, there is 1 affected circle and the tumor area size is 1175. Hence, stage 2 cancer is detected in the lung.

Fig. 4

Results of sample no. 3. a Selected image, b noise removed from selected image, c binary image of the original image, d image after opening, e inverted image, f object removal, g border detection, h detected part

Table 5 Parameters of sample 3

Figure 5 shows the processing results for sample 4, a CT image of third-stage lung cancer. As Table 6 shows, there is 1 affected circle and the tumor area size is 3971. Hence, stage 3 cancer is detected in the lung.

Fig. 5

Results of sample no. 4. a Selected image, b noise removed from selected image, c binary image of the original image, d image after opening, e inverted image, f object removal, g border detection, h detected part

Table 6 Parameters of sample 4

Figure 6 shows the processing results for sample 5, a CT image of fourth-stage lung cancer. As Table 7 shows, there are 26 affected circles and the tumor area size is 9005. Hence, stage 4 cancer is detected, meaning the cancer has spread from the chest to other parts of the body.

Fig. 6

Results of sample no. 5. a Selected image, b noise removed from selected image, c binary image of the original image, d image after opening, e inverted image, f object removal, g border detection, h detected part

Table 7 Parameters of sample 5

Figure 7 shows the implementation results in the GUI designed in MATLAB. It displays the implementation steps, such as denoising, morphological operations, and detection, and shows that stage 3 lung cancer is detected for the selected input image.

Fig. 7

MATLAB GUI with output

Table 8 shows that the maximum lung cancer detection accuracy obtained by previous researchers is 98%, whereas the proposed methodology gives 100% accuracy.
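For reference, the standard definition of accuracy in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) is given below. The paper does not state which formula underlies Table 8, so this definition is an assumption. With FP = FN = 0, as reported in the conclusion, the ratio equals 1, i.e., 100%.

$${\text{Accuracy}} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$FP = FN = 0 \;\Rightarrow\; {\text{Accuracy}} = \frac{TP + TN}{TP + TN} = 1$$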

Table 8 Comparative result analysis of lung cancer detection

Conclusion

The proposed computer-aided diagnosis technique is unique in the novelty of its algorithms and the accuracy of its results. Based on the results obtained, the proposed methodology is 100% accurate, with no false positive or false negative observations recorded. The findings of this study should help doctors improve the early detection of lung tumors. This computer-aided diagnosis is rapid, hassle-free, and comfortable for patients, and it causes them no additional harm. Although deep learning algorithms are currently popular for lung cancer detection, their accuracy is below 100%. With the proposed methodology, however, 100% accuracy in lung cancer detection was obtained owing to the use of a hard-coded algorithm.