1 Introduction

Cancerous cells can form tumors, damage the immune system, and cause other breakdowns that prevent the body from functioning correctly. Lung cancer is a malignant tumor of the lungs characterized by uncontrolled proliferation of tissue cells. It is the world's most common cancer, with 2,093,876 new cases reported in 2018. Research found that, from 2005 to 2015, lung cancer rates fell by 2.5% per year in men and by 1.2% per year in women. Symptoms include a chronic cough, blood-streaked sputum, chest pain, changes in voice, increasing shortness of breath, and recurrent pneumonia or bronchitis, and they normally do not appear until the malignancy has progressed [1]. Cigarette smoking, which is common throughout the world, remains one of the most significant risk factors for lung cancer.

In the United States, cigarette smoking is still responsible for 80% of lung cancer deaths. Exposure to radon gas, which is emitted from soil and construction materials, is also thought to cause lung cancer [2]. To obtain better image recognition, and therefore more accurate predictions, we employ the InceptionV3 convolutional neural network architecture. Our major goal is to use image attributes and information to diagnose the presence of lung cancer cells; these attributes are the main things to consider when checking for lung cancer. This research examines the feasibility of using an artificial neural network model to identify lung cancer in an individual's body. Its purpose is to identify significant factors that can cause lung cancer, to build a neural network model that can be used to detect lung cancer, and to determine the category of each case: malignant, benign, or normal.

The main aim of cancer screening is to reduce the number of deaths caused by lung cancer, or to eliminate them completely. Screening is meant to deliver benefits in clinical practice rather than only in a research trial setting. False positives, cost, incidental findings, radiation exposure, and overdiagnosis are all issues that need to be addressed. Although CT imaging is an important imaging tool in the medical domain, it is difficult for clinicians to interpret CT scan images and identify cancer from them, and visual interpretation of these images is an error-prone task that can delay lung cancer detection. As a result, clinicians may find computer-assisted diagnostics useful in properly identifying malignant cells.

2 Related Works

Many researchers have tried to develop neural networks aimed at increasing the accuracy of lung cancer diagnosis.

Bhatia et al. [3] preprocessed the input images to isolate the regions of the lungs that are susceptible to cancer. Features were extracted using UNet and ResNet models and then fed to a combination of XGBoost and random forest classifiers to predict cancer. On the LIDC-IDRI database, the accuracy obtained was 84%. Makaju et al. [4] used watershed segmentation to detect nodules and an SVM to classify the nodules as malignant or benign. They improved the image quality by calculating a weighted mean function and used the Lung Image Database Consortium dataset, obtaining an accuracy of 92%. Faisal et al. [5] worked with a number of classifiers, such as MLP, neural network, decision tree, Naive Bayes, and SVM, on a dataset obtained from the UCI repository. Of all the classifiers examined, they found that the gradient-boosted tree performed best, with an accuracy of 90%. Alakwaa et al. [6] used a CT scan dataset from DSB and implemented a CNN. They predicted sensitive areas using the UNet architecture, and their model gave an accuracy of 86.6%.

Abdillah et al. [7, 8] used CT scan images from the VIA and ELCAP databases. They applied region growing, marker-controlled watershed, and marker-controlled watershed with masking; among these, watershed with masking had the highest accuracy and robustness. Alam et al. [9] preprocessed images using image enhancement, segmentation, image scaling, color space transformation, and contrast enhancement. They used a multi-classifier trained on a lung cancer dataset taken from the UCI machine learning database. The precision was 97% for cancer identification and 87% for cancer prediction.

3 Problem Statement

The problem is to develop a neural network-based detection model that can identify lung cancer tumors in an individual's CT scan images and categorize them as benign, malignant, or none, while being fully automated and requiring minimal human intervention from start to finish.

4 Existing System

Currently, the following testing methodologies are deployed at scale to detect lung cancer:

  1. Computed tomography (CT) scan: Unlike a conventional x-ray, a CT scan takes several pictures of the body, and a computer combines them into an image of the body part being studied. It is used to identify nodes or lumps present in the body.

  2. Magnetic resonance imaging (MRI) scan: MRI scans produce images much like CT scans, but they use radio waves in place of x-rays. These scans are used to analyze lung cancer metastasis to the brain or spinal cord. They were used on a large scale at one point in time [10].

  3. Sputum cytology: The mucus coughed up from the lungs is tested in the lab to find out whether it contains cancerous cells. The usual approach is to collect samples three times a day for three consecutive days. Squamous cell lung tumors are more likely to be found this way because they start in the primary airways of the lung.

  4. Needle biopsy: A needle is used to collect a sample from a mass. The disadvantage is that only a small amount of tissue is collected, which in some situations may not be sufficient for the diagnosis and for the additional testing on cancer cells that helps doctors choose anticancer medications.

The challenges faced by the traditional methods are:

  1. Long turnaround times: The standard methods take around 1–2 weeks to return a result, whereas the results of the model can be obtained on the same day, or within 1–2 days in a few situations.

  2. False-negative results: Because there are no exact criteria for negative screening findings in low-dose CT (LDCT), sensitivity is usually ascertained by classifying new instances of lung cancer that appear within a year of a screening study as false negatives. In the six trials that evaluated this characteristic, the sensitivity of LDCT for the identification of lung cancer ranged from 80 to 100%.

  3. Radiation exposure: With the conventional methods, the person is exposed to radiation many times, and these methods have led to a large increase in radiation exposure. Chest CT accounts for the major share of radiation exposure in lung screening. Patients receive an average of eight mSv over a period of almost 2–3 years, which covers both detection and diagnostic/treatment assessment.

5 Proposed Method

Early recognition of lung cancer has become critical, and image processing and deep learning techniques have made it possible. Lung cancer diagnosis has become much easier with deep learning. In this investigation, computed tomography (CT) scan images of lung patients were used to locate and classify lung nodules and to indicate their malignancy stage. We use a deep learning CNN algorithm to detect lung cancer from CT scan images, and we train the CNN on a CT scan image dataset. The broader motivation of this research is to see how effective classification algorithms are at detecting lung cancer early.

Deep neural networks benefit from a weight-sharing mechanism that improves the algorithm's performance. We therefore set out to create a reliable lung cancer detection algorithm based on CT scan images. Our model uses a CNN for feature extraction, followed by the InceptionV3 deep learning architecture, to classify a CT scan image as belonging to an infected or a healthy person. Deep learning is capable of learning a partial deep model on a partition of the overall data [11]. Data flow graphs make a model easier to understand. They are provided by TensorFlow, a popular library and capable toolset that allows programmers to construct large neural networks with many layers.

We use the Keras preprocessing functions in TensorFlow to preprocess the images, scale them to a given size, and perform data augmentation. Different information collectors use their own schemata for recording data, and the characteristics of different applications result in various data representations [12]. The softmax function is used as the activation function in the output layer of neural network models that predict a multinomial probability distribution. We adopt softmax because the task is a multi-class classification that requires class membership over more than two outputs; the three classes are depicted in Fig. 1.
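As a minimal sketch of the softmax output layer described above, the function below maps raw scores z to probabilities exp(z_i) / sum_j exp(z_j). The class order and the example scores are assumptions made for illustration; they are not taken from the trained model.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability
    # and does not change the result.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class order and scores, for illustration only.
CLASS_NAMES = ["benign", "malignant", "normal"]
logits = [0.5, 2.0, -1.0]
probs = softmax(logits)
predicted = CLASS_NAMES[probs.index(max(probs))]
```

The predicted class is simply the one with the highest probability, which is how a three-way output can be turned into a single label.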

Fig. 1
Three illustrations indicate the following. Normal duct labeled lumen. Benign tumor and malignant tumor labeled basal lamina. The discharge from the malignant tumor is also illustrated.

Normal versus Benign versus Malignant Tumor

The objectives of the proposed system are:

  1. Examine the behavior and efficiency of diverse vision models, from Inception to Neural Architecture Search (NAS) networks, and then fine-tune them appropriately.

  2. Determine the model's classification ability by measuring the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and visually analyze the behavior of these models by producing class activation maps (CAMs), or heatmaps, for each individual network.

  3. Correctly diagnose lung cancer cases with an accuracy ≥ 90%.

  4. Implement a fully automated lung cancer diagnosis system.
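The AUC measurement mentioned in the list above can be sketched as follows. This is an illustrative pure-Python version based on the rank-sum (Mann-Whitney) formulation, not necessarily the routine used in this work. For a three-class problem it would be applied one class at a time (one-vs-rest), which is an assumption here, and the labels and scores below are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for a binary problem.

    labels: 1 for the positive class, 0 otherwise.
    scores: predicted probability of the positive class.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Fraction of positive/negative pairs ranked correctly; ties count half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up one-vs-rest example: "malignant" against the other two classes.
is_malignant = [1, 1, 0, 0, 1, 0]
p_malignant = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]
auc = roc_auc(is_malignant, p_malignant)
```

An AUC of 1.0 means every positive case is scored above every negative case, while 0.5 corresponds to random ranking.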

The step-by-step procedure to build the model is as follows:

  1. Data collection: We downloaded the dataset of CT scan images from Kaggle.

  2. Data preprocessing: Resizing, flipping, zooming, and rotating are the preprocessing techniques used.

  3. Building the model: InceptionV3 is the model we use to detect lung cancer. In this model, every layer's output is used as input to the next layers. We used three regularization techniques, namely L1, L2, and dropout, to build the best-fit model.

  4. Evaluating the model: We fine-tuned the model by changing multiple hyperparameters, such as the number of neurons, activation function, optimizer, learning rate, batch size, and number of epochs, to get better accuracy and loss curves. The model workflow is represented in Fig. 2.

Fig. 2
A flow chart begins with dataset loading, followed by splitting data, data preprocessing, model building, training the model, and evaluating if it is good or poor. If good, go for model application. If poor, fine-tune.

Model workflow
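To make the preprocessing step above concrete, the sketch below implements three of the listed operations (flipping, resizing, and rotating) on a tiny image stored as a list of rows. It is an illustration only: the actual pipeline relies on the Keras preprocessing utilities, zooming is omitted, rotation is restricted to 90 degrees for simplicity, and all function names are hypothetical.

```python
def horizontal_flip(image):
    """Mirror each row of a 2-D image given as a list of rows."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def resize_nearest(image, new_height, new_width):
    """Resize with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // new_height][c * width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]

# A 2x2 "scan" with made-up pixel values.
scan = [[1, 2],
        [3, 4]]
augmented = [horizontal_flip(scan), rotate_90(scan)]
resized = resize_nearest(scan, 4, 4)
```

Each augmented copy keeps the label of the original scan, so augmentation increases the variety of the training data without requiring new annotations.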

6 Experimental Results

6.1 Dataset Description

The dataset contains a total of 1093 CT scans, of which 120 are benign cases, 557 are malignant cases, and 416 are normal cases. The model is trained using 764 images as training data, 164 images as validation data, and 165 images as testing data. The proportions of the three classes are kept uniform across all three subsets. All images were gathered from open sources and are duly credited. CT scans of selected benign and malignant cases and their counterparts with probable illness are shown in Fig. 3.

Fig. 3
Two CT scan images of the chest in the coronal views present benign and malignant cases. The counterparts with probable illnesses are displayed.

CT scans of benign and malignant cases
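The split described in the dataset description, with class proportions kept uniform across subsets, can be sketched as a stratified split. The 70/15/15 percentages, the rounding rule, and the seed are assumptions, so the resulting subset sizes may differ by an image or so from the reported 764/164/165.

```python
import random

def stratified_split(labels, train_pct=70, val_pct=15, seed=0):
    """Return (train, val, test) index lists with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Integer arithmetic keeps the subset sizes deterministic.
        n_train = len(indices) * train_pct // 100
        n_val = len(indices) * val_pct // 100
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Class counts taken from the dataset description.
labels = ["benign"] * 120 + ["malignant"] * 557 + ["normal"] * 416
train, val, test = stratified_split(labels)
```

Splitting each class separately ensures that the small benign class is represented in every subset instead of being left out by chance.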

A few parameters and hyperparameters need to be considered to build the best-fit model. In the process, we experimented with different optimizers, learning rates (0.1, 0.01, 0.2), and activation functions (ReLU, sigmoid, softmax, swish). The model was trained with different optimizers to find the one most suitable for our model. The accuracies obtained with the different optimizers are shown in Fig. 4.

Fig. 4
  Optimizer          Training accuracy    Validation accuracy
  Adam               86%                  74.63%
  RMSprop            77%                  71%
  SGD (0.001 LR)     97%                  95.21%

Optimizer function variations

The final model was therefore built using an SGD optimizer with a learning rate of 0.001 and the ReLU and softmax activation functions. The model was run for 50 epochs over the training and validation datasets, resulting in the accuracy and loss curves shown in Fig. 5.

Fig. 5
Two graphs of loss and accuracy. a. The training loss and validation loss trends descend from (0, 1200) and (0, 900) to (50, 0). b. The training accuracy and validation accuracy trends ascend with fluctuations.

Loss and accuracy curves
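For readers unfamiliar with the optimizer, the update rule behind SGD with a learning rate of 0.001 can be sketched as below. This is plain SGD without momentum applied to a toy quadratic objective. Both choices are assumptions made for illustration, and the real training loop operates on the network's loss rather than on this toy function.

```python
def sgd_step(weights, gradients, learning_rate=0.001):
    """One plain SGD update: w <- w - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# Toy objective f(w) = sum(w_i ** 2), whose gradient is 2 * w.
weights = [1.0, -2.0]
for _ in range(50):
    gradients = [2.0 * w for w in weights]
    weights = sgd_step(weights, gradients)
```

With such a small learning rate each step changes the weights only slightly, which is consistent with loss curves that descend gradually over the 50 epochs.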

Figure 6 shows the classification result. The overall accuracy is 95.75%, with 97% for the training set and 95.12% for the testing set. We have built a web page using Flask, a lightweight Python web framework, so that anyone can use the model without difficulty to predict cancer easily and quickly.

Fig. 6
A table has 6 rows and 4 columns. The column headers are precision, recall, F1 score, and support. The row headers are benign, malignant, normal, accuracy, macro average, and weighted average. The total accuracy is 0.9575757.

Classification report
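The quantities in the classification report (precision, recall, F1 score, and support per class, plus overall accuracy) can be computed as in the sketch below. The labels and predictions are made up and do not reproduce the values in Fig. 6.

```python
def classification_report(y_true, y_pred, class_names):
    """Per-class precision, recall, F1 and support, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for name in class_names:
        tp = sum(1 for t, p in pairs if t == name and p == name)
        fp = sum(1 for t, p in pairs if t != name and p == name)
        fn = sum(1 for t, p in pairs if t == name and p != name)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall,
                        "f1": f1, "support": tp + fn}
    report["accuracy"] = sum(1 for t, p in pairs if t == p) / len(pairs)
    return report

# Made-up labels and predictions, for illustration only.
y_true = ["benign", "benign", "malignant", "malignant",
          "malignant", "normal", "normal", "normal"]
y_pred = ["benign", "normal", "malignant", "malignant",
          "malignant", "normal", "normal", "benign"]
report = classification_report(y_true, y_pred,
                               ["benign", "malignant", "normal"])
```

As a consistency check, the reported total accuracy of 0.9575757 matches 158 correct predictions out of the 165 test images.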

It works by uploading an individual's CT scan image to predict whether that person has a tumor. The three possible results are benign (non-cancerous tumor), malignant (cancerous tumor), and normal (no tumor). The output is obtained in seconds, which helps the doctor or individual using it. The page is user-friendly and easy to understand.
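A minimal sketch of such an upload endpoint in Flask is given below. The route name, the form field name `image`, and the `predict_scan` helper are hypothetical; the helper is a placeholder standing in for the trained network and always returns the same label.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
CLASS_NAMES = ["benign", "malignant", "normal"]

def predict_scan(image_bytes):
    """Hypothetical stand-in for the trained model.

    A real implementation would decode and resize the image, run the
    network, and return the class with the highest softmax probability.
    """
    return "normal"  # placeholder result, not a real prediction

@app.route("/predict", methods=["POST"])
def predict():
    uploaded = request.files.get("image")
    if uploaded is None:
        return jsonify({"error": "no image uploaded"}), 400
    label = predict_scan(uploaded.read())
    return jsonify({"label": label})
```

A client posts the scan as multipart form data and receives the predicted category as JSON. Keeping the model behind a single function makes it straightforward to swap the placeholder for the real classifier.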

7 Conclusion and Future Work

We used the InceptionV3 architecture for accurate image recognition and obtained an accuracy of 95%. We also deployed the model so that anyone can access it to easily predict lung cancer. In conclusion, our work is in a stable, working state with high accuracy, and only minor improvements remain. Possible improvements include increasing the accuracy of the model by using ensembling algorithms, creating an application for lung cancer detection, and showing the size of the tumor and the degree (stage) of the cancer.