1 Introduction

Lung cancer has been one of the leading causes of death worldwide, with an estimated 1.8 million deaths and 2.1 million new cases in 2018. According to the GLOBOCAN 2018 survey, lung cancer remains the most common cancer (11.6% of all cancer cases) and accounts for the largest share of cancer deaths (18.4% of all cancer deaths) [1]. Lung cancer is more common in men than in women, with a ratio of 4.5:1 [2]. It is the leading cause of cancer death among men in 93 countries and among women in 23 countries. In 2018, lung cancer caused 1,184,947 deaths in men and 576,060 deaths in women [1]. The global burden of lung cancer is likely to increase in the first half of this century because of rising incidence and mortality and little improvement in survival worldwide [3]. Most lung cancer patients experience substantial delays between the onset of symptoms and the first treatment [4]. The majority of patients are diagnosed at a late stage, and most deaths occur within the first year after diagnosis [5].

Lung cancer is the uncontrolled growth of cells in the lung parenchyma; if not treated at an early stage, it can spread to distant sites in the body and cause metastatic complications. Cancer that starts in the lung is termed primary lung cancer. Histopathologically, lung cancers are classified as either small-cell lung carcinoma (SCLC) or non-small-cell lung carcinoma (NSCLC), which constitute 15% and 85% of all lung cancers, respectively [6]. NSCLC is further subdivided into three types: adenocarcinoma, squamous-cell carcinoma and large-cell carcinoma [7]. Adenocarcinoma is the most common lung carcinoma, accounting for 40% of all lung cancers. Squamous-cell carcinoma accounts for 25–30% of all lung cancers [6, 7], whereas large-cell carcinoma comprises 5–10% [8]. Categorizing the type of malignancy by histopathology and immunohistochemistry is needed for prognosis and for deciding the line of treatment, which differs by type.

Cigarette smoking is one of the most important risk factors for lung cancer [7, 8]. Cigarette smoke contains tar made up of about 3500 substances, including carcinogens [9], which are converted by enzymes into DNA adducts. DNA adducts can cause mutations in tumor suppressor genes [10]. Although smoking is a prominent risk factor, 10–25% of all lung cancers occur in never smokers [11]. Other risk factors include air pollution, arsenic, radon, asbestos and a medical history of chronic bronchitis, pneumonia, tuberculosis, emphysema or asthma [12].

Non-small-cell lung cancer (NSCLC) has four main stages, each with substages: (a) Stage I (IA1, IA2, IA3, IB), (b) Stage II (IIA, IIB), (c) Stage III (IIIA, IIIB, IIIC) and (d) Stage IV (IVA, IVB). Staging is based on the location and size of the primary tumor (T), the status of the regional lymph nodes (N) and the presence of distant metastasis (M) [13, 14]. Combining the T, N and M values assigns an overall stage of 0, I, II, III or IV. Each stage is described in Table 1.

Table 1 Lung cancer staging
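To illustrate how T, N and M values are combined into an overall stage, the sketch below encodes the stage groups of the 8th edition of the TNM classification for NSCLC as we understand them. It is not part of the surveyed work: the descriptor strings (for example `"T1a"`, `"N2"`, `"M1c"`) are our own encoding, and the groupings should be checked against the official AJCC/UICC tables before any real use.

```python
"""Illustrative TNM -> overall stage grouping for NSCLC (assumed 8th edition)."""

T1 = {"T1mi", "T1a", "T1b", "T1c"}
T2 = {"T2a", "T2b"}

# Node-negative (N0, M0) stages depend on the exact T descriptor.
N0_STAGE = {
    "Tis": "0",
    "T1mi": "IA1", "T1a": "IA1", "T1b": "IA2", "T1c": "IA3",
    "T2a": "IB", "T2b": "IIA",
    "T3": "IIB", "T4": "IIIA",
}

# Node-positive (M0) stages: (N category, T group) -> stage.
NODE_POSITIVE_STAGE = {
    ("N1", "T1-T2"): "IIB", ("N1", "T3-T4"): "IIIA",
    ("N2", "T1-T2"): "IIIA", ("N2", "T3-T4"): "IIIB",
    ("N3", "T1-T2"): "IIIB", ("N3", "T3-T4"): "IIIC",
}


def overall_stage(t: str, n: str, m: str) -> str:
    """Combine T, N and M descriptors into an overall stage string."""
    # Distant metastasis determines the stage regardless of T and N.
    if m in ("M1a", "M1b"):
        return "IVA"
    if m == "M1c":
        return "IVB"
    if m != "M0":
        raise ValueError(f"unknown M descriptor: {m!r}")
    if n == "N0":
        if t not in N0_STAGE:
            raise ValueError(f"unknown T descriptor: {t!r}")
        return N0_STAGE[t]
    # With positive nodes, only the coarse T group matters.
    if t in T1 or t in T2:
        t_group = "T1-T2"
    elif t in ("T3", "T4"):
        t_group = "T3-T4"
    else:
        raise ValueError(f"combination not covered: {t} {n} {m}")
    try:
        return NODE_POSITIVE_STAGE[(n, t_group)]
    except KeyError:
        raise ValueError(f"unknown N descriptor: {n!r}") from None


print(overall_stage("T1b", "N0", "M0"))  # IA2
print(overall_stage("T3", "N1", "M0"))   # IIIA
print(overall_stage("T2a", "N0", "M1c"))  # IVB
```

Metastasis is checked first because any M1 descriptor fixes the stage at IV, whatever the T and N values.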

Staging, which is based on CT and PET-CT scans, is important for prognosis and for deciding the line of management. This motivates the use of deep learning strategies to detect the stage of cancer automatically and accurately.

1.1 Need for Diagnostic Tool

Unfortunately, the symptoms of lung cancer appear late in the course of the disease, when it is often no longer curable. Two further problems delay diagnosis: (1) failure to recognize the presenting symptoms of lung cancer and (2) the additional time taken to complete diagnostic investigations [4]. Because the symptoms of lung cancer are shared with many less serious lung diseases, the diagnosis is often missed in the early stages. A study in Southern Norway found that more than two-thirds of patients were already at an advanced stage (IIIB or IV) at the time of diagnosis [15]. Delays in diagnostic imaging allow the tumor to grow and advance in stage, which worsens the prognosis [16].

Several imaging modalities can be used to diagnose and stage lung cancer, such as chest X-ray, computed tomography (CT), positron emission tomography (PET), magnetic resonance imaging (MRI) and contrast-enhanced CT (CE-CT). Chest X-ray has been used for screening but has contributed little to early tumor detection, and a false-negative chest X-ray causes substantial diagnostic delay [17]. MRI can identify hepatic metastases larger than 6 mm in diameter but may miss small pulmonary metastases [18]. In recent years, low-dose CT (LDCT) has been used to screen people at high risk of developing lung cancer. Screening is testing for disease in people who have not yet developed symptoms; LDCT scans can reveal abnormal areas in the lungs that may be cancer. Research has shown that LDCT screening of high-risk people saves more lives than chest X-ray screening: in 2011, the National Lung Screening Trial (NLST) showed that participants screened with LDCT had a 15–20% lower risk of dying from lung cancer than those screened with chest X-rays [19]. Integrated PET/CT combines the metabolic information of PET with the morphological information of CT for staging non-small-cell lung cancer [20, 21].

2 Deep Learning

Deep learning, which is based on deep neural networks (DNNs), is a subfield of machine learning that has been widely used in medical image processing with promising results; it is expected to account for 300 million dollars of the medical imaging market by 2021 [22]. Its two major challenges are usability and scalability [23]. A conventional neural network takes manually extracted features as input, whereas a deep neural network learns features on its own through multiple hidden layers. Deep learning has several advantages: (a) it extracts features directly from the training set, (b) the feature selection process is simple, and (c) feature extraction, feature selection and classification can be integrated in the same architecture [24]. Deep learning in chest radiology has been an active area of research in recent years.

Several deep learning networks are in use today, namely the Recurrent Neural Network (RNN) [25], Recursive Neural Network (RvNN) [26], Deep Belief Network (DBN) [27, 28], Deep Boltzmann Machine (DBM) [29, 30], Autoencoder [31, 32], Variational Autoencoder (VAE) [33], Stacked Autoencoder [34], Deep Residual Network [35, 36], Convolutional Neural Network (CNN) [37] and Generative Adversarial Network (GAN) [38]. A few categories of deep learning architecture [39] are shown in Fig. 1.

Fig. 1 Deep learning architecture

A recurrent neural network [25] connects its nodes in a directed graph with feedback loops, so that it remembers previous inputs when predicting the output. A recursive neural network [26] applies the same set of weights recursively over a structured (for example tree-shaped) input; a recurrent network can be viewed as the special case in which that structure is a linear chain. Both recurrent and recursive neural networks are used mainly for natural language processing.

A deep belief network [27, 28] stacks multiple restricted Boltzmann machines (RBMs) as hidden layers; units in adjacent layers are connected, but units within a layer are not. A deep Boltzmann machine is an unsupervised model similar to a DBN, but the connections between its hidden layers are all undirected. DBNs can be used for speech and image recognition. The authors of [40] also observed that the accuracy of the unsupervised approach used by a deep convolutional belief network increases as new unlabeled data are added.

An autoencoder consists of an encoder and a decoder network that learn a compact representation of the data by iteratively learning to reconstruct the input. A variational autoencoder is an autoencoder whose training is regularized by constraining the distribution of the encoded representation towards a prior. A sparse autoencoder penalizes hidden activations so that only a few hidden units are active for a given input. A stacked autoencoder chains several (often sparse) autoencoders so that the output of one hidden layer is the input to the next. Autoencoders are mostly used for image denoising and feature extraction.

A convolutional neural network has a sequence of convolutional layers that extract features from images, followed by fully connected layers. A generative adversarial network trains a generator and a discriminator that push each other to improve. A deep residual network is a convolutional network with shortcut connections that skip some layers. Convolutional deep belief networks are composed of convolutional restricted Boltzmann machines (CRBMs). CNNs have given the best results in image classification.

Two main issues with deep neural networks are overfitting and processing time. Overfitting occurs when the network learns patterns specific to the training set but fails to generalize beyond it. The training set therefore has to be large and well balanced to cover the patterns that can occur in practice [41].

3 Deep Learning for Lung Nodule Classification

3.1 Preliminaries

Detecting nodules at an early stage can improve the patient's chances of survival, and determining the cancer stage helps in providing appropriate treatment. Interpreting diagnostic images is challenging for the radiologist because CT volumes are large and nodules can be small. Features that affect the visibility of a nodule include its density, size, location and contour, and lesions smaller than 5 mm are difficult to detect. Even in LDCT scans, lesions may be missed because of (1) a location close to structures such as the skeleton, blood vessels, pulmonary hila, diaphragm or airways, which limits visibility, (2) low nodule density, such as ground-glass [42] or subsolid density, and (3) inter-observer variation [43]. Perifissural nodules are solid, homogeneous nodules with smooth margins that are either attached to a fissure (typical) or not attached to it (atypical); both types are non-malignant.

Computer-aided detection of lung cancer is therefore needed to help radiologists make accurate decisions. CAD output acts as a second opinion that improves the accuracy of radiological diagnosis and reduces image reading time. CAD systems are divided into two types: (1) computer-aided detection (CADe) and (2) computer-aided diagnosis (CADx) [44]. CADe is used to detect lesions, whereas CADx determines the malignancy of tumors along with their stage [24]. The four main steps in CADe are segmentation of the lung parenchyma, detection of candidate nodules, false positive reduction and classification; reducing the false positive rate is the most crucial of these [45]. The three main steps in CADx are feature extraction, feature subset selection and classification, which have to be integrated together [46].

Machine learning tools and techniques have been widely used in the diagnosis and prognosis of lung cancer. They can detect lung cancer even at a very low radiation dose of 0.11 mSv, and neural networks were found to have a higher sensitivity, 91.5–95.9%, for lung cancer detection at standard dose [47].
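Sensitivity is the fraction of actual positives that a system detects. The minimal sketch below computes it, together with specificity and false positives per scan, from binary labels; false positives per scan is included because false positive reduction is the critical CADe step. The label convention (1 = nodule or cancer, 0 = not) and the toy data are assumptions for illustration.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN and FN for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def sensitivity(c: Dict[str, int]) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c["tp"] / (c["tp"] + c["fn"])


def specificity(c: Dict[str, int]) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c["tn"] / (c["tn"] + c["fp"])


def false_positives_per_scan(c: Dict[str, int], n_scans: int) -> float:
    """Average number of false positive detections per CT scan."""
    return c["fp"] / n_scans


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
c = confusion_counts(y_true, y_pred)
print(c)               # {'tp': 3, 'fp': 1, 'tn': 5, 'fn': 1}
print(sensitivity(c))  # 0.75
```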

The two basic approaches to lung nodule classification are (1) the radiomics approach, which is based on extracting image features from lung CT scans, and (2) convolutional neural networks (CNNs). The radiomics approach builds a model from 2D or 3D image features of lung nodules and requires appropriate lung segmentation and feature extraction algorithms to classify the tumor, whereas a CNN does not necessarily require segmentation and feature extraction phases. However, CNNs require very large datasets.

The five basic steps in lung nodule classification using the radiomics approach are image acquisition, image preprocessing, image segmentation, feature extraction and lung nodule classification, as shown in Fig. 2. In image acquisition, images are obtained from a CT scanner, for example a 64-, 256- or 320-slice scanner. The preprocessing stage enhances the image by removing noise. In the segmentation stage, the tumor is segmented from the region of interest using one of various segmentation algorithms. Features such as texture, gradient and shape are then extracted from the segmented image, and a classification algorithm uses these features to classify the tumor as malignant or benign.

Fig. 2 Basic steps of classification using radiomics

The basic steps involved in classifying lung nodules with a deep convolutional neural network are shown in Fig. 3. In a CNN, the image passes through a series of convolutional layers with filters, pooling layers, fully connected layers and a softmax layer. The first convolutional layer extracts low-level features from the input image, whereas subsequent layers extract semantic features. In a convolutional layer, a kernel slides over the input and computes a dot product at each position; a bias is added and a non-linear activation function such as ReLU is applied. The output of the convolutional layer is sent to a pooling layer, which reduces the spatial dimensions while retaining the necessary information. A sequence of convolutional and pooling layers extracts high-level features from the image. The resulting feature map is flattened into a one-dimensional vector and fed to a fully connected network, which contains multiple hidden layers with weights and biases. Neural networks use non-linear activation functions because a composition of linear layers is itself linear: without a non-linearity, adding hidden layers adds no expressive power, so performance does not improve with more hidden layers. Finally, an activation function such as sigmoid or softmax converts the outputs into class probabilities between 0 and 1, which are used to classify the object.
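The forward pass just described can be traced with a minimal pure-Python sketch: one valid, stride-1 convolution with a bias, ReLU, 2 × 2 max pooling, flattening, one fully connected layer and softmax. The kernel, weights and toy image are made-up values chosen for illustration, and, as in most deep learning libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped). Real models learn these weights by backpropagation and run on optimized libraries.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d(image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]],
           bias: float = 0.0) -> Matrix:
    """Valid (no padding), stride-1 convolution computed as cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw)) + bias
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling (stride = size); leftover rows/columns are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def flatten(fmap: Matrix) -> List[float]:
    return [v for row in fmap for v in row]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
kernel = [[1, 0], [0, -1]]               # hypothetical 2x2 filter
fmap = relu(conv2d(image, kernel))       # 4x4 feature map
pooled = max_pool(fmap)                  # 2x2 after pooling
logits = dense(flatten(pooled), [[1, 1, 1, 1], [-1, -1, -1, -1]], [0.0, 0.0])
probs = softmax(logits)                  # two class probabilities summing to 1
print(pooled)  # [[0.0, 1.0], [1.0, 1.0]]
print(probs)
```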

Fig. 3 Basic steps of classification using CNN

3.2 Data Acquisition

Lung CT scans can be obtained from various sources, namely LIDC/IDRI [48], NLST [19], LUNA16 (Lung Nodule Analysis 2016) [49], VIA/I-ELCAP [50], the SPIE-AAPM Lung CT Challenge [51], the Kaggle Data Science Bowl dataset [52], LISS (Lung CT Imaging Signs) [53], RIDER (Reference Image Database to Evaluate Therapy Response) [54] and Lung CT Diagnosis [55]. LIDC/IDRI contains 1018 cases with slice thickness from 0.45 mm to 5.0 mm, each with an associated XML file recording two-phase annotations by 4 experienced thoracic radiologists for nodules ≥ 3 mm. The malignancy of each nodule is rated on a scale from 1 (low) to 5 (high), and the radiologists also rated characteristics such as internal structure, texture, margin, sphericity, subtlety, calcification, lobulation and spiculation. Some researchers consider nodules with an average malignancy rating below 3 as non-malignant [56]. NLST has collected 75,000 low-dose CT screening images that can be used for research; most consist of a baseline scan (T0) followed by two successive scans (T1 and T2). The LUNA16 dataset was derived from the LIDC/IDRI database and contains 888 CT scans. The VIA/I-ELCAP database consists of 50 low-dose CT scans acquired in a single breath hold with 1.25 mm slice thickness, obtained from the ELCAP and VIA research groups. SPIE-AAPM-NCI consists of 70 lung CT scans, most of which are contrast enhanced. The Kaggle Data Science Bowl (DSB) dataset consists of 1397 low-dose CT scans in DICOM format, each a series of axial 2D slices of size 512 × 512. DSB datasets mostly contain larger nodules (average diameter 13.68 mm), mostly located at the main bronchus. The LISS database comprises 271 lung CT scans divided into 9 classes: Ground Glass Opacity (GGO), Cavity and Vacuolus (CV), Bronchial Mucus Plugs (BMP), Spiculation (S), Lobulation (L), Obstructive Pneumonia (OP), Calcification (C), Pleural Indentation (PI) and Air Bronchogram (AB). RIDER Lung CT comprises CT scans of 32 patients with non-small-cell lung cancer, each of whom underwent two CT scans.
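The LIDC/IDRI labelling rule mentioned above can be sketched as follows. Only the "average rating below 3 means non-malignant" part comes from the text; treating an average above 3 as malignant and returning `None` for an average of exactly 3, so that the nodule can be excluded as indeterminate, are our assumptions (other studies handle the boundary differently). The function name `lidc_label` is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def lidc_label(ratings: Sequence[int]) -> Optional[int]:
    """Map per-radiologist malignancy ratings (1 = low, 5 = high) to a binary label.

    Assumed convention: mean < 3 -> 0 (non-malignant), mean > 3 -> 1 (malignant),
    mean == 3 -> None (indeterminate, to be excluded).
    """
    if not ratings or any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    avg = mean(ratings)
    if avg < 3:
        return 0
    if avg > 3:
        return 1
    return None


print(lidc_label([1, 2, 2, 3]))  # 0
print(lidc_label([4, 5, 3]))     # 1
print(lidc_label([2, 4]))        # None
```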

3.3 Image Preprocessing

Image preprocessing improves the quality of the grayscale CT image by reducing noise. CT images are mostly affected by Gaussian noise, Poisson noise and artifacts, which can be removed from CT and low-dose CT images with dictionary-, filter- and transform-based denoising techniques. Effective preprocessing facilitates accurate segmentation of lung nodules. F Liao et al. converted the CT image into Hounsfield units (HU), binarized it with a threshold, selected the connected components of the lung, separated the left and right lungs with erosion and dilation, computed the convex hull, dilated and combined the left and right masks, and filled the mask with luminance [57]. Density tendency correction can also be used for segmenting lung nodules [58]. CNN architectures such as AlexNet expect RGB images of a fixed size, so Haritha et al. and Lei et al. converted the images from grayscale to RGB and then performed thresholding, segmentation, erosion and dilation to obtain the mask of the CT image [59, 60]. Rahul Paul et al., instead of converting grayscale to RGB, used only the R channel and ignored the G and B channels [61]. Allison et al. showed that the accuracy of a CNN was better on unsmoothed images than on smoothed ones. Because lung volumes are large and vary in size, feeding them to a deep learning model is challenging, and resizing is needed. Depending on the maximum nodule size, CT images are reduced to 128 × 128 [62, 63] or 150 × 150 [60] for 2D images, or to 20 × 20 × 10 for 3D images [64]. A Gaussian scale-space filter [65, 66] or a Gabor filter [67] can also be used to enhance the image, and since nodule size varies, bicubic interpolation can be applied to nodule images [61, 68]. Kazuki et al. preprocessed the CT image by resampling it to isotropic voxels [58] and applying a Gaussian filter, then performed binarization, thresholding, binary inversion and interpolation to extract the lung field [69]. To extract 3D features from a CT scan, multiple views can be combined: Tiantian et al. [56] combined scans from multiple views using median intensity projection and enhanced them using bilinear interpolation. Teramoto et al. [70] extracted the nodule from CT and PET images separately, using an active contour filter on the CT images, and then combined the results with a logical OR.

A model trained too little underfits, whereas a model that learns the training data too closely, including its noise, overfits; the risk of overfitting grows when training samples are few. Overfitting can be addressed with data augmentation [34, 56, 64, 67, 69, 71,72,73,74,75,76,77], regularization [71, 78], dropout [79] and translation [80, 81]. Data augmentation increases the number of training samples with transformations such as rotation [77], horizontal and vertical flipping [57, 68], random scaling [82] and added noise [56, 72]. Severe class imbalance in the dataset can cause misclassification of minority-class samples, which can also be addressed with data augmentation.
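A minimal sketch of three of the preprocessing steps above: converting stored CT values to Hounsfield units with the DICOM RescaleSlope and RescaleIntercept attributes, binarizing a slice with a threshold, and generating flipped and rotated copies for data augmentation. The -400 HU threshold is an assumed value, not one taken from the cited papers, and real pipelines add connected-component selection, morphological operations and resampling on top of this.

```python
from typing import List

Image = List[List[float]]


def to_hounsfield(stored: Image, slope: float, intercept: float) -> Image:
    """HU = stored value * RescaleSlope + RescaleIntercept (DICOM rescale)."""
    return [[v * slope + intercept for v in row] for row in stored]


def threshold_mask(hu: Image, threshold: float = -400.0) -> List[List[int]]:
    """1 where the pixel is below the (assumed) threshold, i.e. air or lung; else 0."""
    return [[1 if v < threshold else 0 for v in row] for row in hu]


def flip_horizontal(img: Image) -> Image:
    return [list(reversed(row)) for row in img]


def flip_vertical(img: Image) -> Image:
    return [list(row) for row in reversed(img)]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """The 4 rotations of the image plus the horizontal flip of each (8 variants).

    The vertical flip is among them: it equals the horizontal flip of the
    180-degree rotation.
    """
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


stored = [[0, 600], [1024, 2024]]
hu = to_hounsfield(stored, slope=1.0, intercept=-1024.0)
print(hu)                  # [[-1024.0, -424.0], [0.0, 1000.0]]
print(threshold_mask(hu))  # [[1, 1], [0, 0]]
print(len(augment(hu)))    # 8
```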

3.4 Image Segmentation

Lung nodule segmentation is a major task in lung nodule classification: the lung parenchyma is segmented first, followed by the nodule, so that the nodule can be inspected for malignancy. Pulmonary nodules are segmented in both the training and testing phases. The four types of pulmonary nodules are solid, semi-solid, non-solid and calcified. The surface of large solid nodules (> 10 mm) has a different intensity range from that of smaller lesions, which is difficult to capture with a solid-nodule detection algorithm. The most challenging step is separating juxta-pleural and juxta-vascular nodules, because the contrast between large solid nodules and the pleura is low [83]. Candidate nodules can be segmented with a local geometry-based filter, which was found to increase the sensitivity of a 3D CNN. Most lung nodules are segmented manually based on annotations provided by radiologists [84]. To facilitate the development of lung nodule detection, the LUNA dataset also provides an annotation file giving the x, y, z coordinates of each nodule in the CT scan along with its ground truth label, so that nodule segmentation is not required. Shou Wang et al. used both box and center regions of interest [85, 86]. Since the shape of a nodule is irregular, a bounding box of the nodule is extracted manually [87] or with a deep convolutional neural network such as Faster R-CNN [7, 44, 82, 87] or a fully convolutional network [57]. Nodule segmentation can be automated with techniques such as border analysis, region-based models [67, 85], and shape and probabilistic models. Region-based methods include region growing, thresholding and clustering; thresholding methods include iterative, maximal inter-class variance (Otsu) [85] and entropy methods. Markov-Gibbs random fields were a conventional approach to extracting the region of interest [88]. Clustering algorithms such as k-means [89] and fuzzy c-means have also shown good accuracy in lung nodule segmentation [90].

Deep learning techniques have recently also been used for lung segmentation. Jeovane H. et al. used fully convolutional networks (FCNs) combined with a conditional random field to segment lung images [80, 91]. Taolin et al. [78] applied binary masking, thresholding, erosion, dilation and morphological closing to segment the lung image. Kazuki et al. [42] removed vessel and bronchial regions with a 3D line filter based on the Hessian matrix, extracted candidate regions using density gradient, volume and sphericity, and then segmented the lung by thresholding; they also reduced false positive shadows of blood vessels with an adaptive ring filter. Mitsuaki et al. [58] extracted nodules from temporal subtraction images using superpixels and a graph cut algorithm, computed five nodule features (standard deviation, sphericity, slenderness, average density and maximum vector concentration degree) and reduced false positive nodules using an SVM. Rahul Paul et al. [61] showed that classification accuracy is relatively higher when features are extracted from cropped tumors of size 56 × 56 than from warped tumor patches or patches obtained from sliding windows. Wang [92] suggested that combining features gathered from multiple views with a CNN gives better segmentation results, even on ground glass opacity nodules. Some negative samples look similar to nodules and are hence difficult to classify; this problem can be addressed with hard negative mining [57, 69]. Since many false positives in nodule detection are caused by airways in the lungs, a 3D segmentation algorithm is needed [66].
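The maximal inter-class variance (Otsu) thresholding method mentioned above chooses the intensity threshold that best separates the histogram into two classes. A minimal sketch, assuming 8-bit intensities (for example a CT slice after windowing); the function names and toy pixel values are illustrative only.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t that maximizes the between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError("pixels must be integers in 0..255")
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels <= t
    sum0 = 0  # sum of intensities <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment(pixels: Sequence[int]) -> List[int]:
    """Binary mask: 1 for pixels brighter than the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]


pixels = [20] * 60 + [25] * 20 + [180] * 15 + [200] * 5
print(otsu_threshold(pixels))  # 25
print(sum(segment(pixels)))    # 20
```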

3.5 Feature Extraction

Two types of features can be extracted: (1) hand-crafted features and (2) features extracted by a convolutional neural network. Hand-crafted features include texture [61, 64, 67, 68, 90, 93,94,95,96,97,98,99,100,101], wavelet [93, 99], Fourier descriptor [94], shape [32, 64, 67, 70, 95, 98, 100, 101], gradient [95, 97, 101], density [97, 98], Histogram of Oriented Gradients (HoG) [102], statistical [95, 96, 103] and size [32, 64, 68] features. Hand-crafted features describe low-level visual properties rather than high-level features [85]. A study by Kuruvilla et al. [63] showed that statistical features such as kurtosis, mean, standard deviation and central moments increase the accuracy of the classification system. Metastasis-related features such as body temperature, insomnia, body weight loss, constipation, breathlessness, heart rate, hypercalcaemia, fatigue and blood pressure can also be used to predict the stage of a lung tumor [67]. Ground glass opacities are difficult to detect but are considered to show a spherical shape when viewed in 3D, so shape features can be extracted to detect them. A CNN, by contrast, does not require separate tumor segmentation and feature selection phases.
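The statistical features named above are first-order statistics of the intensities inside the nodule region. A minimal sketch follows; the ROI is assumed to be already flattened into a list of pixel values, and kurtosis is computed as Pearson (non-excess) kurtosis, which equals 3 for a normal distribution. The cited studies may use other variants, such as excess kurtosis.

```python
import math
from statistics import fmean
from typing import Dict, Sequence


def central_moment(values: Sequence[float], k: int) -> float:
    """k-th central moment: mean of (x - mean)**k."""
    mu = fmean(values)
    return sum((v - mu) ** k for v in values) / len(values)


def first_order_features(roi: Sequence[float]) -> Dict[str, float]:
    """First-order statistics of the intensities inside a nodule ROI."""
    m2 = central_moment(roi, 2)
    if m2 == 0:
        raise ValueError("ROI has constant intensity; skewness/kurtosis undefined")
    return {
        "mean": fmean(roi),
        "std": math.sqrt(m2),                          # population standard deviation
        "skewness": central_moment(roi, 3) / m2 ** 1.5,
        "kurtosis": central_moment(roi, 4) / m2 ** 2,  # Pearson kurtosis (normal = 3)
    }


print(first_order_features([1, 2, 3, 4, 5]))
```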

Nodule detection includes two subtasks: (1) detecting all positive nodules and (2) reducing false positive nodules [44, 57, 82, 87]. Several CNN variants have been used, among them CNNs with hyperparameters optimized by particle swarm optimization [104], the Multi-Crop CNN (MC-CNN) [105], Multi-Scale CNN (MS-CNN) [65, 66, 103, 106,107,108,109], Multi-Level Deep CNN (ML-CNN) [110,111,112,113], Convolutional Deep Belief Network [40], Multi-Pathway CNN (MP-CNN) [114], Multi-Resolution CNN (MR-CNN) [115], Region-based CNN (R-CNN) [44, 82, 87, 116], Multi-View CNN (MV-CNN) [72, 79, 80, 103, 117, 118, 119], CNN with dense and shortcut connections [60], Contextual CNN [45, 120], CNN with multi-channel regions of interest [97] and Multi-View CNN outputs combined by data fusion [117]. A region-based CNN selects a predefined number of region proposals, which can then be used for nodule classification [87]. A multi-crop CNN concatenates feature maps obtained by max-pooling the convolution features and center-cropped convolution features [121]. A multi-scale CNN takes nodule patches at multiple scales or sizes and combines them in parallel [65]. A multi-view CNN concatenates multiple views of a lung nodule into a single view [72]. A multi-level CNN combines multiple CNNs with different kernel sizes [113]. Several CNN architectures have been proposed for both segmentation and classification, including LeNet [100, 122], AlexNet [42, 58, 59, 61, 78, 97, 122,123,124], StochasticNet [75], ZedNet [125], GoogleNet [56, 126], VGGNet [92, 133], ResNet [36, 64, 79, 80, 82, 111, 123, 127,128,129], DenseNet [107, 130,131,132], TumorNet [67, 72], U-Net [44, 57, 69, 80, 114, 120, 126, 133, 134], Dual-Path Network [44], Overfeat [135, 136], ReCTnet [137] and Xception [138].

AlexNet contains 8 learned layers, 5 convolutional and 3 fully connected, with 3 max-pooling layers placed between the convolutional layers [97]. TumorNet comprises 5 convolutional layers, with max-pooling after the first, second and fifth, followed by 3 fully connected layers and a softmax layer [72]. ResNet consists of multiple small blocks connected by shortcut connections [128]. U-Net, mostly used for medical image segmentation, consists of a contracting path and an expanding path [114]. DenseNet is a convolutional network in which each layer is connected to every subsequent layer in a feed-forward fashion [107]. A Dual-Path Network combines the benefits of residual networks, which ease the training of deep networks by mitigating the vanishing gradient problem, and dense networks, which explore new features [44]. A few research papers have also integrated hand-crafted features with CNN features to improve accuracy [61, 71, 139, 140]. Rahul Paul et al. [61] observed that the integrated features classified with naive Bayes reached an accuracy of 82.5%, an improvement over CNN features alone classified with a random forest (75%). Transfer learning, which uses pre-trained networks to extract features, can reduce overfitting when datasets are comparatively small [59, 61, 68, 77, 141, 142].
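The spatial size of each convolution or pooling output follows out = floor((in - k + 2p) / s) + 1 for kernel size k, stride s and padding p. The sketch below traces this through AlexNet's convolution and pooling stack. The layer hyperparameters and the commonly used 227 × 227 input are assumed from the original AlexNet design, not taken from the surveyed papers; with 256 filters in conv5, the final 6 × 6 map gives 9216 features for the first fully connected layer.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[str, int, int, int]  # (name, kernel, stride, padding)


def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """out = floor((size - kernel + 2 * padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1


# Assumed AlexNet convolution and max-pooling layers; the 3 fully connected
# layers follow pool3 and do not change any spatial size.
ALEXNET_LAYERS: List[Layer] = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace_sizes(input_size: int, layers: Sequence[Layer]) -> List[Tuple[str, int]]:
    """Spatial size (height = width) after each layer for a square input."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes(227, ALEXNET_LAYERS):
    print(f"{name}: {size} x {size}")
# conv1 55, pool1 27, conv2 27, pool2 13, conv3 to conv5 13, pool3 6
```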

There are two types of CNN features: (1) 2D CNN features [61, 73,74,75, 77, 86, 104, 110, 112, 114, 117, 122, 124, 129, 138, 141, 143,144,145,146,147] and (2) 3D CNN features [41, 44, 45, 57, 60, 64, 66, 69, 71, 78, 79, 82, 87, 94, 105,106,107,108, 111, 115, 118, 120, 127, 128, 130, 132, 134, 139, 141, 142, 148,149,150,151,153]. A 2D CNN mostly considers the CT slice in which the nodule has the largest area, but it cannot extract spatial information or correlations between slices. Some researchers have used both 2D and 3D features [107]. Cross-sectional images in the axial, coronal and sagittal views can be used to capture 3D features. Because 3D data analysis has a higher computational cost, 3D data can also be represented as 2D patches using a trigonometric sampling approach [138]. Some researchers have used CNNs for both detection and classification [134, 138]. Discriminating nodules from tissues such as blood vessels is difficult from axial slices alone [69]. Lei et al. extracted 3D features by applying a CNN to 20 unified slices.
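Extracting the axial, coronal and sagittal cross-sections through a nodule centre, as multi-view approaches do, can be sketched with plain nested lists. We assume the volume is indexed as `volume[z][y][x]` and that the nodule centre has already been converted to voxel indices; annotations such as LUNA16's are given in world coordinates (mm) and need the scan origin and spacing for that conversion. The function name `orthogonal_views` is hypothetical.

```python
from typing import List, Sequence, Tuple

Volume = Sequence[Sequence[Sequence[float]]]  # indexed volume[z][y][x]
Patch = List[List[float]]


def orthogonal_views(volume: Volume, center: Tuple[int, int, int],
                     half: int = 1) -> Tuple[Patch, Patch, Patch]:
    """Return (axial, coronal, sagittal) patches of size (2*half+1)^2 centred on a voxel."""
    z, y, x = center
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    for c, d in zip(center, dims):
        if c - half < 0 or c + half >= d:
            raise ValueError("patch would extend beyond the volume")
    offsets = range(-half, half + 1)
    axial = [[volume[z][y + i][x + j] for j in offsets] for i in offsets]     # rows y, cols x
    coronal = [[volume[z + i][y][x + j] for j in offsets] for i in offsets]   # rows z, cols x
    sagittal = [[volume[z + i][y + j][x] for j in offsets] for i in offsets]  # rows z, cols y
    return axial, coronal, sagittal


# Toy 3x3x3 volume whose voxel value encodes its position as 100*z + 10*y + x.
volume = [[[100 * z + 10 * y + x for x in range(3)] for y in range(3)] for z in range(3)]
axial, coronal, sagittal = orthogonal_views(volume, (1, 1, 1))
print(axial)     # [[100, 101, 102], [110, 111, 112], [120, 121, 122]]
print(coronal)   # [[10, 11, 12], [110, 111, 112], [210, 211, 212]]
print(sagittal)  # [[1, 11, 21], [101, 111, 121], [201, 211, 221]]
```

The explicit bounds check matters because negative Python indices would otherwise wrap around silently and return voxels from the opposite side of the scan.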

Table 2 summarizes the features extracted in a selection of the research articles.

Table 2 Comparative study of features extracted in various DL research papers for classification of lung nodule

3.6 Classification Algorithm

The classifiers most commonly used for nodule classification include Random Forest [61, 65, 68, 135, 139, 156, 160], Artificial Neural Network [16, 100], Support Vector Machine (SVM) [67, 70, 77, 98, 103, 136, 154], Linear Discriminant Analysis (LDA) [93], Decision Tree [74, 84, 93], Boosting [44, 94, 143], Linear Regression [102], Logistic Regression [64, 140], Random Forest Regression [156], Gaussian Process Regression [72], Naïve Bayes [61, 160], DBScan [106] and Multi Kernel Learning [95, 101].

Several other deep architectures also show promising results, such as Deep Belief Networks (DBN) with RBMs [40, 93, 99, 155], Deep Residual Networks (DRN) [36, 82, 111], Deep Reinforcement Learning (DRL) [105, 162], the multi-layer perceptron [96], Deep Denoising Autoencoders [32, 98, 99, 102, 140], Deep Sparse Autoencoders [62, 140], Generative Adversarial Networks [120], Deep Stacked Autoencoders [34, 88] and Convolutional Neural Networks [34, 41, 42, 45, 56,57,58,59,60, 63, 64, 66,67,68,69, 71,72,73, 76, 78,79,80,81, 86, 87, 97, 99, 101, 104, 107, 108, 110,111,112, 114, 115, 117, 118, 120,121,122,123,124,126,127,128,129,130,131,132,133,134,135,136, 138, 141,142,143,144,145,146,147,148,149,150,151,153, 155, 161].

Ensemble models combine several deep learning models into a single, better-performing model; the final output is obtained by combining the predictions of the member models by voting, averaging probabilities, or taking the maximum or median. Allison et al. [63] found that voting did not improve accuracy when combining the results of a 2D CNN applied to both smoothed and unsmoothed images. One study also found that a scaled logistic output function outperforms a softmax function [64].

A DBN is a feed-forward neural network with multiple hidden layers composed of Restricted Boltzmann Machines (RBMs) [93]. An autoencoder is a two-layer network that takes binary input and encodes and decodes the data using linear and non-linear transformations [84]. Reinforcement learning learns from incremental feedback rather than from labeled data [105]. A Sparse Autoencoder (SAE) can learn features from unlabeled data, and a Deep Sparse Autoencoder is made up of multiple SAE layers trained layer by layer [62]. A Denoising Autoencoder (DAE) corrupts the original image by adding noise and trains the model to reconstruct the original, uncorrupted image. A Stacked Denoising Autoencoder (SDAE) stacks such denoising autoencoders to form a deep architecture [98]. A convolutional neural network (CNN) consists of input, hidden and output layers, where the hidden layers include convolution, ReLU, pooling and fully connected layers.
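Two of the ensemble combination rules mentioned above, majority voting and probability averaging, are sketched below on hypothetical outputs of three models for one nodule. The example is chosen so that the two rules disagree: two weakly confident models vote "malignant", while one strongly confident "benign" model dominates the average. The class order `[benign, malignant]` and all probabilities are assumptions for illustration.

```python
from collections import Counter
from statistics import fmean
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def majority_vote(prob_lists: Sequence[Sequence[float]]) -> int:
    """Each model votes for its most probable class; ties go to the class voted first."""
    votes = [argmax(p) for p in prob_lists]
    return Counter(votes).most_common(1)[0][0]


def average_probabilities(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the models' class-probability vectors."""
    return [fmean(column) for column in zip(*prob_lists)]


# Hypothetical outputs of three models: [P(benign), P(malignant)].
probs = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
print(majority_vote(probs))                  # 1: two of three models say malignant
print(argmax(average_probabilities(probs)))  # 0: the confident benign model dominates the mean
```

Voting ignores how confident each model is, whereas averaging keeps that information; which behaviour is preferable depends on how well calibrated the member models are.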

Although CNNs have shown promising results, several variations of CNN have been proposed. The multi-path CNN proposed by Sori et al. has shown better results than a standard CNN by concatenating the features from multiple CNN paths [114]. Tajbakhsh et al. found that a massive-training artificial neural network achieves better accuracy than a CNN when the training set is small [153]. The multi-scale CNN proposed by Wei Shen et al. showed better results on noisy data, thus avoiding the data pre-processing stage [58]. Researchers have also used ensemble classifiers, which combine several models into one predictive model, to improve the performance of the classification system [68]. Xie showed that concatenating CNN features with texture and shape features accurately represents the heterogeneity of nodules, but increases the computation time [83]. Wei Shen also proposed a multi-crop CNN that predicts nodule semantic attributes by applying max-pooling to multiple cropped regions of the convolutional feature maps [93]. Yuan fused the features from a multi-view, multi-scale CNN with geometrical features to classify different types of nodules based on appearance, and showed that the accuracy was better than that of a multi-scale or multi-view CNN alone [103]. Allison combined, through a voting system, the predictions of multiple CNNs applied to differently preprocessed images to reduce the false positive rate of classification [63]. Devinder used an autoencoder to extract features that map to semantic features [84]. Giovanni showed that CNN parameters can also be optimized with an optimization algorithm such as particle swarm optimization, which increases the sensitivity of the classification system [104]. Masood et al. [67] found that a deep fully convolutional network performs better than TumorNet. It was also observed that the accuracy of AlexNet was better than that of ResNet [124]. Albert et al. [126] found that the number of false positives increased with the use of U-Net, and hence the output of U-Net was combined with a vanilla CNN and GoogLeNet to reduce false positives. Chen et al. found that the sensitivity of the classification system increases with the number of convolution layers, whereas it decreases with increasing kernel size [141]. Xinyu et al. observed that increasing the number of convolutional restricted Boltzmann machine (CRBM) layers in a deep convolutional belief network (DCBN) increases the classification accuracy [40]. Kim et al. also found that sensitivity can be increased by combining morphological features with features extracted from a Stacked Denoising Autoencoder [98]. A comparative study of the various deep learning algorithms used for lung nodule classification and their performance is shown in Table 3.
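The voting idea used by Allison [63], and ensembling in general [68], can be illustrated with a short sketch of hard majority voting alongside soft voting (probability averaging). This is a minimal sketch assuming binary malignancy probabilities from each model, a 0.5 threshold and a "ties go to benign" rule; the reviewed papers may use different thresholds, weights or tie rules, and the three example models are hypothetical.

```python
import numpy as np


def majority_vote(prob_matrix, threshold=0.5):
    """Hard majority voting over several classifiers for a binary nodule label.

    prob_matrix: (n_models, n_samples) malignancy probabilities, one row per CNN.
    A sample is labelled malignant (1) only if strictly more than half of the models
    vote malignant; ties (possible with an even number of models) fall to benign (0).
    """
    votes = (np.asarray(prob_matrix) >= threshold).astype(int)  # each model votes 0 or 1
    n_models = votes.shape[0]
    return (2 * votes.sum(axis=0) > n_models).astype(int)


def soft_vote(prob_matrix, threshold=0.5):
    """Alternative rule: average the probabilities first, then threshold once."""
    return (np.asarray(prob_matrix).mean(axis=0) >= threshold).astype(int)


# Three hypothetical CNNs (e.g. applied to differently preprocessed images)
# scoring the same three nodule candidates.
probs = np.array([
    [0.9, 0.4, 0.6],
    [0.8, 0.3, 0.4],
    [0.2, 0.6, 0.7],
])
hard = majority_vote(probs)
soft = soft_vote(probs)
```

Requiring a strict majority makes the ensemble conservative about calling a candidate malignant, which is one way voting can lower the false positive rate.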

Table 3 Comparative study of classification algorithms used in various DL research papers for classification of lung nodules

4 Conclusion

Currently, deep learning is used for lung segmentation, feature extraction, nodule detection, false positive reduction and classification. Most conventional machine learning algorithms require an understanding of the data to extract relevant features for classification. Figure 4 gives a brief summary of the contribution of different deep learning architectures to the classification of lung nodules.

Fig. 4

Contribution of deep learning architecture for lung nodule classification

Feature selection hugely impacts the performance of the system. However, the feature selection process is automated in deep learning algorithms such as CNNs. Owing to the success of CNNs in most image recognition applications, CNNs are widely used in medical image processing for the detection of nodules. The advantages of using a CNN architecture for lung nodule classification are: 1) CNN is the most commonly used approach for lung nodule detection and also outperforms conventional models [155]. 2) Feature extraction and classification are combined in a single network through the convolution, pooling and fully connected layers. 3) A CNN does not require a separate segmentation step. 4) A separate feature selection step is not needed, as the features are learned by the convolution layers. 5) CNNs have been shown to give better results for lung nodule classification, and hence most research focuses on improving CNNs. 6) CNNs also perform better than DNNs and Stacked Autoencoders [34]. 7) CNN features also prove to be better than hand-crafted features such as shape, size and texture [136]. Some of the challenges in using CNNs are: 1) CNNs work well on large datasets, but gathering real-life datasets is challenging because of the time it takes. 2) Overfitting can also arise if the dataset is imbalanced. 3) CNNs require several hyperparameters to be set, which requires a lot of trial and expertise. 4) CNNs also require substantial computation and long processing times, and hence need high-end GPUs.

Although size is an important factor in determining malignancy [64], size alone cannot determine the level of malignancy [154]. An ensemble model that combines quantitative features of the CT image with a CNN seems to improve the accuracy of the model [68]. Combining high-level features such as lobulation, margin, calcification, spiculation and sphericity with CNN features improves the classification accuracy [72, 136, 160]. The relationship between adjacent consecutive slices (frames) cannot be analyzed with a 2D CNN. A 3D CNN can extract this inter-slice (volumetric) information, but at a higher computation cost. 3D multi-view CNNs were found to provide better accuracy than 2D multi-view CNNs without requiring a very deep network [79, 132]. Most experiments also show that 3D CNNs achieve better sensitivity than 2D CNNs [60]. As 3D CNNs are expensive, several variations of the 3D CNN have been proposed to reduce the number of computations. Figure 5 gives a brief summary of the number of researchers using 3D CNNs and 2D CNNs.
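To make the higher cost of 3D convolution concrete, a back-of-the-envelope sketch can count the weights and multiply-accumulate operations (MACs) of a single convolution layer in 2D and in 3D. The layer sizes used below (32 input channels, 64 output channels, kernel side 3, output side 32) are arbitrary assumptions chosen only for illustration.

```python
import math


def conv_cost(in_ch, out_ch, k, out_shape):
    """Weights and multiply-accumulate operations (MACs) of one convolution layer.

    The kernel has side k in every spatial dimension of out_shape, so a 2D layer
    (out_shape=(H, W)) uses k*k kernels and a 3D layer (out_shape=(D, H, W)) uses
    k*k*k kernels. Bias terms, padding and stride are ignored.
    """
    dims = len(out_shape)
    weights = in_ch * out_ch * k ** dims
    macs = weights * math.prod(out_shape)  # each output position applies every weight once
    return weights, macs


w2d, m2d = conv_cost(32, 64, 3, (32, 32))      # 2D layer on a 32x32 feature map
w3d, m3d = conv_cost(32, 64, 3, (32, 32, 32))  # 3D layer on a 32x32x32 feature volume
ratio = m3d / m2d
```

With these assumed sizes the 3D layer has 3 times as many weights (27 versus 9 per kernel) and 96 times as many MACs (the factor of 3 multiplied by 32 times more output positions), which is why 3D variants that reduce computation are attractive.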

Fig. 5

Contribution of deep learning architecture for lung nodule classification

Multi-view CNNs have a lower error rate than single-view CNNs [79]. Multi-scale CNNs can help detect pleural nodules (located between the chest wall and the lung) [76]. Multi-scale CNNs also perform better on noisy input [79]. Figure 6 gives a brief summary of the number of researchers using different variations of CNN.
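Multi-scale CNNs usually feed crops of different physical extent around the same nodule into parallel branches. A hedged NumPy sketch of that input-preparation step follows. The crop sizes (16, 32 and 48 voxels), nearest-neighbour resampling to 16³, and padding with the volume minimum are all assumptions; they are not the settings reported in [58], [76] or [79].

```python
import numpy as np


def extract_multiscale_patches(volume, center, sizes=(16, 32, 48), out_size=16):
    """Crop cubes of several side lengths around one nodule center and resample each
    to out_size**3 with nearest-neighbour indexing, so that every scale can feed a
    CNN branch with the same input shape.

    volume: 3D CT array (z, y, x); center: (z, y, x) voxel index of the nodule.
    Crops that would leave the volume are padded with the volume's minimum value.
    """
    half = max(sizes) // 2 + 1
    padded = np.pad(volume, half, mode="constant", constant_values=volume.min())
    cz, cy, cx = (c + half for c in center)  # center in padded coordinates
    patches = []
    for s in sizes:
        z0, y0, x0 = cz - s // 2, cy - s // 2, cx - s // 2
        cube = padded[z0:z0 + s, y0:y0 + s, x0:x0 + s]
        idx = np.linspace(0, s - 1, out_size).round().astype(int)  # nearest-neighbour grid
        patches.append(cube[np.ix_(idx, idx, idx)])
    return np.stack(patches)  # (len(sizes), out_size, out_size, out_size)


rng = np.random.default_rng(0)
ct = rng.normal(size=(40, 40, 40))                 # stand-in for a CT volume
multi = extract_multiscale_patches(ct, (20, 20, 20))
edge = extract_multiscale_patches(ct, (0, 0, 0))   # nodule at the volume corner
```

The small crop keeps fine detail of the nodule while the large crop adds surrounding context such as the pleura, which is the intuition behind multi-scale inputs for juxta-pleural nodules.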

Fig. 6

Contribution of CNN variation used in lung nodule classification

Several CNN architectures are used for classification, but ResNet and AlexNet were the most commonly used. Different CNN architectures can also be ensembled to improve the sensitivity of the classification system. Combining CNNs with RNNs or LSTMs has shown better results than using each architecture individually. CT-based predictions can be combined with clinical probabilities to improve the prediction. However, there are no previous studies combining clinical data with CT data. Figure 7 gives a brief summary of the contribution of different deep CNN architectures to lung nodule classification.
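One simple way in which a CT-based prediction could be combined with a clinical probability is a weighted average of their log-odds (logits). The sketch below is an assumption for illustration only: the clinical probability is taken to come from some clinical risk model, the weight w_ct is hypothetical and would have to be tuned on validation data, and no reviewed study is claimed to use this rule.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def combine_probabilities(p_ct, p_clinical, w_ct=0.5, eps=1e-6):
    """Fuse a CT-model malignancy probability with a clinical-risk probability by a
    weighted average of their log-odds (logits), then map back to a probability.

    w_ct in [0, 1] is an assumed weight for the CT model; eps clips probabilities
    away from 0 and 1 so the logit stays finite.
    """
    p_ct = min(max(p_ct, eps), 1.0 - eps)
    p_clinical = min(max(p_clinical, eps), 1.0 - eps)
    z = w_ct * _logit(p_ct) + (1.0 - w_ct) * _logit(p_clinical)
    return 1.0 / (1.0 + math.exp(-z))


disagree = combine_probabilities(0.9, 0.1)            # equally weighted, opposite evidence
ct_heavy = combine_probabilities(0.9, 0.6, w_ct=0.7)  # CT model trusted more
```

Averaging in logit space rather than probability space lets a confident source move the result further, and setting w_ct to 1 or 0 recovers either source alone.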

Fig. 7

Contribution of CNN architecture for lung nodule classification

CNNs work better with large datasets. Transfer learning, which reduces overfitting, has also been used by several researchers [56, 59, 61, 68, 160]. Using a 2D CNN instead of a 3D CNN can increase the amount of training data, since each CT volume yields many 2D slices. Integrating medical images from multiple sources also tends to improve the accuracy of the system [108]. CNN parameters such as kernel size, batch size, learning rate and weight initialization play an important role in the performance of the CNN [122]. The number of convolution layers and the kernel size affect the sensitivity of the CNN model [141]. Setting the hyperparameters of a deep neural network is a challenging task. Some studies also show that performance can be improved by increasing the number of epochs [59, 63]. Classification error can also be reduced by augmenting the data when the dataset is small [73]. Combining convolutional networks of different configurations gives better sensitivity [125].
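Data augmentation for small nodule datasets can be as simple as random flips and 90-degree rotations of each patch. The following is a minimal sketch assuming 3D patches ordered (z, y, x) that are square in the axial plane; the choice of transforms and the number of augmented copies are assumptions, not the augmentation scheme of [73].

```python
import numpy as np


def augment_patch(patch, rng):
    """Randomly flipped / rotated copy of a 3D nodule patch ordered (z, y, x).

    Only flips and 90-degree rotations in the axial (y, x) plane are used, so no
    interpolation is needed and the set of voxel values is unchanged. The patch is
    assumed to be square in-plane, so rotation does not change its shape.
    """
    out = patch
    if rng.random() < 0.5:
        out = out[:, :, ::-1]  # left-right flip
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # anterior-posterior flip
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(1, 2))
    return out.copy()


def augment_dataset(patches, labels, copies, seed=0):
    """Append `copies` augmented versions of every patch, keeping each label."""
    rng = np.random.default_rng(seed)
    xs, ys = list(patches), list(labels)
    for patch, label in zip(patches, labels):
        for _ in range(copies):
            xs.append(augment_patch(patch, rng))
            ys.append(label)
    return np.stack(xs), np.array(ys)


rng = np.random.default_rng(1)
patches = rng.normal(size=(4, 8, 16, 16))  # 4 toy patches of 8 slices x 16 x 16 voxels
labels = np.array([0, 1, 0, 1])
X, y = augment_dataset(patches, labels, copies=3)
```

Because every augmented copy keeps its original label, the class balance of the dataset is preserved; augmentation should be applied only to the training split so that test results stay unbiased.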

Although great progress has been made in lung cancer classification, some areas still need to be explored: 1) Most research papers have focused on binary classification of lung tumors as benign or malignant. Malignancy can also be determined using the nodule type, which is based on the nodule's position and intensity. Some of the nodule types are well-circumscribed (W), non-nodule (N), juxta-pleural (J), ground glass opacity (G), pleural tail (P) and vascularized (V). However, very few research papers have gone beyond binary classification: tumors have been classified into benign, primary malignant and metastatic malignant by Kang [75]; into solid, semi-solid and ground glass opacity by Wei [86, 156]; and into the nodule types W, N, J, G, P and V by Liu [76]. 2) Very few papers have focused on the detection of nodules below 3 mm in diameter; one exception is Patrice [148], who distinguished micro-nodules from non-nodules. Most early-stage malignant tumors are small, and detecting them at an early stage might increase the life expectancy of the individual. 3) Ground glass opacity nodules and non-nodules are difficult to detect and have been explored by very few researchers, such as [76]. 4) Segmenting large solid nodules attached to the pleural wall is challenging and needs to be explored. 5) Some researchers have used private datasets, which makes it very difficult to compare results across algorithms; hence there is a need for a global dataset that all researchers can use. 6) One of the major challenges in lung cancer detection is to differentiate between small early-stage cancerous lesions and benign nodules, which can be confirmed by invasive or non-invasive methods. The invasive method involves biopsy, whereas the non-invasive method requires periodic follow-up CT scans.
The invasive method can cause complications such as bleeding and wound infection, whereas the non-invasive method increases radiation exposure. Hence there is a need for advanced deep learning techniques that identify the morphological differences between early-stage cancerous nodules and benign nodules, to overcome the drawbacks of invasive and non-invasive cancer detection techniques. 7) Additional patient information, such as medical history and genetic reports, can also be analyzed and fused with the deep features extracted from lung scan images to improve the efficiency of automated tumor detection; hence strong collaboration between physicians and researchers is needed to design a foolproof model. 8) Due to privacy issues and to prevent the misuse of sensitive information, CT datasets cannot easily be shared for research purposes. 9) The majority of researchers have limited their work to lung nodule detection; there is scope for using deep learning techniques for automatic cancer staging. Although many researchers have used various deep learning models to improve the accuracy of lung nodule classification, there is still scope for improvement in addressing the above challenges, with the aim of detecting malignant tumors at an early stage.
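The fusion of patient information with deep image features suggested in item 7 can be sketched as feature-level (early) fusion: standardize each feature group with training-set statistics, then concatenate them before a classifier. This is a minimal sketch under assumptions: the 128-dimensional deep features are random stand-ins for CNN embeddings, and the patient features (age, pack-years, family history) are hypothetical examples, not variables used in any reviewed study.

```python
import numpy as np


def _zscore(train, test):
    """Standardize columns with training-set statistics (constant columns left unscaled)."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0
    return (train - mean) / std, (test - mean) / std


def fuse_features(deep_train, patient_train, deep_test, patient_test):
    """Feature-level (early) fusion: standardize each feature group separately,
    then concatenate image features and patient features column-wise."""
    d_tr, d_te = _zscore(deep_train, deep_test)
    p_tr, p_te = _zscore(patient_train, patient_test)
    return np.hstack([d_tr, p_tr]), np.hstack([d_te, p_te])


rng = np.random.default_rng(2)
deep_train, deep_test = rng.normal(size=(10, 128)), rng.normal(size=(4, 128))
# Hypothetical patient features: age, pack-years, family history (0/1).
patient_train = np.column_stack([rng.integers(40, 80, 10), rng.integers(0, 60, 10), rng.integers(0, 2, 10)])
patient_test = np.column_stack([rng.integers(40, 80, 4), rng.integers(0, 60, 4), rng.integers(0, 2, 4)])
fused_train, fused_test = fuse_features(deep_train, patient_train, deep_test, patient_test)
```

Standardizing with training statistics only avoids leaking test information; a classifier trained on the fused matrix can then weigh image and patient evidence together.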