1 Introduction

Lung cancer is one of the most lethal of all cancers. According to worldwide statistics, it affects up to 12.9% of the overall population [1]. Lung cancer is investigated predominantly with medical imaging techniques such as chest radiographs (popularly known as X-rays) or Computed Tomography (CT) scanning. Lung cancer screening is a time-consuming task that requires expert radiologists. It is also subject to intra-observer and inter-observer variability among radiologists in identifying nodules or tumors, which makes the task even more tedious [2, 3]. Lung cancer has one of the highest mortality and morbidity rates, and patients show no visible symptoms of a tumor until they have reached advanced stages. Early detection of cancer is therefore one of the critical problems to be addressed. Computer-Aided Diagnosis (CAD) systems are developed to provide a second opinion to radiologists and help overcome these issues. Automating this process reduces the burden on doctors and radiologists and supports quick and accurate diagnosis.

CAD systems work in two phases. The initial phase detects pulmonary nodules in the CT scans. This is time-consuming and error-prone because other anatomical structures are morphologically similar to nodules and can be mistaken for them. Labeling a candidate in a CT scan as a nodule therefore requires careful examination by experienced radiologists. This task is called lung nodule detection. The final phase classifies the detected nodules as non-cancerous or cancerous, since not all nodules present in the thoracic region are cancerous. The classification is based mainly on nodule size and malignancy type. The Lung CT Reporting and Data Systems (Lung-RADS) provides specific guidelines that need to be followed in the follow-up strategy to categorize lung nodules [3].

Related Works. In recent literature, Deep Convolutional Neural Networks (DCNN) have been used to detect, segment, and classify lung nodules. Shen et al. [4] proposed a deep learning model based on a crop operation that avoids the typical segmentation of nodules. Their multi-crop CNN model (MC-CNN) is used to identify the malignancy rate of lung nodules, and it performed robustly even though the segmentation step was excluded from the malignancy analysis. The literature mentioned above used two-dimensional CT scans for lung cancer diagnosis tasks. However, one crucial aspect overlooked in two-dimensional CT scans is the volumetric information that can be extracted from three-dimensional CT scans. A few CAD systems have used three-dimensional CT scans to extract attributes such as nodule shape, texture, and type from volumetric information. This improved the CAD systems’ performance and resulted in a more accurate diagnosis [3, 5, 6].

Our Contribution. In this work, a novel segmentation method named Elagha initialization based Fuzzy C-Means clustering (EFCM) is proposed to segment nodule regions from a given CT scan. Following nodule detection, a hybrid blend of features extracted from the segmented images is used to classify lung nodules. Deep architectures are used for various medical imaging tasks such as detecting nodules, segmenting nodule regions, and classifying them as cancerous or non-cancerous. In our work, a deep architecture is used as a feature extraction technique. Features from different intermediate layers of a deep CNN are evaluated to find the layer that gives the best lung nodule classification performance. A second set of features, Bag-of-Visual-Words (BoVW) features, is used to capture more discriminative information about the nodule structure. These features build a visual dictionary from the input data, and the features of unseen test images are extracted using that dictionary. The combination of BoVW and CNN features outperformed the individual feature representations and state-of-the-art nodule classification systems. The results achieved were verified by an expert pulmonologist.

2 Materials and Methods

2.1 Materials

The dataset used to evaluate the proposed method is Lung Nodule Analysis 2016 (LUNA16). This dataset is a curated version of the publicly available lung cancer CT dataset LIDC-IDRI. It contains 888 CT scans with 551,065 nodule candidates, of which 1186 are positive nodules [7]. The LUNA16 dataset does not contain the malignancy rate of the nodules. However, its scans are taken from the LIDC-IDRI dataset [8]. Each radiologist’s malignancy score is provided in a Comma Separated Value (CSV) file for all the nodule scans. Based on the average malignancy score of all four radiologists, each nodule is labeled as benign or malignant.
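The labeling rule described above can be sketched as follows. This is only an illustration: the cut-off of 3 on the 1 to 5 LIDC-IDRI malignancy scale and the name `label_nodule` are assumptions, since the text does not state which average score separates benign from malignant nodules.

```python
from statistics import mean

# Assumed cut-off on the 1-5 malignancy scale; the paper does not state one.
MALIGNANCY_THRESHOLD = 3.0


def label_nodule(radiologist_scores, threshold=MALIGNANCY_THRESHOLD):
    """Label a nodule from the mean of the radiologists' malignancy scores."""
    if not radiologist_scores:
        raise ValueError("at least one malignancy score is required")
    average = mean(radiologist_scores)
    # Scores strictly above the (assumed) threshold count as malignant.
    return "malignant" if average > threshold else "benign"


example_label = label_nodule([4, 5, 3, 4])  # mean score is 4.0
```

Under this assumed rule, a nodule whose average score equals the threshold exactly is treated as benign; a different convention would only change the comparison.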

2.2 Methods

Architecture. Figure 1 shows the architecture of the proposed method. The CT scans contain candidates that need to be identified as nodules or non-nodules, which is performed using the EFCM method. Once the nodules are identified, they need to be further classified as benign or malignant. A nodule is labeled based on the malignancy scores provided by expert radiologists: the average score of all the radiologists is calculated, and the nodule is assigned the corresponding label. Once the two classes are obtained, the classification is performed using three feature representations: BoVW features, deep features, and the combination of both. An SVM classifier makes the final decision.

Image Enhancement. Enhancing an image emphasizes certain of its characteristics, making the image sharper and more detailed. This, in turn, allows better analysis and information extraction. In this work, the Histogram Equalization (HE) technique is used: the contrast is altered by adjusting the intensities of the image, which provides an enhanced CT scan image. The histogram represents the relative frequency of occurrence of the different gray levels in the image [9].

Fig. 1. Schematic architecture of the proposed method

The histogram \(h(r_k)\) of an image with L possible intensity values \(r_k\) in the range [0–255] is defined in Eq. 1.

$$\begin{aligned} h(r_k) = n_k \end{aligned}$$
(1)

where, \(n_k\) is the number of pixels with an intensity value of \(r_k\) in the image.

The normalized histogram \(p(r_k)\), which can be plotted against \(r_k\), is given in Eq. 2:

$$\begin{aligned} p(r_k) = \frac{h(r_k)}{number\, of\, rows(M) * number\, of\, columns(N)} = \frac{n_k}{MN}; k = 0, 1, 2, ... , (L-1) \end{aligned}$$
(2)

HE applies a transformation function based on the Cumulative Distribution Function (CDF), as given in Eqs. 3 and 4:

$$\begin{aligned} cdf(k) = \sum _{i=0}^{k} p(r_i); k = 0, 1, ..., (L-1) \end{aligned}$$
(3)
$$\begin{aligned} s(k) = T(r_k) = \left\lfloor (L-1)\sum _{i=0}^{k}p(r_i) \right\rfloor = \left\lfloor \frac{(L-1)}{MN}\sum _{i=0}^{k}n_i \right\rfloor ; k=0,1, ..., (L-1) \end{aligned}$$
(4)
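The mapping in Eqs. 1 to 4 can be illustrated with a short NumPy sketch. It assumes an integer-valued image whose gray levels lie in [0, L-1]; the function name `equalize_histogram` is hypothetical, and the integer form of Eq. 4 is used to avoid floating-point rounding at the top of the range.

```python
import numpy as np


def equalize_histogram(image, levels=256):
    """Histogram-equalize an integer image with gray levels in [0, levels - 1]."""
    image = np.asarray(image)
    # Eq. 1: h(r_k) = n_k, the number of pixels at each gray level.
    hist = np.bincount(image.ravel(), minlength=levels)
    # Running sum of n_i; dividing by M * N would give the CDF of Eq. 3.
    cumulative_counts = np.cumsum(hist)
    # Eq. 4 (second form): s_k = floor((L - 1) / (M * N) * sum_{i<=k} n_i).
    mapping = ((levels - 1) * cumulative_counts) // image.size
    # Apply the lookup table to every pixel.
    return mapping[image]


toy_image = [[0, 0], [1, 3]]
equalized = equalize_histogram(toy_image, levels=4)
```

For the toy image with four gray levels, the cumulative counts are 2, 3, 3, 4, so the levels 0, 1, 2, 3 are mapped to 1, 2, 2, 3.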

Image Segmentation. The proposed work uses a novel clustering approach named Elagha initialization based Fuzzy C-Means clustering (EFCM) to segment the nodule region from the input CT scan. Conventional FCM partitions the image into several clusters, with the initial cluster centroids selected randomly before the Euclidean distances are computed. This random selection of initial centroids may lead to a locally optimal solution. To overcome this drawback, Elagha initialization is used to generate the initial centroids based on the overall shape of the data. This modification of traditional FCM is termed EFCM.

The EFCM method partitions the input image (X), treated as a set of M samples \(x_j\), \(j = 1, 2, ..., M\), into N clusters. Elagha initialization first identifies the boundaries of the data points and divides the bounded region into F rows and F columns to calculate the initial centroids. The width \(w_j\) and height \(h_j\) of a grid cell are computed as shown in Eqs. 5 and 6:

$$\begin{aligned} w_j = \frac{w_{j,max} - w_{j,min}}{F} \end{aligned}$$
(5)
$$\begin{aligned} h_j = \frac{h_{j,max} - h_{j,min}}{F} \end{aligned}$$
(6)

where \(w_{j,max}\) and \(w_{j,min}\) represent the maximum and minimum widths, and \(h_{j,max}\) and \(h_{j,min}\) the maximum and minimum heights, respectively. The N initial cluster centroids (\(c_i\)) are given by Eq. 7:

$$\begin{aligned} c_i = \frac{w_j}{2} + \frac{h_j}{2}, i = 1, 2, ..., N \end{aligned}$$
(7)
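One possible reading of Eqs. 5 to 7 is that the bounding box of the data is split into an F by F grid and each candidate centroid sits at the midpoint of a cell. The sketch below follows that reading for 2-D points. It is an assumption rather than the authors' implementation: the name `grid_cell_centers` is hypothetical, and the text does not say how N centroids are chosen from the cells, so the function simply returns all F² cell centers.

```python
import numpy as np


def grid_cell_centers(points, f):
    """Return the centers of an f x f grid spanning the bounding box of 2-D points."""
    points = np.asarray(points, dtype=float)
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    cell_w = (x_max - x_min) / f  # Eq. 5: width of one grid cell
    cell_h = (y_max - y_min) / f  # Eq. 6: height of one grid cell
    # Assumed reading of Eq. 7: shift each cell origin by half a cell.
    xs = x_min + cell_w * (np.arange(f) + 0.5)
    ys = y_min + cell_h * (np.arange(f) + 0.5)
    grid_x, grid_y = np.meshgrid(xs, ys)
    return np.column_stack([grid_x.ravel(), grid_y.ravel()])


centers = grid_cell_centers([[0, 0], [4, 2]], f=2)
```

Because the candidates depend only on the extent of the data, repeated runs give the same starting centroids, which is the property that motivates replacing random initialization.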

After the centroids are initialized, the membership of each pixel is calculated using Eq. 8:

$$\begin{aligned} \mu _{ij} = \frac{1}{\sum _{k=1}^{N}\left( \frac{d_{ij}}{d_{kj}} \right) ^{\frac{2}{q-1}}} \end{aligned}$$
(8)

where q indicates the power exponent and \(d_{ij}\) is the Euclidean distance between sample \(x_j\) and cluster centroid \(c_i\), given by Eq. 9:

$$\begin{aligned} d_{ij} = \Vert x_j - c_i \Vert _2 \end{aligned}$$
(9)

The objective function \(\xi \) of the FCM algorithm is given by Eq. 10:

$$\begin{aligned} \xi = \sum _{i=1}^{N}\sum _{j=1}^{M} \mu _{ij}d_{ij}^2 \end{aligned}$$
(10)

Based on the Euclidean distance, the nodule and non-nodule regions are grouped into separate clusters. The output of the EFCM algorithm is a segmented image containing the lung nodule regions.
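To make the clustering step concrete, the following sketch runs textbook Fuzzy C-Means on a 1-D vector of pixel intensities, starting from externally supplied centroids. It is a minimal illustration under stated assumptions, not the authors' EFCM code: it uses the standard FCM membership and centroid updates, the function name `fuzzy_c_means` and the default values of `q`, `n_iter`, and `tol` are hypothetical, and the initial centroids are passed in by hand.

```python
import numpy as np


def fuzzy_c_means(samples, centroids, q=2.0, n_iter=100, tol=1e-6):
    """Textbook FCM on 1-D samples, starting from the given initial centroids."""
    x = np.asarray(samples, dtype=float)            # shape (M,)
    c = np.asarray(centroids, dtype=float).copy()   # shape (N,)
    for _ in range(n_iter):
        # Distance between every centroid c_i and sample x_j, shape (N, M).
        d = np.abs(x[None, :] - c[:, None]) + 1e-12  # epsilon avoids division by zero
        # Membership: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (q - 1)).
        inv = d ** (-2.0 / (q - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
        # Centroid update: mean of the samples weighted by u_ij^q.
        w = u ** q
        new_c = (w @ x) / w.sum(axis=1)
        converged = np.max(np.abs(new_c - c)) < tol
        c = new_c
        if converged:
            break
    return c, u


intensities = [10, 12, 11, 200, 205, 198]
final_centroids, memberships = fuzzy_c_means(intensities, centroids=[0, 255])
hard_labels = memberships.argmax(axis=0)  # cluster with the largest membership
```

Taking the cluster with the largest membership for each pixel turns the fuzzy partition into a hard segmentation mask; in this toy example the dark and bright intensities end up in separate clusters.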

Bag-of-Visual Words (BoVW). Learning feature representations from images using the BoVW method is a two-tiered process. Information is extracted from the segmented images using a pre-generated codebook, or dictionary, of low-level local features, also known as visual words. The image descriptors used in this work are SIFT features. Each image is represented by a histogram over the visual dictionary, named the “Bag of Visual Words (BoVW),” which is used as a mid-level feature representation [10]. A word here means the information in a patch of an image. The patch must be larger than a few pixels so that it covers key parts such as corners or edges and therefore retrieves more and better information.

SIFT is based on the local edge histogram technique and is one of the popular descriptors that work very effectively with the BoVW method. Densely sampled SIFT features are extracted from the images. The K-means algorithm is run on these features, and the resulting cluster centers form the visual codebook or dictionary. For each image, a histogram is built by assigning every feature to its nearest code in the codebook and counting the number of occurrences of each code. The image is then divided into a \(2 \times 2\) grid of sub-regions, and a histogram is built for each sub-region. Once all the histograms are generated, the sub-region histograms are concatenated to form a single feature vector.
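The codebook and histogram steps described above can be sketched with NumPy alone. The sketch assumes that local descriptors (for example, dense SIFT vectors) have already been extracted and are passed in as arrays; descriptor extraction itself is not shown. The function names, the plain Lloyd-style k-means, and the L1 normalization of the histograms are illustrative assumptions.

```python
import numpy as np


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors with plain k-means and return the k cluster centers."""
    data = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each descriptor to its nearest center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for i in range(k):
            members = data[nearest == i]
            if len(members):  # keep the old center if a cluster empties
                centers[i] = members.mean(axis=0)
    return centers


def bovw_histogram(descriptors, codebook):
    """Count how often each visual word is the nearest code, then L1-normalize."""
    data = np.asarray(descriptors, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / max(counts.sum(), 1)


def spatial_bovw(region_descriptors, codebook):
    """Concatenate per-region histograms (each region needs >= 1 descriptor)."""
    return np.concatenate([bovw_histogram(d, codebook) for d in region_descriptors])


toy_descriptors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_codebook = build_codebook(toy_descriptors, k=2)
toy_histogram = bovw_histogram(toy_descriptors, toy_codebook)
```

With a \(2 \times 2\) grid, `spatial_bovw` would receive four descriptor sets and return a vector four times the codebook size.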

Deep Features. In medical imaging, deep architectures are mostly used for final decision-making. In this work, however, a deep architecture is used as a feature representation. Deep learning models are well known for learning hierarchical information from the input images: the higher the layer, the more information the network has learned. This set of features is used for classifying cancerous and non-cancerous nodules. Images of the two categories are trained separately using a deep CNN architecture, and intermediate features of both classes are extracted.

Nodule Classification. The detected nodules are classified as cancerous or non-cancerous using an SVM classifier. The model is trained using BoVW features, deep features, and the combination of the two. A linear kernel is used for the SVM. The hyperparameters of the SVM model are the cross-validation parameter, set to 5, and the cost parameter, set to 0. The model is tested using the probability estimates generated by the trained model.
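A hedged sketch of this classification stage using scikit-learn is given below. Several things are assumptions made only for illustration: the random feature matrices stand in for real BoVW and deep features, the two feature sets are combined by simple concatenation, and `C=1.0` is a placeholder because scikit-learn requires a strictly positive cost value. The toolkit the authors used is not stated in the text.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class = 40
# Synthetic stand-ins for the real features (assumed sizes: 8-D BoVW, 16-D deep).
bovw = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 8)),
                  rng.normal(3.0, 1.0, (n_per_class, 8))])
deep = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 16)),
                  rng.normal(3.0, 1.0, (n_per_class, 16))])
labels = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = benign, 1 = malignant

# Assumed fusion rule: concatenate BoVW and deep features per nodule.
fused = np.hstack([bovw, deep])

# Linear kernel with probability estimates; C=1.0 is an assumed placeholder.
clf = SVC(kernel="linear", C=1.0, probability=True, random_state=0)
cv_accuracy = cross_val_score(clf, fused, labels, cv=5).mean()  # 5-fold CV

clf.fit(fused, labels)
probabilities = clf.predict_proba(fused)  # one column per class
```

The same pipeline can be run on `bovw` or `deep` alone to compare the three feature representations under identical classifier settings.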

3 Results and Discussion

3.1 Nodule Segmentation

A novel EFCM segmentation approach is proposed in this study to extract the region of interest, namely the nodule region, from the input CT image. The results obtained from the proposed method are illustrated in Fig. 2, which shows five input CT scans along with their ground truths and predicted masks. Visually, the predicted masks closely match the ground-truth nodule masks. The nodule region is thus extracted accurately from the input CT scan, and the additional background noise in the image is eliminated. This segmentation of the nodule helps in obtaining better nodule information.

Fig. 2. Segmentation results achieved using EFCM model

Figure 3 compares the proposed EFCM segmentation method with existing methods. The proposed model obtains a Dice Score Coefficient (DSC) of 97.10%, whereas existing methods obtain lower values: U-Net (80.36%), V-Net (92.86%), Fully Convolutional Network (FCN) U-Net (91.20%), and Mask Region-based CNN (Mask RCNN) (71.16%). The Intersection-over-Union (IoU) of the proposed technique is 91.96%, while the existing methods show lower values. Likewise, the Sensitivity (SEN) value of 95.35% makes the proposed model preferable to the current techniques. In addition, the proposed system attained a Positive Predictive Value (PPV) of 96.30%, which is higher than that of the existing methods. These higher DSC, IoU, SEN, and PPV values show the improved performance of the proposed EFCM segmentation model.

Fig. 3. Comparison of proposed EFCM model with existing segmentation techniques
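For reference, the four segmentation metrics reported above can be computed from a predicted and a ground-truth binary mask as in the sketch below. It uses the standard pixel-wise definitions; the function name `segmentation_metrics` is hypothetical, and both masks are assumed to contain at least one foreground pixel so that no denominator is zero.

```python
import numpy as np


def segmentation_metrics(predicted, ground_truth):
    """Return DSC, IoU, SEN and PPV for two binary masks of equal shape."""
    pred = np.asarray(predicted, dtype=bool)
    truth = np.asarray(ground_truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # nodule pixels found
    fp = int(np.sum(pred & ~truth))   # background marked as nodule
    fn = int(np.sum(~pred & truth))   # nodule pixels missed
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "IoU": tp / (tp + fp + fn),
        "SEN": tp / (tp + fn),
        "PPV": tp / (tp + fp),
    }


toy_scores = segmentation_metrics([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```

In the toy example there is one true positive, one false positive, and one false negative, so DSC, SEN, and PPV are 0.5 and IoU is 1/3.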

The nodule detection system is evaluated on the LUNA16 dataset using a primary performance metric named False Positives per scan (FPs/scan). Figure 4 shows the Free-Response Receiver Operating Characteristic (FROC) curve obtained for the proposed EFCM method. The curve indicates that the proposed method yields a low number of false positives per scan: the proposed EFCM model reaches a sensitivity of 95.35% at 2.7 FPs/scan.

Fig. 4. Free-response receiver operating characteristic (FROC) curve for the performance of proposed EFCM at 2.7 FPs/Scan
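A single operating point of a FROC curve, such as the one quoted above, pairs a sensitivity with an average number of false positives per scan. The sketch below shows how one such point could be computed from scored candidates; the function name `froc_point`, the input layout, and the assumption that each true nodule is matched by at most one candidate are illustrative choices, not taken from the paper.

```python
def froc_point(candidates, total_nodules, n_scans, threshold):
    """Sensitivity and false positives per scan at one score threshold.

    candidates: list of (score, is_true_nodule) pairs pooled over all scans.
    """
    kept = [(score, hit) for score, hit in candidates if score >= threshold]
    true_pos = sum(1 for _, hit in kept if hit)
    false_pos = sum(1 for _, hit in kept if not hit)
    sensitivity = true_pos / total_nodules
    fps_per_scan = false_pos / n_scans
    return sensitivity, fps_per_scan


toy_candidates = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.2, True)]
toy_point = froc_point(toy_candidates, total_nodules=3, n_scans=2, threshold=0.5)
```

Sweeping the threshold from high to low traces the full curve, with sensitivity on the vertical axis and FPs/scan on the horizontal axis.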

The proposed EFCM segmentation method is compared with existing lung nodule detection systems in Table 1. The methods considered for comparison are mostly deep learning architectures such as U-Net [15], dual branch residual network [17], convolution neural network [21], deep Fully Convolution Networks (FCN) [19], receptive field-regularized (RFR) V-Net [22], and so on. In recent years, deep learning architectures have come to dominate image segmentation. Nevertheless, the clustering-based segmentation approach of the proposed method attained better results in both DSC and IoU.

Table 1. Comparison of the proposed system with the state-of-the-art lung nodule detection systems

3.2 Nodule Classification

The performance of the lung nodule classification system is evaluated on the publicly available LUNA16 dataset. The accuracy obtained for the lung nodule classification task is 96.87%. The performance metrics considered for the evaluation of the proposed method are accuracy, error rate, specificity, sensitivity, FPR, and F-score. The results are presented in Table 2.

Table 2. Performance of the lung nodule classification for the proposed system
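The metrics listed for the classification task all follow from the four entries of a binary confusion matrix. The sketch below uses their standard definitions; the function name `classification_metrics` is hypothetical and the counts in the example are invented for illustration only, not taken from the paper's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fp / (fp + tn),     # equals 1 - specificity
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Invented counts, used only to exercise the formulas.
toy_metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Note that the error rate and the FPR are complements of accuracy and specificity, so the six metrics carry four independent pieces of information.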

A layer-wise feature comparison is performed for deep and BoVW + deep features. Figure 5 shows the accuracy values obtained with features from different intermediate layers, from layer 1 to layer 10. The performance of the system increases with the layer depth up to the 7\(^{th}\) layer, after which the accuracy degrades. This is attributed to overfitting: the model complexity increases with more layers, while less data is available to train the deep architecture. The number of layers must therefore be monitored to prevent the model from overfitting.

Fig. 5. Layer-wise accuracy values for (a) Deep features, (b) BoVW + Deep features

The proposed method is compared with state-of-the-art lung nodule classification systems. Previously proposed systems for lung nodule classification use deep learning architectures such as Artificial Neural Network (ANN), CNN, multi-scale CNN, Stacked Auto Encoder (SAE), etc. The results are presented in Table 3. The table shows that the proposed method achieves better performance than the state-of-the-art lung nodule classification systems. Another major issue with these networks is that training a deep model is computationally expensive and takes a lot of time. In our method, however, the CNN is used as a feature extractor rather than a classifier, and extracting intermediate features does not require much time. The method is also computationally less expensive because the classifier does not require much time to learn from BoVW and deep features. Because BoVW features are encoded representations, they do not add much computation time.

Table 3. Comparison of the proposed system with the state-of-the-art lung nodule classification systems.

Figure 6 presents a quantitative analysis of the three feature representations using the Receiver-Operating Characteristics (ROC) curve, with the SVM as the classifier. An ideal system provides an Area Under Curve (AUC) of 1. The AUC values attained for BoVW, deep, and BoVW + Deep features are 0.83, 0.88, and 0.92, respectively. BoVW + Deep features thus resulted in the highest AUC, which shows that the feature combination worked effectively to improve the performance of the system.

Fig. 6. Receiver-Operating Characteristics (ROC) curves for (a) BoVW, (b) Deep features, (c) BoVW + Deep features
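As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This pairwise form is equivalent to the area under the ROC curve; the function name `roc_auc` and the toy scores are illustrative and unrelated to the values reported in the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy data, three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75; a classifier that ranks every positive above every negative reaches the ideal value of 1.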

4 Conclusion

Lung cancer is considered to be one of the deadliest diseases. In this work, a novel clustering-based segmentation method named EFCM is proposed to extract lung nodules from a given CT scan, and a hybrid of two different types of feature representations is proposed for lung nodule classification. The method demonstrates that deep learning can be used not only as a classifier but also as a suitable feature extractor. The segmentation method introduced in this work performs better than existing segmentation methods in terms of DSC, IoU, and PPV. The proposed method shows that a combination of certain feature representations can enhance the system’s performance in terms of various evaluation metrics such as accuracy and sensitivity. It also reduces the computational cost of the system by reducing the system’s learning parameters. The proposed system effectively combines the encoded feature representation method BoVW with deep features extracted from intermediate layers of a CNN. Performance in medical imaging tasks is critical, and such tasks also require quick output. The proposed method provided better and faster results than other CAD systems proposed for lung nodule classification. In future work, other sets of feature representations and deeper networks with larger datasets can be explored for lung nodule classification.