Introduction

Lung cancer is the second most common cancer and the leading cause of cancer death among both men and women in the USA. In 2013 (the most recent year for which numbers are available), 111,907 men and 100,677 women were diagnosed with lung cancer, and 85,658 men and 70,518 women (156,176 people in total) died of it [1]. Globally, lung cancer remains the most common malignancy, with an estimated 1.8 million newly diagnosed cases and 1.6 million deaths in 2012 [1].

Fig. 1

Block diagram of the overall four steps of a lung CAD system

There are two main types of lung cancer: non-small cell lung cancer (NSCLC), which accounts for 80–85% of all lung cancers, and small cell lung cancer (SCLC) [2]. Most NSCLC cases are detected only after the cancer has spread, at advanced stages (i.e., stages III–IV). For stage IIIA NSCLC, the highest recorded 5-year survival rate, 14%, is observed in the USA, compared with 8% in Europe. The 5-year survival rate for people with stage IA NSCLC in the USA is about 49%; at this stage, the nodule is no larger than 3 cm across and has not spread to the lymph nodes or distant sites [2]. Ideally, lung cancer would be detected and diagnosed at stage 0, where the cancer is no larger than 2 cm across and has not invaded deeper into other lung tissues. Lung cancer survival depends strongly on the stage at diagnosis [3], and the incidental detection of pulmonary nodules has increased with the use of computed tomography (CT) [4]. Early screening, detection and diagnosis of lung nodules using CT could therefore help lower the mortality rate of this very serious cancer.

Computer-assisted diagnosis (CAD) systems have the potential to improve the accuracy and consistency of malignancy classification when used as a secondary reference (i.e., a second opinion) for the human expert, using the same source of data, i.e., the visible information in an image [5]. The uncertainties that may affect the performance of CAD systems are similar to those that affect the human expert. Human and machine vision can be compared fairly if the objects have unique features.

A typical lung CAD system consists of four major steps: (a) CT acquisition and enhancement (e.g., scan filtering); (b) lung tissue segmentation; (c) candidate nodule detection; and (d) nodule classification. Our front-end image analysis CAD system for lung nodule screening implements these steps, together with an active appearance nodule modeling stage before detection and a nodule segmentation step [2]. Figure 1 illustrates the overall CAD block diagram.

There is a rich literature on each of these steps (e.g., [6,7,8,9,10,11,12]). Object description remains a major subject of investigation in the computer vision literature, as objects of interest are described in terms of their shape and appearance [6, 11, 13,14,15,16].

This paper focuses on step (d), lung nodule classification; specifically, detected candidates are classified into one of three categories: benign, malignant or non-nodule. The following studies briefly summarize part of the lung nodule classification literature. Orozco et al. [14] used 11 features computed from the wavelet transform, with support vector machines (SVMs) as the classifier; testing on 23 malignant nodules and 22 non-nodules yielded an AUC of 0.805. In [17], van Ginneken et al. used local texture analysis to identify and classify lung abnormalities such as tuberculosis, with a k-nearest neighbor (kNN) classifier evaluated in a leave-one-out cross-validation approach. Two datasets were used in the experiments. The first consisted of 147 images with textural abnormalities and 241 normal images; a sensitivity of 0.86 and an area under the receiver operating characteristic curve (AUC-ROC) of 0.820 were reported. The second consisted of 100 abnormal and 100 normal images; a sensitivity of 0.90 and an AUC-ROC of 0.986 were reported. In [16], the gray-level co-occurrence matrix was used to extract texture attributes, and margin sharpness features were used to characterize pulmonary nodules. Classification of 274 benign/malignant nodules of size \(3\le n\le 10\) mm from the Lung Image Database Consortium (LIDC) database was conducted, and the kNN, multilayer perceptron and random forest classifiers were examined. The highest AUC of 0.820 was obtained using a fusion of the texture and margin sharpness features with the multilayer perceptron. Firmino et al. [18] segmented 1109 nodules of size \(3\le n\le 30\) mm from the LIDC database using region growing and the watershed transform. An additional 379 nodules with a likelihood of malignancy were randomly chosen. True nodules were detected by a rule-based classifier and likely malignant nodules by an SVM. The classification stage achieved 93.9% sensitivity with 7.21 false positives (FP).

In this article, two texture features, Gabor and multi-resolution local binary pattern (LBP), and a shape–texture fusion feature descriptor based on the signed distance transform and LBP are extracted. The two texture descriptors are computed independently, whereas the fusion descriptor is computed serially: the signed distance transform images obtained for the nodule and non-nodule data undergo LBP texture extraction to produce the shape–texture fused feature descriptions. Since the focus of this manuscript is early classification, i.e., within stage 0, nodules of sizes between 3 and 10 mm are considered. SVM and kNN classifiers are used to assign each of these samples, extracted from the LIDC, a class label: benign, malignant or non-nodule.

Materials and methods

The overall schema of this paper is illustrated in Fig. 2. The LIDC medical imaging repository provided the lung nodules and non-nodules on which feature extraction and classification are conducted. In this section, each step in Fig. 2 is described in terms of the analysis conducted and reported in this article.

Fig. 2

Overall schematic used in this paper

Nodules selection

The LIDC [20] consists of CT scans in which nodules were identified and classified by four radiologists. The identified nodules are rated according to numerous characteristics, such as calcification, size, sphericity and likelihood of malignancy.

In this work, the focus is on nodule sizes between 3 and 10 mm. The database rates the likelihood of malignancy on a scale from 1 to 5 [20, 21]:

  • Malignancy 1: Highly unlikely for cancer;

  • Malignancy 2: Moderately unlikely for cancer;

  • Malignancy 3: Intermediate likelihood;

  • Malignancy 4: Moderately suspicious for cancer;

  • Malignancy 5: Highly suspicious for cancer.

A database of lung nodules was created by extracting a region of interest (ROI) around each nodule. In [8, 19], it was shown that the radial distance distribution, calculated by summing the intensity values on concentric circles of various radii centered at the nodule's centroid, has an exponential decay pattern. This distribution provides an empirical measure of the spatial support of a nodule around its centroid, which was used to define a bounding box of size \(41\times 41\) centered at the centroid. Nodule regions of this size were cropped from the original CT scans.
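
The radial distance distribution and the cropping step can be sketched with NumPy as follows. This is a minimal sketch, assuming a 2-D CT slice stored as a NumPy array and a nodule centroid given as (row, col) pixel coordinates; the function names are hypothetical, and both the ring discretization and the zero padding at slice borders are assumptions not specified in the text.

```python
import numpy as np

def radial_distribution(image, center, max_radius):
    """Sum the intensities on concentric rings of integer radius around `center`.

    Assumption: a pixel belongs to the ring whose radius equals its Euclidean
    distance to `center`, rounded to the nearest integer.
    """
    rows, cols = np.indices(image.shape)
    ring = np.rint(np.hypot(rows - center[0], cols - center[1])).astype(int)
    return np.array([image[ring == r].sum() for r in range(max_radius + 1)])

def crop_roi(ct_slice, center, size=41):
    """Crop a size x size ROI centered on `center` = (row, col).

    Zero padding at the slice border is an assumption; the text does not say
    how nodules close to the border were handled.
    """
    half = size // 2
    padded = np.pad(np.asarray(ct_slice, dtype=float), half, mode="constant")
    r, c = center[0] + half, center[1] + half  # centroid in padded coordinates
    return padded[r - half:r + half + 1, c - half:c + half + 1]
```

The ROI size is fixed here at the reported 41 × 41; in a fresh setting one would look at where the returned profile has decayed before choosing it, and a `max_radius` of 20 covers exactly the half-width of such an ROI.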

From the LIDC study, a total of 1191 samples, each of size \(41\times 41\) pixels, were separated into three categories (benign, malignant or non-nodule) with the following distribution: 723 benign and 223 malignant nodules between 3 and 10 mm, and 245 non-nodules. Nodules were labeled benign if they were rated as malignancy 1 or 2 and this rating was confirmed by at least two radiologists. Likewise, nodules were labeled malignant if they were rated as malignancy 4 or 5, under the same condition of confirmation by at least two radiologists. Nodules rated as malignancy 3 were not considered in either set of experiments. Non-nodules, consisting of lung parenchyma, tissue and other anatomical structures, were also extracted using the same bounding box size.
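
The labeling rule can be expressed as a small function over one nodule's per-radiologist ratings. This is a sketch under an assumed reading of the rule ("at least two radiologists gave a rating of 1 or 2", respectively "4 or 5"); the function name is hypothetical, and the handling of nodules that satisfy both conditions is an extra assumption.

```python
def label_from_ratings(ratings):
    """Map one nodule's malignancy ratings (1-5, one per radiologist) to a label.

    Assumed reading of the rule: 'benign' if at least two radiologists rated the
    nodule 1 or 2, 'malignant' if at least two rated it 4 or 5. Ratings of 3
    count toward neither class. Nodules meeting neither condition, or (an extra
    assumption) both, return None and are excluded.
    """
    benign_votes = sum(1 for r in ratings if r in (1, 2))
    malignant_votes = sum(1 for r in ratings if r in (4, 5))
    is_benign, is_malignant = benign_votes >= 2, malignant_votes >= 2
    if is_benign and not is_malignant:
        return "benign"
    if is_malignant and not is_benign:
        return "malignant"
    return None
```

For example, ratings `[2, 1, 3]` give `"benign"`, `[4, 5, 5, 3]` give `"malignant"`, and `[3, 3, 4]` give `None`.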

Features extraction

Invariance and distinctiveness are the two main conditions on which the success of object description depends: a distinctive characterization of the desired object must be produced while robustly accommodating variations in imaging conditions. In this section, the multi-resolution local binary pattern, signed distance transform and Gabor wavelets are briefly described.

Multi-resolution local binary pattern (LBP) The LBP is a powerful texture feature descriptor that is invariant to monotonic changes in gray scale and is resistant to illumination changes, as long as the absolute gray-level value differences are not badly affected [22]. The original operator labels each pixel of an image by thresholding its \(3\times 3\) neighborhood at the center pixel value and reading the result as a binary number. The LBP operator was later extended to circular neighborhoods of different radii, to overcome the inability of the small original \(3\times 3\) neighborhood to capture large-scale structures [22]. Each instance is denoted as (P, R), where P is the number of equally spaced sampling points on a circle of radius R. The parameter P controls the quantization of the angular space and R determines the spatial resolution of the operator.
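
The original \(3\times 3\) operator can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work; the function name is hypothetical, and the neighbor ordering (and therefore the bit assigned to each neighbor) is a convention that differs between implementations.

```python
import numpy as np

def lbp_3x3(image):
    """Original LBP: compare the 8 neighbors of every interior pixel with the
    center (neighbor >= center gives bit 1) and read the bits as an 8-bit code."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Clockwise from the top-left neighbor; the start point and direction are
    # a convention, so codes from other implementations may be permuted.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes += (neighbor >= center).astype(np.int64) << bit
    return codes
```

For the patch `[[2, 0, 0], [0, 1, 0], [0, 0, 0]]`, only the top-left neighbor is at least as bright as the center, so the single interior code is 1 (bit 0 set).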

In this paper, we use the extended LBP operator within a (P, R) neighborhood with only uniform patterns (i.e., circular binary codes with at most two 0/1 transitions), denoted by \(LBP_{P,R}^{u2}\). The LBP operator was applied to both the original and gradient images, where the Sobel operator was used to generate the gradient magnitude image (Fig. 3). The (P, R) parameters used for the original and gradient image LBP extractions were (8, 1) and (16, 2), respectively, as illustrated in Fig. 3.
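
The two-branch LBP extraction of Fig. 3 can be sketched with NumPy as below. This is a hedged sketch, not the authors' code: circular samples are obtained by bilinear interpolation, each uniform pattern gets its own histogram bin and all non-uniform patterns share one bin (giving P(P − 1) + 3 bins), and the bit ordering, comparison tolerance, histogram normalization and concatenation order are assumptions. All function names are hypothetical.

```python
import numpy as np

def sobel_magnitude(image):
    """Sobel gradient magnitude (edge-replicated borders, same size as input)."""
    img = np.pad(np.asarray(image, dtype=float), 1, mode="edge")
    h, w = img.shape[0] - 2, img.shape[1] - 2
    def win(dr, dc):  # original image shifted by (dr, dc)
        return img[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
    gx = (win(-1, 1) + 2 * win(0, 1) + win(1, 1)) - (win(-1, -1) + 2 * win(0, -1) + win(1, -1))
    gy = (win(1, -1) + 2 * win(1, 0) + win(1, 1)) - (win(-1, -1) + 2 * win(-1, 0) + win(-1, 1))
    return np.hypot(gx, gy)

def u2_lookup(P):
    """Map each P-bit code to a bin: one bin per uniform pattern (at most two
    circular 0/1 transitions) plus one shared bin for all non-uniform codes."""
    codes = np.arange(2 ** P)
    bits = (codes[:, None] >> np.arange(P)) & 1
    transitions = np.sum(bits != np.roll(bits, 1, axis=1), axis=1)
    uniform = transitions <= 2
    n_uniform = int(uniform.sum())              # equals P * (P - 1) + 2
    table = np.full(2 ** P, n_uniform)          # non-uniform codes -> last bin
    table[uniform] = np.arange(n_uniform)
    return table, n_uniform + 1

def lbp_u2_histogram(image, P, R, tol=1e-9):
    """Normalized LBP_{P,R}^{u2} histogram with bilinear sampling on a circle."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    m = int(np.ceil(R))                         # skip border pixels
    rows, cols = np.mgrid[m:h - m, m:w - m]
    center = img[rows, cols]
    codes = np.zeros(center.shape, dtype=np.int64)
    for p in range(P):
        # Rounding removes tiny trig errors so on-grid samples stay on-grid.
        y = rows + np.round(-R * np.sin(2 * np.pi * p / P), 9)
        x = cols + np.round(R * np.cos(2 * np.pi * p / P), 9)
        y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
        y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
        fy, fx = y - y0, x - x0
        val = (img[y0, x0] * (1 - fy) * (1 - fx) + img[y0, x1] * (1 - fy) * fx
               + img[y1, x0] * fy * (1 - fx) + img[y1, x1] * fy * fx)
        codes += (val >= center - tol).astype(np.int64) << p  # tol absorbs rounding
    table, n_bins = u2_lookup(P)
    hist = np.bincount(table[codes].ravel(), minlength=n_bins).astype(float)
    return hist / hist.sum()

def nodule_lbp_features(roi):
    """Hypothetical combination: (8, 1) on the ROI, (16, 2) on its Sobel magnitude."""
    return np.concatenate([lbp_u2_histogram(roi, 8, 1),
                           lbp_u2_histogram(sobel_magnitude(roi), 16, 2)])
```

With these settings a 41 × 41 ROI yields 59 + 243 = 302 histogram values. Using per-image histograms rather than raw per-pixel codes is a design choice made here for compactness; the text does not state how the LBP maps were turned into feature vectors.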

Fig. 3

Block diagram of generating the LBP for an example nodule

Signed distance transform The distance transform is a shape-based feature descriptor that represents each pixel of a binary edge map image by its distance to the nearest obstacle pixel, i.e., a nonzero (edge) pixel. The LBP of the resulting signed distance image is obtained in the same way as shown in Fig. 3, yielding a combined shape and texture feature representation of the nodules and non-nodules.
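
A brute-force signed distance transform is enough for ROIs of this size. The sketch below assumes that a binary nodule mask is already available (for example from thresholding the ROI; the text does not say how the binary image was obtained), uses the mask boundary as the edge map, and applies a common sign convention (negative inside, positive outside) that the text does not confirm. The function names are hypothetical.

```python
import numpy as np

def boundary(mask):
    """Edge map: object pixels with at least one 4-connected background neighbor."""
    m = np.pad(np.asarray(mask, dtype=bool), 1, mode="constant")
    inner = m[1:-1, 1:-1]
    surrounded = m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return inner & ~surrounded

def signed_distance(mask):
    """Euclidean distance from every pixel to the nearest edge pixel, negative
    inside the object and positive outside (assumed sign convention)."""
    mask = np.asarray(mask, dtype=bool)
    edges = np.argwhere(boundary(mask))
    if edges.size == 0:
        raise ValueError("mask has no object pixels")
    pixels = np.argwhere(np.ones(mask.shape, dtype=bool))  # all (row, col) pairs
    d2 = ((pixels[:, None, :] - edges[None, :, :]) ** 2).sum(axis=-1)
    dist = np.sqrt(d2.min(axis=1)).reshape(mask.shape)
    return np.where(mask, -dist, dist)
```

The resulting image can then be passed to the same LBP extraction used for the intensity images to form the shape–texture fusion descriptor. The pairwise search is O(N·M) in the numbers of pixels and edge pixels, which is acceptable for 41 × 41 ROIs but not for whole slices.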

Gabor filter Gabor filters are widely used in the computer vision literature, especially in face recognition [23]. A two-dimensional Gabor filter is a Gaussian kernel function modulated by a complex sinusoidal plane wave as:

$$\begin{aligned} G\left( {x,y} \right) =\frac{f^{2}}{\pi \gamma \eta }\exp \left( {-\frac{{x^{\prime 2}}+\gamma ^{2}{y^{\prime 2}}}{2\sigma ^{2}}} \right) \exp \left( {j\left( 2\pi f{x}^{\prime }+\varphi \right) } \right) , \end{aligned}$$

with \({x}^{\prime }= x \cos \theta +y \sin \theta \) and \(y^{\prime } = -x \sin \theta + y \cos \theta \), where f is the frequency of the sinusoidal factor, \(\theta \) is the orientation, \(\varphi \) is the phase offset, \(\gamma \) is the spatial aspect ratio and \(\sigma \) is the standard deviation of the Gaussian envelope.

The filter bank uses F frequencies (scales) and Q orientations. With F = 5 and Q = 8, a total of 40 filters is used to represent each nodule and non-nodule. Figure 4 depicts sample Gabor and LBP features obtained for malignant, benign and non-nodule samples from the LIDC data.
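
A 5 × 8 Gabor bank following the definition of G(x, y) in this section can be sketched with NumPy. Every numeric choice here is an assumption not given in the text: f_max = 0.25 with √2 spacing between frequencies, orientations θ = vπ/Q, σ = 0.56/f (roughly a one-octave bandwidth), γ = η = 1, φ = 0, 31 × 31 kernels (which truncate the widest envelopes), magnitude responses and subsampling by 2. With 41 × 41 ROIs that subsampling gives 21 × 21 × 40 = 17,640 values, the same order as the ≈17,000-dimensional Gabor vector mentioned in the classifier settings, but whether the authors subsampled this way is not stated. Function names are hypothetical.

```python
import numpy as np

def gabor_kernel(f, theta, sigma, gamma=1.0, eta=1.0, phi=0.0, size=31):
    """Complex Gabor kernel: Gaussian envelope in (x', gamma*y') times a
    complex sinusoid exp(j(2*pi*f*x' + phi)), scaled by f^2 / (pi*gamma*eta)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (2 * np.pi * f * xr + phi))
    return (f ** 2 / (np.pi * gamma * eta)) * envelope * carrier

def filter_same(image, kernel):
    """'Same'-size 2-D linear convolution computed with zero-padded FFTs."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    return full[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W]

def gabor_features(roi, F=5, Q=8, f_max=0.25, step=2):
    """Magnitude responses of an F x Q bank, subsampled by `step`, concatenated."""
    roi = np.asarray(roi, dtype=float)
    feats = []
    for u in range(F):
        f = f_max / np.sqrt(2) ** u              # assumed half-octave spacing
        for v in range(Q):
            k = gabor_kernel(f, v * np.pi / Q, sigma=0.56 / f)  # assumed bandwidth
            feats.append(np.abs(filter_same(roi, k))[::step, ::step].ravel())
    return np.concatenate(feats)
```

FFT-based filtering is used only to keep the sketch dependency-free; any 2-D convolution routine would give the same responses.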

The feature vectors depicted in Fig. 4 have been truncated and do not show the full length of the descriptors used in this work. The features are not scale invariant; therefore, the feature vectors are normalized to zero mean and unit variance. The non-nodule features differ noticeably from those of the malignant and benign nodules, especially for the Gabor features. This is consistent with the low pairwise Pearson correlation coefficients of the depicted Gabor features: −0.02 for malignant versus benign, −0.08 for malignant versus non-nodule and −0.04 for benign versus non-nodule.
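
The normalization and the correlation check can be sketched as follows. The text does not say whether each feature was standardized across samples or each vector on its own, so the sketch exposes both through `axis`; the function names are hypothetical, and mapping constant features to 0 is an added assumption.

```python
import numpy as np

def standardize(X, axis=0):
    """Zero mean, unit variance along `axis` (population std). axis=0 scales each
    feature across samples; axis=1 scales each feature vector on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)
    return (X - mean) / np.where(std > 0, std, 1.0)  # constant features map to 0

def pearson(a, b):
    """Pearson correlation: mean product of the two z-scored vectors."""
    za = standardize(np.ravel(a), axis=0)
    zb = standardize(np.ravel(b), axis=0)
    return float(np.mean(za * zb))
```

`pearson(v_malignant, v_benign)` on two equal-length feature vectors gives the kind of pairwise coefficient quoted above; it agrees with `np.corrcoef(a, b)[0, 1]` for non-constant inputs.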

Fig. 4

Sample LBP (red), signed distance LBP fusion (blue) and Gabor (green) features representing a malignant, b benign and c non-nodule samples

Classification

Classification approaches are of two main types: parametric and nonparametric. In this work, classification was performed using the nonparametric kNN [24] and the parametric SVM [25]. The classification problem under consideration discriminates among three mutually exclusive classes: benign, malignant and non-nodule. Two frameworks are investigated to solve this problem:

  (a) Proposed multi-class classifier This framework treats lung nodule classification as a simultaneous three-class identification problem: a conventional 3-class classifier is trained to assign the probe sample directly to one of the target classes, see Fig. 2.

  (b) Proposed cascade classifier Unlike the conventional multi-class implementation, the two-tier cascaded binary classifier framework splits the sample assignment across two stages. The first classifier (denoted as the C1 classifier) discriminates between nodule and non-nodule samples. Probe samples that C1 labels as nodules are passed to the second-tier classifier (denoted as the C2 classifier), which further distinguishes them as benign or malignant, see Fig. 2.
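
The cascade logic itself is independent of the underlying classifier and can be sketched in plain Python. Here `c1` and `c2` are any callables returning a label string; the function names and the toy threshold rules in the example are hypothetical stand-ins for the trained SVM or kNN models.

```python
def cascade_training_sets(samples, labels):
    """Tier 1 (C1) sees every sample with labels collapsed to nodule/non-nodule;
    tier 2 (C2) sees only the true nodules, labeled benign or malignant."""
    tier1 = [(x, "non-nodule" if y == "non-nodule" else "nodule")
             for x, y in zip(samples, labels)]
    tier2 = [(x, y) for x, y in zip(samples, labels) if y != "non-nodule"]
    return tier1, tier2

def cascade_predict(x, c1, c2):
    """C1 decides nodule vs. non-nodule; only C1's 'nodule' outputs reach C2."""
    if c1(x) == "non-nodule":
        return "non-nodule"
    return c2(x)  # 'benign' or 'malignant'

# Toy stand-ins for trained classifiers on a 2-feature sample (hypothetical).
c1 = lambda x: "nodule" if x[0] > 0.5 else "non-nodule"
c2 = lambda x: "malignant" if x[1] > 0.5 else "benign"
print([cascade_predict(x, c1, c2) for x in [(0.2, 0.9), (0.8, 0.9), (0.8, 0.1)]])
# ['non-nodule', 'malignant', 'benign']
```

One consequence of this design: a nodule that C1 rejects never reaches C2, so C1's false negatives bound how many nodules C2 can classify correctly.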

Fig. 5

Examples of ROC curves of the multi-class SVM and kNN classifiers for different features: a, b LBP, c, d signed distance LBP fusion and e, f Gabor

Classifier settings Two main components are required in designing the kNN classifier: a distance measure and the hyperparameter k. To select a distance measure, we compare the Euclidean and Mahalanobis distance measures. The hyperparameter k is selected using a cross-validation process that searches \(1\le k\le 35\) neighbors for the best value.
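
The distance comparison and the search over k can be sketched with NumPy. Assumptions: labels are integers 0..C−1, LOOCV accuracy is the selection criterion (the text does not name one), ties in the vote go to the lowest class index, and the Mahalanobis distance uses the pseudo-inverse of the sample covariance. In the setting described here the search would be run on the validation data; the function names are hypothetical.

```python
import numpy as np

def euclidean_matrix(X):
    """Pairwise Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b."""
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances using the (pseudo-)inverse sample covariance."""
    X = np.asarray(X, dtype=float)
    VI = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    s, U = np.linalg.eigh(VI)
    W = U @ np.diag(np.sqrt(np.clip(s, 0.0, None))) @ U.T  # W @ W = VI
    return euclidean_matrix(X @ W)

def loocv_knn_accuracy(D, y, k):
    """Leave-one-out kNN accuracy from a precomputed distance matrix D."""
    D = np.array(D, dtype=float)          # copy so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)           # a sample may not vote for itself
    nearest = np.argsort(D, axis=1, kind="stable")[:, :k]
    votes = y[nearest]
    n_classes = int(y.max()) + 1
    pred = np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])
    return float(np.mean(pred == y))

def select_k(D, y, k_values=range(1, 36)):
    """Return the k in 1..35 with the best LOOCV accuracy (smallest k on ties)."""
    scores = {k: loocv_knn_accuracy(D, y, k) for k in k_values}
    return max(scores, key=scores.get), scores
```

Comparing `select_k(euclidean_matrix(X), y)` with `select_k(mahalanobis_matrix(X), y)` mirrors the distance comparison described above. Note that the Mahalanobis variant needs a d × d covariance, which becomes expensive for very long feature vectors.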

For the parametric SVM approach, the LBP-based features are used to train a radial basis function (RBF) kernel SVM classifier. This nonlinear kernel is chosen to map the two types of LBP-based features into a higher-dimensional space for better discrimination. The two hyperparameters of the RBF-SVM, i.e., the regularization parameter and the kernel parameter that sets the sensitivity to differences between feature vectors, are chosen using a cross-validation process. In contrast, the Gabor-based feature vector is already high-dimensional (\(\approx \)17,000 features), so a kernel-based mapping to a higher-dimensional space is unnecessary and a linear SVM classifier is used instead. In this case, only one hyperparameter, the regularization parameter, needs to be tuned, and it is also estimated using a cross-validation process.
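
For reference, the standard soft-margin SVM makes the two hyperparameters explicit. This is the textbook binary formulation, not taken from this work (the multi-class decomposition used is not stated), and the kernel parameter is written here as \(\beta \) only to avoid a clash with the Gabor aspect ratio \(\gamma \); many software libraries call it gamma:

$$\begin{aligned} \min _{\mathbf {w},b,\boldsymbol{\xi }}\ \frac{1}{2}\Vert \mathbf {w}\Vert ^{2}+C\sum _{i=1}^{n}\xi _{i}\quad \text {s.t.}\quad y_{i}\left( \mathbf {w}^{\top }\phi (\mathbf {x}_{i})+b\right) \ge 1-\xi _{i},\quad \xi _{i}\ge 0,\quad y_{i}\in \{-1,+1\}, \end{aligned}$$

$$\begin{aligned} K(\mathbf {x},\mathbf {z})=\phi (\mathbf {x})^{\top }\phi (\mathbf {z})=\exp \left( -\beta \Vert \mathbf {x}-\mathbf {z}\Vert ^{2}\right) ,\quad \beta >0. \end{aligned}$$

Here C is the regularization parameter (a larger C penalizes margin violations more heavily), and \(\beta \) sets how quickly the kernel similarity decays with the distance between two feature vectors, i.e., the sensitivity to differences in feature vectors. For a linear SVM, \(\phi (\mathbf {x})=\mathbf {x}\), so only C remains to be tuned.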

Table 1 Means and std of different performance metrics for the proposed multi-class SVM classifier
Table 2 Means and std of different performance metrics for the proposed multi-class kNN classifier
Table 3 Means and std of different performance metrics for the proposed cascade SVM and kNN classifiers

Results and discussion

The results in this work are based on the LIDC database, where 1191 samples in total were annotated into one of three categories: benign (B), malignant (M) or non-nodule (N). The data distribution is as follows: 723 benign and 223 malignant nodules between \(3\,\text {mm}\le n\le 10\,\text {mm}\), and 245 non-nodules. Two sets of experiments are conducted for nodule categorization. The first, a multi-class classification framework, uses 220 samples from each of the classes B, M and N. The second, a two-tier cascaded framework, uses 490 nodule and non-nodule samples in the first tier, i.e., classifier 1 (C1), and 440 nodule samples (i.e., benign and malignant) in the second tier, classifier 2 (C2). These sample distributions are chosen to handle the class imbalance in the data. For hyperparameter tuning, a validation set of 20 samples per class is used.

In the following experiments, the leave-one-out cross-validation (LOOCV) method is adopted to evaluate the proposed approaches. Two performance metrics are computed: the area under the receiver operating characteristic curve (AUC-ROC) and the \(\text {f1-score} =2\times \text {precision}\times \text {recall}/(\text {precision} +\text {recall})\).
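
Both metrics can be computed for a binary (one vs. others) decision as sketched below. The AUC uses the rank-based formulation, i.e., the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties counted as one half), which equals the area under the empirical ROC curve. The function names are hypothetical, and returning 0 when precision or recall is undefined is an assumed convention.

```python
import numpy as np

def f1_score(y_true, y_pred, positive=1):
    """f1 = 2 * precision * recall / (precision + recall) for one positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative), ties = 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

For instance, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an AUC of 0.75: three of the four positive/negative pairs are ranked correctly.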

Multi-class classifier evaluation

The first experiment tests the 3-class SVM and kNN classifiers. To evaluate this approach, three methods are used: (1) metrics are calculated separately for each class (i.e., one vs. others); (2) global metrics (‘micro’) are calculated by counting the total true positives, false negatives and false positives over the three classes; and (3) metrics are computed for each class and their unweighted mean (‘macro’) is taken.
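
The three averaging schemes can be computed from a 3 × 3 confusion matrix, as in the sketch below. It assumes rows are true classes and columns are predicted classes (an assumed convention) and uses the equivalent form f1 = 2TP/(2TP + FP + FN); the function names are hypothetical.

```python
import numpy as np

def per_class_counts(cm):
    """TP, FP, FN per class from a confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c but truly another class
    fn = cm.sum(axis=1) - tp  # truly class c but predicted as another class
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

def f1_scores(cm):
    """Per-class (one vs. others), micro and macro f1-scores."""
    tp, fp, fn = per_class_counts(cm)
    per_class = f1_from_counts(tp, fp, fn)
    micro = float(f1_from_counts(tp.sum(), fp.sum(), fn.sum()))  # pooled counts
    macro = float(per_class.mean())                              # unweighted mean
    return per_class, micro, macro
```

For single-label multi-class problems, every misclassification is one false positive and one false negative, so the micro f1-score equals overall accuracy; the macro score instead weights the harder benign/malignant classes equally with the non-nodule class.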

Figure 5 illustrates examples of the ROC curves of the SVM and kNN classifiers using the three types of features. To assess the generality of the model, the random sampling of 220 samples per class is repeated 100 times, and LOOCV is performed each time. The mean and standard deviation (std) of each metric are then calculated.
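
The evaluation protocol (balanced random sampling, LOOCV on each sample, repetition, then mean and std) can be sketched as follows. `evaluate` is a hypothetical callable that returns one score per run; the leave-one-out 1-nearest-neighbor accuracy below is only a stand-in for the SVM and kNN classifiers and their AUC-ROC or f1-score, and the population std (ddof = 0) is an assumption.

```python
import numpy as np

def loocv_1nn_accuracy(X, y):
    """Stand-in evaluator: leave-one-out accuracy of a 1-nearest-neighbor rule."""
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # leave the query sample out
    return float(np.mean(y[d2.argmin(axis=1)] == y))

def repeated_balanced_loocv(X, y, n_per_class, n_repeats, evaluate, seed=0):
    """Draw n_per_class samples per class without replacement, run the LOOCV
    evaluator on that subset, repeat, and return the mean and std of the scores."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    scores = []
    for _ in range(n_repeats):
        idx = np.concatenate([rng.choice(np.flatnonzero(y == c), n_per_class, replace=False)
                              for c in np.unique(y)])
        scores.append(evaluate(X[idx], y[idx]))
    return float(np.mean(scores)), float(np.std(scores))
```

With the class sizes reported here, `repeated_balanced_loocv(X, y, 220, 100, evaluate)` would draw 660 samples per repetition; sampling without replacement raises an error if a class has fewer samples than requested.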

Fig. 6

Examples of ROC curves of the cascade SVM and kNN classifiers for different features: a, b LBP, c, d signed distance LBP fusion and e, f Gabor

Tables 1 and 2 show the AUC-ROC and f1-score values, which confirm that discriminating non-nodules from the other samples is easier than discriminating benign from malignant samples. The results also highlight that the Gabor-based features with the SVM classifier are more informative than the LBP-based features. For kNN, the Euclidean distance provided better overall AUC-ROC and f1-scores than the Mahalanobis distance for all features; therefore, only Euclidean kNN is considered in the subsequent analysis, and only its results are shown in Fig. 5.

Cascade classification evaluation

The second experiment tests the two cascaded binary classifiers (C1 and C2). To evaluate this approach, the metrics are calculated separately for each class. Table 3 shows the AUC-ROC and f1-score values for both the SVM and kNN frameworks, and Fig. 6 illustrates sample ROC curves of the SVM and kNN cascaded frameworks using the three types of features. As before, the random sampling of 490 samples is repeated 100 times, LOOCV is performed each time, and the mean and standard deviation of each metric are calculated.

The multi-class framework must distinguish simultaneously between the non-nodules and the specific nodule diagnosis (i.e., benign or malignant), which makes false negatives for the nodule classes more likely. In the cascade approach, the classification of both non-nodules and nodules was more effective because false negatives in the first stage were reduced; thus, more true nodule samples were carried over to the second-tier classification. As a result, higher AUC-ROC and f1-scores are reported for the nodule classes.

The features are extracted from each data sample using manually predefined algorithms (e.g., LBP and Gabor) based on expert knowledge; such features are commonly known as handcrafted features. The main limitation of the proposed approach is that, when another dataset is used, the parameters of these algorithms [e.g., (P, R) of LBP and (F, Q) of Gabor] may need to be re-tuned to generate a new set of discriminative features, especially for the benign and malignant classes, where the differences between samples are minimal. Learned features can help avoid this manual parameter tuning. Learned features are derived from an image database through a training procedure for the purpose of classification; for model generality, different databases need to be considered. Likewise, the classifiers' hyperparameters (e.g., the regularization parameter of SVM and k of kNN) should be estimated using information from multiple databases instead of a single set, which may introduce bias; this will be examined in future work.

Conclusion and future work

In this paper, we investigated the effects of texture and shape analysis using LBP, Gabor and signed distance LBP fusion feature descriptors. SVM and kNN classifiers were used for benign, malignant and non-nodule classification, and the Gabor-based cascaded SVM provided the highest performance, with an overall AUC-ROC of 0.99 and f1-score of 0.975. To the best of the authors' knowledge, these are the best results obtained using the LIDC database.

Future work is geared toward generating a larger nodule malignancy database from the LIDC and other clinical data to expand this work. The feature vectors used in this paper have hundreds (e.g., LBP) or thousands (e.g., Gabor) of features; such high-dimensional features not only slow down the learning process but can also cause the classifier to overfit the training data, as irrelevant or redundant features may confuse the learning algorithm. A feature selection method can be applied to address this problem, so that only the subset of features with the highest impact is considered for classification; alternatively, a dimensionality reduction method such as PCA can project the features onto a lower-dimensional space. Further experiments with this approach will be conducted on the training and testing data. We also aim to examine other feature descriptors and classifiers, such as deep features and convolutional neural networks [26, 27], and compare them with the results obtained in this paper. The methods used in this paper can also be applied to false-positive reduction after candidate detection, which will be tested in future work.
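
As an illustration of the dimensionality reduction option, PCA can be sketched with NumPy's SVD. Unlike feature selection, PCA does not keep a subset of the original features; it projects them onto the directions of largest variance. This sketch is not part of the experiments reported here. The function name is hypothetical, and in practice it would be fitted on training data only and the returned mean and components applied to the test data.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project centered data onto its top principal components (via SVD).

    Returns the projected data, the component directions (rows), the fraction
    of variance each kept component explains, and the mean used for centering.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2)[:n_components] / (S ** 2).sum()
    return (X - mean) @ components.T, components, explained, mean
```

A common way to pick `n_components` is to keep enough components for the cumulative explained variance to reach a chosen fraction, then tune that fraction with the same cross-validation used for the classifiers.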