Introduction

Cancer is a significant global health burden [1]. Among all cancers, lung cancer is not only the most prevalent but also one of the deadliest [2, 3], surpassing the mortality rates of other common types such as bladder, brain, and breast cancer [4, 6]. Globally, lung cancer accounts for approximately 1.8 million new cases and more than 1.4 million deaths each year [1, 7].

Early-stage lung cancer typically manifests clinically as isolated lung nodules. The visual characteristics of nodules vary considerably, with subtle differences in shape, texture, and size [8]. In medicine, roughly circular lesions are usually classified by diameter [9]: lesions smaller than 10 mm are commonly referred to as micronodules, those from 10 to 20 mm as small nodules, and those from 20 to 30 mm simply as nodules. Clinicians and patients focus primarily on monitoring and detecting small nodules and micronodules under 20 mm in diameter, since surveillance at this stage can catch lesions before they grow or worsen. Once such lesions are found, doctors conduct regular follow-up observation to prevent deterioration or progression to cancer, enabling early detection, diagnosis, treatment, and rehabilitation.
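The diameter convention above can be summarized as a simple rule; the function below is an illustrative sketch (the function name and the catch-all label for lesions beyond 30 mm are ours, not the source's).

```python
def categorize_lesion(diameter_mm: float) -> str:
    """Classify a roughly circular lung lesion by diameter (mm)."""
    if diameter_mm < 10:
        return "micronodule"      # < 10 mm
    if diameter_mm < 20:
        return "small nodule"     # 10-20 mm
    if diameter_mm <= 30:
        return "nodule"           # 20-30 mm
    return "larger lesion"        # beyond the nodule range discussed here
```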

Lung nodules are medically categorized into six groups: ground-glass benign nodules (GB), ground-glass malignant nodules with predominantly ground-glass components (GMG), ground-glass malignant nodules with predominantly solid components and a good prognosis (GMSG), ground-glass malignant nodules with predominantly solid components and a poor prognosis (GMSP), solid benign nodules (SB), and solid malignant nodules (SM). For conciseness, we use these abbreviations in the following. Figure 1 shows CT images of the six types of nodules: (a)-(d) are ground-glass nodules, which are uneven in density, have blurred margins, and more closely resemble the surrounding normal lung tissue; (e) and (f) are solid nodules, which have uniform density and clear margins and are generally larger and more prominent.

Fig. 1

CT images of 6 categories of lung nodules. a GB, b GMG, c GMSG, d GMSP, e SB, f SM

Before studying the classification of lung nodules, we have realized the fine segmentation of lung airways [10], lung vessels [11] and lung parenchyma [12]. By visualizing the segmentation of lung airways, lung vessels and lung parenchyma, the location of lung nodules can be better localized, and on this basis, it can significantly aid in the diagnosis, treatment, and surgical procedures related to lung nodules.

Regarding the classification of lung nodules, existing studies are mainly limited to distinguishing benign from malignant nodules or solid from ground-glass nodules, lacking a comprehensive, fine-grained classification and prognostic assessment. According to early survey results [13, 14], general practitioners classifying lung nodules from clinical experience achieve accuracies of roughly 60-70\(\%\); even experienced doctors reach at most about 85\(\%\). Analyzing CT scans and detecting lung nodules, particularly smaller ones, requires significant effort from physicians: they must carefully review the scans and assess each nodule's shape, texture, and size to judge whether it is malignant, a process that relies heavily on professional experience [15,16,17]. On average, a physician spends around 5 min analyzing each lung nodule [16]. However, factors such as distraction, fatigue, and inter- and intra-observer variability can destabilize the diagnostic process, leading to incorrect or missed diagnoses by radiologists [18,19,20]. As a result, potentially malignant lung nodules may go unnoticed.

Therefore, to enhance the accuracy of classifying early microscopic lung nodules, mitigate the risk of misdiagnosis, and provide a more comprehensive and detailed classification, this paper presents a novel cascade classification method for lung nodules based on a deep learning neural network. The approach can, according to specific needs, effectively distinguish ground-glass from solid nodules, benign from malignant nodules, and nodules with predominantly ground-glass or solid components, improving the accuracy of lung nodule classification while offering a fast, precise, and reliable foundation for further improvements. Previous studies suggest that adaptive preprocessing of the image data itself yields a more significant improvement than repeatedly tuning the parameters of a deep learning model or making small-scale modifications to the network; nnU-Net [21], currently the most prominent framework in medical image segmentation, is the strongest evidence for this idea. On this basis, a new cascade classification pipeline for lung nodules was designed in this project, focusing on data preprocessing and data expansion and making use of a pre-trained model.

The contributions of this paper are mainly in four aspects:

  1.

    According to the actual needs of hospitals, a cascade classification method for lung nodules was developed: nodules are first classified into solid and ground-glass, and each type is then classified into benign and malignant. Malignant ground-glass nodules are further classified by predominant component (ground-glass or solid), and those with a predominantly solid component are further classified into good and poor prognosis.

  2.

    Preprocessing of lung nodule image data. By analyzing the tagged data, selecting an appropriate crop size, and drawing the coarse contour points overlaid on the cropped image, the classification accuracy can be greatly improved.

  3.

    Sample expansion of the raw dataset by combining transformations such as rotation, mirroring (left, right, up, down), brightness, noise, and contrast, with training initialized from the parameters provided by a pre-trained classification model.

  4.

    In addition to studying cascade classification algorithms for lung nodules, the development of a non-invasive system to classify types and subtypes of lung nodules can help clinicians make timely and targeted treatment decisions, with a positive impact on patient comfort and survival. The key algorithms and techniques investigated in this paper have been integrated into a lung medical aid diagnosis system, which has significant application value in assisting physicians with image reading and diagnosis.

Related works

Deep learning, a critical branch of machine learning, strives to extract intricate, abstract features from data through multi-layered, non-linear transformations, learning the underlying distribution of the data in order to make reasonable judgments or predictions about new data. With its powerful fitting ability, deep learning has emerged in various fields, particularly image classification, where its methods can be leveraged effectively.

Given that manual examination of lung nodules is laborious and time-intensive, researchers have invoked deep learning methods, constructing deep neural network models to classify lesion sites and thereby improve screening efficiency and accuracy. Dubray et al. [22] explored the strengths and drawbacks of nuclear imaging in aiding radiation oncologists in devising lung cancer radiotherapy plans, finding that meticulous partitioning of the target volume and vital organs is the foundation of the entire radiotherapy process. However, these segmentation tasks are predominantly performed manually, involve arduous and time-consuming visualization steps, and are prone to inconsistencies between observers. There is thus a need for automated solutions that can effectively visualize intricate volumes derived from diverse imaging modalities, facilitating the validation of innovative concepts through extensive clinical trials. Sun et al. [23] found that deep neural network models have demonstrated potential in aiding the categorization, diagnosis, and management of esophageal cancer, with the capability to enhance patients' long-term survival rates.

In 1998, Armato [24] put forward a fully automated approach to lung nodule detection in spiral chest CT, achieving a sensitivity of 70\(\%\) [25, 26]. In 2002, Wiemker et al. proposed a nodule detection algorithm specially designed for multi-slice CT images with 1 mm slice thickness; applied to images from 12 CT scan cases, it achieved a sensitivity of 86\(\%\) for nodules larger than 1 mm and 95\(\%\) for nodules larger than 2 mm [27]. In 2010, Wang et al. [28] showed that texture features are the most important in CAD systems for benign and malignant lung nodules and proposed the five most valuable texture features. In 2015, Hua et al. [29] used a CNN and a deep belief network to simplify the conventional CAD pipeline for lung nodule diagnosis and compared the classification results with a conventional approach based on geometric image characteristics (SIFT+LBP) and a fragmentation analysis technique [30], showing that deep learning models offer better diagnostic performance. Shen et al. [31] used a CNN framework on lung nodule images at different scales to extract nodule characteristics and diagnose them with an average accuracy of 86.84\(\%\).

Since not all nodules are cancerous and lung nodules can be either benign or malignant, numerous approaches have been proposed to differentiate between the two categories. In 2018, Causey et al. [32] introduced NoduleX, a hybrid-feature framework that leveraged features derived from a CNN model together with quantitative image characteristics to discern benign from malignant lung nodules, with predictive ability reaching an AUC close to 0.99. In 2019, Al-Shabi et al. [33] developed a CNN-based 2D gated dilated model specifically for classifying lung nodules as malignant or benign from CT images; their experiments showed a test accuracy of 92.57\(\%\). In 2021, Jena et al. [34] proposed a multilayer latent variable neural network for classifying lung nodules as benign or malignant using CT images, attaining a test accuracy of 88\(\%\).

To address the challenge that lung nodule data are diverse yet share similar features, and thus cannot all be accurately classified in a single pass, researchers have proposed cascade network approaches that improve classification accuracy by reducing the number of labels at each stage. Morales et al. [35] investigated methods to segment the lungs of the same subject across a sequence of 3D images with uneven contrast. Assuming that existing methods can successfully segment the first and most contrasted image of the sequence, they proposed aligning the remaining images to the first through successive cascaded registrations and then deforming the initial segmentation with the cascaded transform, finding that the cascade approach offers enhanced precision while significantly reducing user time.

Materials and methods

In this section, we start by introducing the dataset and the necessary preparations for the experiment. We then present the ResNet34 deep residual network, the design and implementation of the lung nodule classification algorithm, and the construction of the cascade classification network. The overall flowchart of this work is shown in Fig. 2. The training phase includes data preprocessing (tagging, cropping, and data expansion) and model training. Model training produces six models: Model 0 is a six-class model, and Models 1-5 are binary classification models. In the testing phase, we cascade the five binary models to achieve six-class lung nodule classification, and we also use the traditional six-class Model 0 for direct six-class classification to provide auxiliary classification information.

Fig. 2

Overall flowchart of the work

Dataset

This paper uses an early microscopic nodule dataset of 75 GB provided by a partner hospital. The nodules fall into two primary groups: ground-glass nodules (120 groups) and solid nodules (400 groups). The dataset can be divided according to the grouping method provided by the hospital, as shown in Table 1.

Table 1 Statistics on the grouping of lung nodules dataset

Each set of files consists of raw image information and a JSON annotation file. The image information is in DICOM and NII formats, both of which record 3D pixel information. Each group of CT scans contains 200-500 slices, but only a few to a dozen of them contain nodular lesions. The markers of the slices containing lesions and the lesion edge points are recorded in the JSON file. In the JSON file, the variable ImageUid is the unique identifier of each slice; height and width correspond to the height and width of the slice, respectively; points are the edge pixels of the lesion, where each pair of values forms an exact two-dimensional (y, x) pixel coordinate.
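A minimal sketch of reading such an annotation file is shown below; the exact JSON layout (a top-level list of per-slice records, and whether points is stored flat or already paired) is an assumption based on the field descriptions above.

```python
import json


def load_lesion_slices(json_path):
    """Read the annotation file and return, per annotated slice, its
    identifier, size, and lesion contour as (y, x) pairs.

    Assumed layout: a top-level list of per-slice records holding the
    fields ImageUid, height, width, and points described in the text.
    """
    with open(json_path) as f:
        annotations = json.load(f)
    slices = []
    for rec in annotations:
        pts = rec["points"]
        # "points" may be a flat [y1, x1, y2, x2, ...] list; pair it up.
        if pts and not isinstance(pts[0], (list, tuple)):
            pts = list(zip(pts[0::2], pts[1::2]))
        slices.append({
            "uid": rec["ImageUid"],
            "height": rec["height"],
            "width": rec["width"],
            "points": [tuple(p) for p in pts],
        })
    return slices
```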

As the paper adopts a cascade classification network consisting of 5 binary classification models based on Resnet34, the raw dataset is partitioned into 5 distinct sub-datasets, each comprising two types of data.

Table 2 Composition of the five binary-classification sub-datasets

Table 2 presents the details of the sub-datasets, which, like the raw dataset, all include training and validation sets. Sub-dataset 1 contains ground-glass nodules and solid nodules: the ground-glass data comprise four types (GB, GMG, GMSG, and GMSP), while the solid data comprise SB and SM.

Sub-dataset 2 comprises two types of data: benign and malignant ground-glass nodules; the malignant ground-glass data include three types: GMG, GMSG, and GMSP.

Sub-dataset 3 contains the two types SB and SM.

Sub-dataset 4 contains two types of data: ground-glass malignant nodules with predominantly ground-glass components (GMG) and those with predominantly solid components (GMS), where GMS includes GMSG and GMSP.

Sub-dataset 5 contains the two types GMSG and GMSP.

Data preprocessing

Effective focus data screening

Since most slices do not contain the nodular lesion area, the small number of slices that do contain lesions are first screened out according to the JSON annotation file.

Fig. 3

Examples of incorrectly labeled (top) and undersized lung nodules (bottom)

Meanwhile, careful observation of the sample set shows that some samples are mislabeled or have a lung nodule diameter that is too small. In Fig. 3, row 1 shows a label outside the lung parenchyma, a tagging error that is not conducive to training; row 2 shows a lung nodule whose diameter is too small for clinical analysis. To enhance classification accuracy, this study further screened the nodule samples, eliminated unqualified samples, and deleted invalid duplicate data. The final number of images and the dataset partitioning are shown in Table 3.

Table 3 Number of dataset samples after filtering out slices that do not contain lung nodule lesions and slices with too small lesion diameters, and the number of slices after expansion

Focal contour tagging

Fig. 4

The raw images (top) and the images after tagging the contour of the lesion area (bottom)

According to the analysis of the tag files, points in the JSON file represent the pixel coordinates of the lung nodule contour: (\(y_{1}\), \(x_{1}\)), (\(y_{2}\), \(x_{2}\)), ..., (\(y_{n}\), \(x_{n}\)). Therefore, the nodules can be tagged on the raw image by calling the matplotlib library; the effect is shown in Fig. 4, with the upper row displaying the raw images and the lower row the tagged images.
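The tagging step can be sketched without a plotting backend. The paper renders the contour with matplotlib; the function below (the name and the overlay value 255 are our choices) instead writes the annotated (y, x) points into a copy of the slice array, which gives the same tagging effect. Per the paper's convention, y is the horizontal (column) coordinate and x the vertical (row) coordinate.

```python
import numpy as np


def tag_contour(image, points, value=255):
    """Overlay the lesion contour on a CT slice by setting the annotated
    (y, x) pixels to a bright value, leaving the original array intact."""
    tagged = image.copy()
    for y, x in points:
        # x indexes rows (vertical), y indexes columns (horizontal)
        if 0 <= x < tagged.shape[0] and 0 <= y < tagged.shape[1]:
            tagged[x, y] = value
    return tagged
```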

Region of interest cropping

Although the tagging operation improves accuracy to some extent, a slice is 512\(\times\)512 pixels while the vast majority of lung nodules are only about 32\(\times\)32, meaning the region of interest occupies roughly 1/256 of the raw image. We therefore incorporated a cropping procedure to extract the lung nodule area and enhance sensitivity to the region of interest.

Two image cropping methods are considered in this part of the experiment. The first includes pixel information around the lesion and centers the lesion in the cropped image; the second considers only the minimum lesion-area solid bounding box and treats everything else as background. The two cropping modes are described below.

  1.

    Peri-lesion area (crop-PLA)

To ensure that the lung nodular lesion in the image data input is positioned at the center, it is necessary to first determine the starting point of cropping. The following steps outline the specific procedures to be followed:

Based on the annotation information, the (y, x) pixel coordinates of the lung nodule outline are used to determine \(x_{min}\), \(x_{max}\), \(y_{min}\), and \(y_{max}\); (\(y_{min}\), \(x_{min}\)) is the upper-left corner of the nodule region (with the upper-left corner of the image as the coordinate origin, where y is the horizontal coordinate and x the vertical coordinate, both in the range [0, 512]). To center the nodule in the image, the point (\(y_{min}\), \(x_{min}\)) must be moved up and to the left to obtain the cropping start point (\(y_{ori}\), \(x_{ori}\)): \(x_{min}\) is translated up by \(x_{tra}\) and \(y_{min}\) left by \(y_{tra}\), where \(x_{tra}\) and \(y_{tra}\) are the upward and leftward translation distances, respectively. The cropping start point is then calculated as shown in Eq. (1).

$$\begin{aligned} (y_{ori}, x_{ori})&= (y_{min}-y_{tra}, x_{min}-x_{tra}) \end{aligned}$$
(1)

The maximum height and width of the nodule are calculated as shown in Eqs. (2) and (3), where height is the maximum height and width is the maximum width.

$$\begin{aligned} height&= x_{max}-x_{min} \end{aligned}$$
(2)
$$\begin{aligned} width&= y_{max}-y_{min} \end{aligned}$$
(3)

To calculate \(y_{tra}\) and \(x_{tra}\), the crop size \(l_{crop}\) must first be determined; in this paper, after prior observation and calculation, \(l_{crop}\) is set to 64 pixels. \(y_{tra}\) and \(x_{tra}\) are calculated as shown in Eqs. (4) and (5).

$$\begin{aligned} \text {if } width< l_{crop} \text { and } height < l_{crop}: \end{aligned}$$
$$\begin{aligned} y_{tra}&= (l_{crop}-width) / 2 \end{aligned}$$
(4)
$$\begin{aligned} x_{tra}&= (l_{crop}-height) / 2 \end{aligned}$$
(5)

The calculated \(y_{tra}\) and \(x_{tra}\) are then plugged into Eq. (1) to obtain the cropping start point (\(y_{ori}\), \(x_{ori}\)), from which a 64\(\times\)64 px crop is performed. The effect of cropping is shown in row 1 of Fig. 5, with a magnified view of the lung nodule in the upper right of each subimage.

  2.

    Minimum lesion area solid bounding box (crop-MLASB)

Take (\(y_{min}\), \(x_{min}\)) directly as the start coordinates of the cropping, i.e., (\(y_{ori}\), \(x_{ori}\)) = (\(y_{min}\), \(x_{min}\)), and take width and height as the crop width and height, respectively, to obtain the required ROI (Region of Interest). Finally, the surrounding area is filled with 0 pixel values so that the images input to the network are unified to a 64\(\times\)64 size. The image data cropped in this step are shown in row 2 of Fig. 5, with a magnified view of the lung nodule in the upper right of each sub-image.
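A minimal sketch of the two cropping modes, assuming the contour coordinates are available as separate y and x lists; clamping for nodules near the image border is omitted for brevity, and the function names are ours.

```python
import numpy as np


def crop_pla(image, ys, xs, l_crop=64):
    """Crop-PLA: centre the nodule in an l_crop x l_crop window.
    Per the paper's convention, y is the horizontal (column) axis
    and x the vertical (row) axis."""
    y_min, x_min = min(ys), min(xs)
    height = max(xs) - x_min            # vertical extent
    width = max(ys) - y_min             # horizontal extent
    # centring along each axis uses that axis's own extent
    y_tra = (l_crop - width) // 2
    x_tra = (l_crop - height) // 2
    y_ori, x_ori = y_min - y_tra, x_min - x_tra   # cropping start point
    return image[x_ori:x_ori + l_crop, y_ori:y_ori + l_crop]


def crop_mlasb(image, ys, xs, l_crop=64):
    """Crop-MLASB: keep only the minimal bounding box of the lesion
    and zero-pad it to a uniform l_crop x l_crop input."""
    y_min, x_min = min(ys), min(xs)
    height = max(xs) - x_min
    width = max(ys) - y_min
    roi = image[x_min:x_min + height, y_min:y_min + width]
    out = np.zeros((l_crop, l_crop), dtype=image.dtype)
    out[:height, :width] = roi
    return out
```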

Fig. 5

(1) Schematic diagram of image cropping with pixel information around lung nodules. (2) Schematic diagram of cropping considering only the minimum circumscribed rectangle of the lesion

Intra-group data expansion

Medical image datasets generally suffer from small sample sizes, especially for the ground-glass nodules in this experiment, which inevitably affects the final classification accuracy. Therefore, the following data augmentation methods are used to expand the dataset samples: (1) rotation, (2) mirroring, and (3) brightness increase. This paper combines methods (1), (2), and (3); the combinations are shown in Table 4.

Table 4 Examples of data expansion combinations

It is easy to see that if part of a patient's slice data appears in the training set and the rest in the test set, a deceptively high test accuracy can be obtained, because the network has already extracted that patient's image features during training. If all image data were pooled together before expansion, exactly this inflated accuracy would occur. Therefore, this experiment expands the slice data within each group. The expansion results are shown in Table 3.
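The group-wise expansion can be sketched as follows; the concrete rotation angles, flip set, and brightness offsets below are illustrative placeholders, while Table 4 lists the combinations actually used. The key point is that every augmented copy stays attached to its patient group, so a split by group can never leak a patient across train and test.

```python
from itertools import product

import numpy as np

ROTATIONS = [0, 1, 2, 3]        # multiples of 90 degrees (assumed set)
MIRRORS = [None, "lr", "ud"]    # no flip, left-right, up-down
BRIGHTNESS = [0, 20]            # additive brightness offsets (assumed values)


def augment_slice(img, rot, mirror, bright):
    """Apply one rotation/mirror/brightness combination to a slice."""
    out = np.rot90(img, rot)
    if mirror == "lr":
        out = np.fliplr(out)
    elif mirror == "ud":
        out = np.flipud(out)
    return np.clip(out.astype(np.int32) + bright, 0, 255).astype(img.dtype)


def expand_group(slices):
    """Expand all slices of ONE patient group together, so augmented
    copies of a slice can never land in a different split than the
    original."""
    return [augment_slice(s, r, m, b)
            for s in slices
            for r, m, b in product(ROTATIONS, MIRRORS, BRIGHTNESS)]
```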

Classification based on Resnet

Ever since AlexNet won the ILSVRC 2012 challenge, it has been assumed that the deeper a CNN, the higher its potential accuracy. With the advent of networks such as VGGNet and Inception, this conclusion continued to be validated and strengthened. However, the ResNet team's experiments found that while accuracy initially increases with the number of layers, it gradually saturates and can even "degrade" beyond a certain point.

Suppose a network is trained to high accuracy once it reaches some number of layers, and the subsequent layers perform only identity transformations (y = x) without otherwise processing the input; then, in theory, no "degradation" should occur from these layers onward. For neural networks with many layers, once accuracy is maximized, it should be maintainable without degradation.

Fig. 6

Structure diagram of the residual network and regular network

Fig. 7

Model architecture of the 34-layer ResNet network

Deep learning’s advantage over traditional methods is that deeper network layers and nonlinear activations automate feature extraction and transformation. Nonlinear activation functions map the data into a higher-dimensional space, enhancing its separability. However, as the network deepens, the accumulation of activation functions maps the data into a space from which the identity mapping is difficult to recover, and the network becomes prone to the vanishing gradient problem. To solve this, shortcut branches can be added to the network to achieve a balance between linear and nonlinear transformations.

Based on this idea, the ResNet team proposed the “Shortcut Connection”, and the network constructed based on this structure is called ResNet [36]. The structure of this residual network in comparison to the conventional network can be seen in Fig. 6.

Figure 7 illustrates the architecture of the 34-layer ResNet network built from the basic ResNet structure. Similar to AlexNet, ResNet can be viewed as eight building layers, each containing one or more network layers or building blocks. As shown in Fig. 7, the network starts with an ordinary convolutional layer, followed by a max pooling layer and a building layer with three residual modules. The third, fourth, and fifth building layers each start with a downsampling residual module, followed by 3, 5, and 2 further residual modules, respectively. The data then passes through average pooling, features are aggregated through a fully connected layer, and the classification result is finally obtained through Softmax.
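As a sketch of the two ideas above, the residual ("shortcut") computation and the 34-layer count can be written out in NumPy; the dense weight matrices below stand in for ResNet-34's 3\(\times\)3 convolutions, so this illustrates the structure rather than the real network.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0)


def basic_block(x, w1, w2):
    """One ResNet basic block, simplified to dense layers: two weight
    layers with ReLU, plus the identity shortcut added before the final
    activation."""
    out = relu(w1 @ x)       # first weight layer + activation
    out = w2 @ out           # second weight layer
    return relu(out + x)     # shortcut: add the input, then activate


# Layer count of ResNet-34: one stem convolution, (3 + 4 + 6 + 3)
# residual modules of two weight layers each, and one fully connected
# layer (pooling layers carry no weights and are not counted).
depth = 1 + 2 * (3 + 4 + 6 + 3) + 1
```

Note that with zero weights the block reduces to the identity shortcut, which is exactly the degradation-free fallback the text describes.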

Design and implementation of lung nodule classification algorithm

The types of lung nodules and their corresponding labels are shown in Table 5. In this paper, the category names of lung nodules are represented by their label names.

Table 5 Labels of lung nodules and their corresponding types

Traditional multi-classification model

To select the optimal data preprocessing method and to highlight the effect of cascade classification, this experiment first trained a six-class model that classifies lung nodules into the six types, as shown in the Multi-classification Model section of Fig. 2.

Cascade network

Because the data span many categories, classification accuracy is low if only a traditional single-pass classifier is used; if only two types are distinguished at each step of a hierarchical classification, accuracy can be significantly enhanced. Therefore, this paper presents an extension of the traditional deep learning network, a cascade CNN, which performs hierarchical classification with successful outcomes and exhibits superior objectivity and accuracy compared to conventional classification networks. The cascade network resolves the issue that the dataset contains many categories with similar characteristics that cannot be accurately separated in a single pass.

Fig. 8

Cascade network technology Roadmap

Through label reduction, the cascade network significantly enhances classification accuracy, offering a rapid, precise, and dependable foundation for improving classification efficiency.

According to the actual needs of the hospital and the desired effect, cascade classification is carried out by training five binary classification models in turn. As shown in the Cascade Network part of Fig. 2, all datasets are first used for binary training to divide lung nodules into ground-glass and solid, yielding Model 1. Ground-glass nodules are then divided into benign and malignant by binary training, yielding Model 2. Likewise, solid nodules are divided into benign and malignant solid nodules, yielding Model 3. Next, malignant ground-glass nodules are divided into those with predominantly ground-glass and predominantly solid components, yielding Model 4. Finally, malignant ground-glass nodules with predominantly solid components are classified into good and poor prognosis, yielding Model 5.

The technology roadmap of the cascade network is shown in Fig. 8. In the figure, result is the prediction result, predict() is the model call interface, and M1-M5 are the five binary classifiers. The cascade network first reads the test data and records the label of each sample, then predicts each sample in turn. predict(M1) is called first to predict whether the nodule is ground-glass or solid. If it is ground-glass, predict(M2) is called to predict benign or malignant; if malignant, predict(M4) is called to predict whether the ground-glass or solid component is dominant; if the solid component is dominant, predict(M5) is called to predict good or poor prognosis. If the nodule is solid, predict(M3) is called to determine whether it is solid benign or solid malignant.
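The routing logic of Fig. 8 can be sketched as follows; the five classifiers are passed in as callables, and the label strings returned by each are assumptions of this sketch, not identifiers from the paper's code.

```python
def cascade_predict(sample, m1, m2, m3, m4, m5):
    """Route one sample through the five binary classifiers (Fig. 8)
    and return one of the six final labels."""
    if m1(sample) == "solid":
        # solid branch: benign vs malignant
        return "SB" if m3(sample) == "benign" else "SM"
    # ground-glass branch: benign vs malignant
    if m2(sample) == "benign":
        return "GB"
    # malignant ground-glass: dominant component
    if m4(sample) == "ground-glass-dominant":
        return "GMG"
    # solid-dominant: prognosis
    return "GMSG" if m5(sample) == "good" else "GMSP"
```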

Results

Experimental settings

Before training the ResNet34 model, the dataset is divided into 5 sub-datasets, each undergoing a separate training session. Although the datasets differ, the training parameters remain consistent: a batch size of 16, 100 epochs, a learning rate of 0.0001, and the AdamW optimizer.

The software, hardware, and equipment used in this experiment are as follows: (1) CPU: Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz; (2) GPU: NVIDIA GeForce RTX 2070 SUPER; (3) Operating system: Windows 10 64-bit; Ubuntu 20.04.2 LTS; (4) Programming language and environment: Python 3.7; PyCharm 2021.1 x64.

Evaluation indicators

To evaluate the model, we analyze each category in detail using four evaluation indicators: test accuracy, precision, recall, and F1 score.

  1.

    Test Accuracy represents the proportion of correct predictions among all predicted samples, encompassing both positive and negative instances. It is calculated as follows:

    $$\begin{aligned} Accuracy&= (TP+TN)/(TP+TN+FP+FN) \end{aligned}$$
    (6)
  2.

    Precision is calculated as the proportion of accurately predicted positive samples to the total predicted positive samples, determined by the following formula:

    $$\begin{aligned} Precision=TP/(TP+FP) \end{aligned}$$
    (7)
  3.

    Recall, also known as sensitivity or true positive rate, is a performance metric used in machine learning and classification tasks. It evaluates the model’s capability to accurately detect positive samples. The formula to calculate recall is as follows:

    $$\begin{aligned} Recall=TP/(TP+FN) \end{aligned}$$
    (8)
  4.

    The F1 score is the harmonic mean of the model's precision and recall, serving as a statistical metric for the accuracy of a binary classification model. It is calculated as follows:

    $$F1score = 2 \times (Precision \times Recall)/(Precision + Recall)$$
    (9)

TP: True Positive - The true value is positive and the predicted value is positive, meaning the model correctly identified a positive example. FP: False Positive - The true value is negative, but the predicted value is positive. This means the model incorrectly identified a negative example as positive. TN: True Negative - The true value is negative, and the predicted value is negative, indicating that the model correctly identified a negative example. FN: False Negative - The true value is positive, but the predicted value is negative. This means the model incorrectly identified a positive example as negative.
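Eqs. (6)-(9) translate directly into code; the function below is a minimal sketch that computes all four indicators from the confusion counts defined above.

```python
def classification_report(tp, fp, tn, fn):
    """Compute the four evaluation indicators from confusion counts.
    Assumes at least one predicted positive (tp + fp > 0) and one
    actual positive (tp + fn > 0)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (6)
    precision = tp / (tp + fp)                           # Eq. (7)
    recall = tp / (tp + fn)                              # Eq. (8)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (9)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```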

Comparison results

In this paper, we compare the Alexnet [37], Googlenet [38], and VGG [39] models with the Resnet model. The training data are datasets processed by the tag+crop-PLA+expansion method, including the six-class raw dataset and the five binary sub-datasets. The classification accuracies are shown in Table 6: Resnet achieves 0.7296 on the six-class task, higher than Alexnet, Googlenet, and VGG. On sub-dataset 1 the four models achieve their best accuracies, reaching 0.9335, 0.9383, 0.9328, and 0.954, respectively. On sub-dataset 5 the accuracies are lowest, but Resnet still exceeds the other models. The results show that Resnet outperforms the Alexnet, Googlenet, and VGG models in both the six-class and binary lung nodule classification tasks.

Table 6 Model accuracy comparison
Table 7 Multi-classifier, cascade-classifier classification technical report

To compare the cascade network with the six-class model, we computed accuracy, precision, recall, and F1 score for each of the six lung nodule categories. The classification reports for both are shown in Table 7, where the six-class network is denoted Multi-Classifier and the cascade network Cascade-Classifier. As Table 7 shows, the cascade network achieves a test accuracy of 80.04\(\%\), surpassing the six-class model by 7.08\(\%\). Comparing the per-class indicators, the cascade classifier has higher precision, recall, and F1 score throughout. For the GB, SB, and SM nodule types, the cascade network shows a modest improvement over the six-class model, about 5\(\%\) on average, because these three types are more separable in the feature space and thus easier to distinguish. For GMG, GMSG, and GMSP, the cascade network shows significant improvements in precision, recall, and F1 score, about 25\(\%\) on average; these three types are less separable in the feature space and harder to distinguish, which demonstrates that the cascade network provides better classification performance precisely where it is needed.

Analysis of ablation results

In the six-classification experiments, we tested four kinds of input data with different degrees of preprocessing, starting from the tagged raw 512\(\times\)512 image shown in Fig. 9b. Two cropping methods are used in this paper. The first retains the minimal lesion bounding box, so the cropped images are generally of non-uniform size; since a CNN requires inputs of uniform size, all such crops are zero-padded to 64\(\times\)64. Because lung nodules always lie within the intrapulmonary region, the second method directly extracts a 64\(\times\)64 patch according to the nodule location. The training set comprises 3099 slices, and the test set consists of 520 slices. The accuracies for the four kinds of input data are shown in Fig. 10.
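For the first cropping method, the zero-padding step can be sketched as follows. This is a minimal NumPy illustration; the function name and the choice to center the crop are assumptions, not taken from the paper's code:

```python
import numpy as np

def pad_to_64(crop):
    """Zero-pad a variable-size lesion crop to a uniform 64x64 input.

    The crop is centered on a zero canvas; crops larger than 64 pixels
    on a side would need resizing instead (not handled here).
    """
    h, w = crop.shape
    canvas = np.zeros((64, 64), dtype=crop.dtype)
    top = (64 - h) // 2
    left = (64 - w) // 2
    canvas[top:top + h, left:left + w] = crop
    return canvas

crop = np.ones((30, 45), dtype=np.float32)  # a minimal bounding-box crop
padded = pad_to_64(crop)                    # uniform 64x64 CNN input
```

Padding with zeros (rather than resizing) preserves the nodule's true scale, which matters when diameter itself is a diagnostic feature.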

Fig. 9
figure 9

Schematic representation of four different data treatments for lung nodules. (a) Raw data (b) Tagged lesion contour (c) Tag + Crop-MLASB (d) Tag + Crop-PLA

Fig. 10
figure 10

Histogram of six-classification results of lung nodules after four different treatments

As shown in Fig. 10, the classification accuracy on the raw data is only 32.51\(\%\), while tagging the raw images raises it to 63.42\(\%\), nearly double the accuracy before tagging. Cropping on top of tagging increases accuracy by a further 5–10\(\%\); in particular, the crop-PLA method achieves an accuracy of 72.96\(\%\), an improvement of nearly 10\(\%\) over tagging alone. However, the accuracy of direct six-classification remains low, likely because there are many nodule types but few samples in certain categories. As for the cropping methods, those that include information around the lesion clearly achieve better performance. Therefore, the cropping method that retains only the minimal lesion bounding box is not considered in the subsequent cascade classification experiments.

Combined with the actual needs of hospitals, it is usually necessary to obtain a coarse classification result before further subdivision. Therefore, the second part of the experiment concerns cascade classification. To demonstrate the effectiveness of three operations, namely lesion contour tagging, image cropping, and data expansion, we compare the performance of six types of input data: raw data, tagging only, cropping only, expansion only, tagging + cropping, and tagging + cropping + expansion. Figure 11 displays the outcomes of the ablation experiment. The vertical axis represents the classification accuracy, and the horizontal axis represents the five sub-datasets, each of which was subjected to six sets of experiments.

Fig. 11
figure 11

The results of different data processing methods in the classification experiment

As shown in Fig. 11, all five data preprocessing methods enhance the accuracy of lung nodule classification in the five sets of experiments. In the first three experiments, the tag + crop-PLA + expanded method improves the classification accuracy the most, reaching 96.72\(\%\), 95.25\(\%\), and 90.19\(\%\). Compared with the accuracy on the raw dataset, this is an enhancement of 11.03\(\%\), 13.25\(\%\), and 14.26\(\%\), respectively; these three datasets exhibit more obvious differences in image features and thus constitute simple samples that the classification network learns easily. In the last two experiments, the five preprocessing methods improve the classification accuracy by only 1.87\(\%\) and 5.17\(\%\). This is because the image features of these two datasets are not significantly different, making them complex samples that are harder for the classification network to learn.

As shown in each cluster in Fig. 11, all three methods, tagging, cropping, and expansion, help improve classification accuracy. Tagging emphasizes the position and characteristics of the nodules in the image, cropping excludes the influence of irrelevant regions on the classification result, and data expansion helps rectify the imbalance between positive and negative samples, bringing their proportions closer to equilibrium. In the third set of experiments the effect is particularly obvious, with the three methods yielding an average improvement of about 6.4\(\%\); this can be attributed to the large imbalance between positive and negative samples in the third dataset, together with more distinct feature differences. In the fourth set of experiments, however, the effect is not obvious, with the three methods improving accuracy by only about 1.4\(\%\) on average, because the proportions of positive and negative samples in the fourth dataset differ less and the features are more similar.

Using the tag + crop-PLA + expanded data processing method, we analyzed the training accuracies of the five binary classification models (classifiers) and the one six-classification model, together with the classifier types, classification outputs, and datasets used for training. The results are given in Table 8. From Table 8, the classification accuracies of the five binary classification models used in the cascade network (the cascade classifiers) all exceed 85\(\%\). Model 1 achieved the highest classification accuracy of 96.72\(\%\), while Model 0, the six-classification model, had the lowest accuracy of 72.96\(\%\). Notably, the accuracies of the five binary classification models are all higher than that of the six-classification model used in the traditional classification network (the multiple classifier), indicating that the cascade classifiers, with their binary classification approach, outperform the traditional multiple classifier in classification accuracy.
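At inference time, the cascade routes each sample through the five binary classifiers following the hierarchy described in this paper: solid vs. ground-glass, then benign vs. malignant within each branch, then component and prognosis splits for ground-glass malignancies. The stage names and stub decision functions below are illustrative assumptions, not the paper's implementation:

```python
def cascade_classify(x, clf):
    """Route one nodule image through five binary classifiers.

    `clf` maps stage names to binary decision functions; here they are
    stubs, but in practice each would be a trained binary Resnet.
    """
    if clf["solid_vs_glass"](x) == "solid":
        return "SM" if clf["solid_malignant"](x) == "malignant" else "SB"
    if clf["glass_malignant"](x) == "benign":
        return "GB"
    if clf["glass_component"](x) == "ground_glass":
        return "GMG"
    return "GMSG" if clf["prognosis"](x) == "good" else "GMSP"

# Stub classifiers that always pick one branch, for illustration only.
stubs = {
    "solid_vs_glass": lambda x: "glass",
    "solid_malignant": lambda x: "malignant",
    "glass_malignant": lambda x: "malignant",
    "glass_component": lambda x: "solid",
    "prognosis": lambda x: "poor",
}
label = cascade_classify(None, stubs)  # "GMSP" with these stubs
```

Note that each sample invokes at most three of the five classifiers, so each binary model only ever sees the subset of classes it was trained to separate.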

Table 8 The Six-classification models obtained through training and their accuracies

Discussion

In this paper, a cascade classification network based on the Resnet neural network model is constructed to classify early-stage microscopic lung nodules. Resnet introduces residual connections, which avoid information loss and degradation by transferring information directly across layers, help train deep networks more efficiently, and also help mitigate overfitting. Compared to the Alexnet, Googlenet, and VGG models, the Resnet network structure is deeper, so Resnet can learn richer and more complex feature representations, resulting in better expressiveness and performance.
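The residual connection underlying this behavior can be shown with a toy numerical example: a residual block computes y = F(x) + x, so even when the learned transformation F contributes nothing, the input passes through unchanged. This is a minimal sketch in which a single linear map stands in for Resnet's convolutional branch:

```python
import numpy as np

def residual_block(x, weight):
    """Toy residual block: y = F(x) + x, with F a single linear map."""
    fx = x @ weight          # the learned transformation F(x)
    return fx + x            # skip connection adds the input back

x = np.array([[1.0, 2.0]])
zero_w = np.zeros((2, 2))    # a degenerate / untrained F
y = residual_block(x, zero_w)
# With F(x) = 0 the block reduces to the identity mapping, so a deeper
# stack of such blocks can never lose the information carried by x.
```

This is why adding residual layers does not degrade the signal: the network only needs to learn the correction F(x) on top of an identity path, rather than relearn the identity itself.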

In our experiments, we used the Alexnet, Googlenet, and VGG models to perform two sets of comparisons with the Resnet model: a six-classification experiment and a binary classification experiment. The experimental data are the raw dataset and the five sub-datasets processed by the tag + crop-PLA + expanded data processing method, respectively. In the six-classification experiment, the accuracies of the four models are close to each other, but Resnet's is slightly higher, as shown in Table 6. In the binary classification experiment, the highest classification accuracy was achieved on sub-dataset 1, mainly because (1) the numbers of ground-glass and solid nodule samples were close, and (2) the contrast in image features was very obvious. However, when training on sub-datasets 4 and 5, there were relatively few samples with predominantly ground-glass components, because the data for the three types of ground-glass malignant nodules are small in quantity and unbalanced. Furthermore, as can be seen in Fig. 1b–d, all three types of ground-glass malignant nodules share common features such as irregular shapes, fuzzy boundaries, and similar sizes, which leads to low separability of ground-glass malignant nodules in feature space. Resnet nevertheless achieves a high accuracy, showing that it can learn richer and more complex feature representations.

In selecting the key steps of data preprocessing, we first validated the data generated by the four preprocessing methods using the six-classification method. As shown in Fig. 10, the dataset processed with tag + crop-PLA had the highest accuracy in the six-classification experiment. The tagging operation introduces manual interaction into the preprocessing, which greatly improves the accuracy. The cropping operation extracts the region of interest around the lung nodule from the 512\(\times\)512 image, reducing the interference of non-nodule regions with the classification result. Preserving a certain amount of the peripheral region around the lung nodule during cropping is preferable to including only the nodule region, because part of that peripheral region carries features useful for classification.

This indicates that the tag + crop-PLA method performs best in the six-classification experiments. Therefore, in the subsequent cascade classification experiments, we adopted tag + crop-PLA as the basic data preprocessing method. We then validated the four methods using the binary classification approach, and the results once again demonstrated that the tag + crop-PLA method achieves the highest accuracy in both the six-classification and binary classification experiments.

In addition, in the binary classification experiments, we applied expansion to the raw-size dataset and the cropped dataset, respectively, and found that the tagged + cropped + expanded data processing method contributed greatly to the improvement of model accuracy on all five sub-datasets, as shown in Fig. 11. Tagging, cropping, and data expansion play different roles in different datasets, but in general, when all three operations are performed together, the final accuracy is greatly improved over the raw data. Furthermore, from the comparison between cropping (gray bars) and tagging + cropping (blue bars), it can be inferred that tagging the lesion contour facilitates the network model's extraction and learning of features from the lesion region, leading to higher accuracy. Finally, data expansion also plays different roles in different datasets, but for ground-glass nodules, which have a small data volume and rich features, data expansion contributes the most to the accuracy improvement, showing that data expansion can contribute significantly to the overall outcome.
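The data expansion transforms discussed above (rotation, mirroring, brightness, noise, and contrast) can be sketched in NumPy as follows; the specific parameter values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def expand(img, rng):
    """Generate augmented copies of one nodule slice in [0, 1].

    The transformation set mirrors the expansion described in the text;
    the magnitudes (1.2x brightness, sigma 0.02 noise, 1.5x contrast)
    are example choices only.
    """
    return [
        np.rot90(img),                                         # rotation
        np.fliplr(img),                                        # horizontal mirror
        np.flipud(img),                                        # vertical mirror
        np.clip(img * 1.2, 0.0, 1.0),                          # brightness
        np.clip(img + rng.normal(0, 0.02, img.shape), 0, 1),   # noise
        np.clip((img - 0.5) * 1.5 + 0.5, 0.0, 1.0),            # contrast
    ]

rng = np.random.default_rng(0)
img = rng.random((64, 64))     # stand-in for a normalized nodule patch
augmented = expand(img, rng)   # six extra samples per slice
```

Because the transforms are applied to the minority ground-glass classes more heavily, expansion of this kind can be used to balance class proportions as well as to enlarge the training set.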

Through the comparison of Test Accuracy, Precision, Recall, and F1 score, as presented in Table 7, it is evident that all evaluation metrics of the cascade network outperform those of the traditional multi-classification model. This superior performance can be attributed to the cascade network’s ability to mitigate the impact of diverse data samples and imbalances in sample numbers by employing multiple binary classification models. The results demonstrate that the cascade network exhibits stronger classification performance advantages compared to traditional multi-classification models.

Conclusion

This paper introduces a cascade network-based method for classifying early-stage microscopic lung nodules: lung nodules are first classified into solid and ground-glass nodules, and each of these two types is then classified as benign or malignant. Ground-glass malignant nodules are further divided into those with predominantly ground-glass components and those with predominantly solid components, and the latter are subdivided by good or poor prognosis. For the preprocessing of lung nodule image data, this paper analyzes the annotated data, selects an appropriate crop size, and superimposes coarse contour points on the raw image. At the same time, to improve classification performance, this paper expands the samples by applying rotation, mirroring (horizontal and vertical), brightness, noise, contrast, and other transformations to the raw dataset. The proposed cascade network outperforms the traditional multi-classification model on all of the precision, recall, and F1 score indexes, and the test accuracy reaches about 80\(\%\) on average, which basically meets clinical needs.

Future work

In future work, we plan to gather additional data on lung nodules, enhance the model’s generalizability, and employ more advanced network architectures such as transformer-based models or large-scale model techniques for lung nodule classification.