Abstract
To address the high feature dimensionality, heavy data redundancy, and low recognition accuracy of single classifiers in ground-glass lung nodule recognition, a recognition method based on CatBoost feature selection and Stacking ensemble learning is proposed. First, the method uses a feature selection algorithm to retain important features and remove features with little impact, thereby reducing the data dimensionality. Second, random forest, decision tree, K-nearest neighbor, and light gradient boosting machine (LightGBM) classifiers are used as base classifiers, and a support vector machine is used as the meta-classifier to construct the ensemble learning model. This design increases the accuracy of the classification model while maintaining the diversity of the base classifiers. The experimental results show that the recognition accuracy of the proposed method reaches 94.375%. Compared with the random forest algorithm, the best-performing single classifier, the accuracy of the proposed method is higher by 1.875%. Compared with recent deep learning methods for ground-glass pulmonary nodule recognition (ResNet+GBM+Attention and MVCSNet), the proposed method performs better or comparably. The experiments show that the proposed model can effectively select features and recognize ground-glass pulmonary nodules.
摘要
针对当前磨玻璃肺结节特征维数高、冗余数据多、单一分类器识别准确率较低的问题, 提出了一种基于CatBoost特征选择和Stacking集成学习的磨玻璃肺结节识别方法。该方法首先使用特征选择算法进行重要特征筛选, 去除作用较少的特征, 达到数据降维的效果; 其次, 将随机森林、决策树、KNN分类、LightGBM作为基分类器, 支持向量机作为元分类器进行集成学习模型的融合和搭建, 在保持基分类器多样性的同时提升分类模型的准确率。实验结果显示, 所提方法的识别准确率达到94.375%。与单分类器中性能最好的随机森林算法相比, 该方法的准确率提高了1.875%。与磨玻璃肺结节识别领域最近的深度学习方法ResNet + GBM + Attention和MVCSNet相比, 准确率也获得了提升或者性能可比。实验表明, 所提出的模型能够对肺结节进行有效的特征选择和分类识别。
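The two-stage pipeline described in the abstract (importance-based feature selection, then a Stacking ensemble with a support vector machine as meta-classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, scikit-learn's `GradientBoostingClassifier` stands in for both CatBoost (as the importance ranker) and LightGBM (as a base classifier) so that the example needs only scikit-learn, and the number of retained features, the scaling step before KNN, and all hyperparameters are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a high-dimensional, redundant feature table
# (hypothetical data, not the nodule dataset used in the paper).
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=10,
    n_redundant=20, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stage 1: rank features by tree-based importance and keep the top 20.
# Assumption: GradientBoostingClassifier replaces CatBoost here;
# threshold=-inf makes max_features the only selection criterion.
selector = SelectFromModel(
    GradientBoostingClassifier(random_state=0),
    max_features=20,
    threshold=-np.inf,
)

# Stage 2: four heterogeneous base classifiers, SVM as meta-classifier.
base_classifiers = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    # Scaling before KNN is an added assumption, not stated in the abstract.
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    # Assumption: stands in for LightGBM.
    ("gbm", GradientBoostingClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=SVC(kernel="rbf"),
    cv=5,  # out-of-fold base predictions are used to train the meta-classifier
)

model = make_pipeline(selector, stack)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
accuracy = model.score(X_test, y_test)
print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"test accuracy: {accuracy:.3f}")
```

Placing the selector inside the pipeline means the feature ranking is fitted on the training split only, which avoids leaking test information into the selection step. To follow the abstract more closely, the stand-ins could be replaced by CatBoost and LightGBM estimators where those packages are available.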
Ethics declarations
Conflict of Interest: The authors declare that they have no conflict of interest.
Additional information
Foundation item: the National Natural Science Foundation of China (No. 62271466), the Natural Science Foundation of Beijing (No. 4202025), the Tianjin IoT Technology Enterprise Key Laboratory Research Project (No. VTJ-OT20230209-2), and the Guizhou Provincial Sci-Tech Project (No. ZK[2022]-012)
About this article
Cite this article
Miao, J., Chang, Y., Chen, C. et al. Ground-Glass Lung Nodules Recognition Based on CatBoost Feature Selection and Stacking Ensemble Learning. J. Shanghai Jiaotong Univ. (Sci.) (2024). https://doi.org/10.1007/s12204-024-2761-9
DOI: https://doi.org/10.1007/s12204-024-2761-9