Detection of lung cancer mutation based on clinical and morphological features using adaptive boosting method

© 2025 by the Publisher. Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)

Abstract

Lung cancer is a leading cause of cancer-related mortality worldwide, and accurate detection of epidermal growth factor receptor mutations is essential for personalized treatment. However, non-invasive identification of these mutations remains challenging because of the complexity of clinical and morphological patterns. This study develops an adaptive boosting (AdaBoost)-based machine learning model for detecting lung cancer mutations from clinical and morphological data. The dataset consists of clinical and morphological attributes from 80 patients, which were processed through several preprocessing steps: imputation, outlier removal, and feature selection. One-hot encoding increased the feature count beyond the original 28, and analysis of variance was then used to retain the 33 most relevant features. AdaBoost was trained with its learning rate and number of estimators tuned by grid search. The model's performance was evaluated using an 80/20 train-test split and k-fold cross-validation to assess its ability to generalize. After feature selection, AdaBoost outperformed the other models, achieving an accuracy of 83% and an area under the curve of 0.90. It also achieved higher cross-validation scores than Naive Bayes, decision tree, K-nearest neighbors, and support vector machine models, which supports its reliability in mutation detection. The study highlights the role of preprocessing in improving classification performance and suggests that AdaBoost can serve as an effective, non-invasive tool for assisting clinical decision-making in lung cancer mutation detection.
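The workflow summarized in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (80 rows, 20 numeric and 8 categorical columns, chosen only to match the 28 raw attributes), and the grid values, the 5-fold setting, the stratified split, the median and most-frequent imputation strategies, and the AUC scoring are all hypothetical choices. Outlier removal is omitted because the abstract does not say which rule was used.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Synthetic stand-in for the study data (assumption): 80 patients, 28 raw attributes.
n_patients, n_numeric, n_categorical = 80, 20, 8
X_num = rng.normal(size=(n_patients, n_numeric))
X_cat = rng.integers(0, 3, size=(n_patients, n_categorical)).astype(float)

# Toy binary label driven by a few attributes, split at the median so classes are balanced.
signal = X_num[:, 0] + X_num[:, 1] + (X_cat[:, 0] == 2) + rng.normal(scale=0.5, size=n_patients)
y = (signal > np.median(signal)).astype(int)

X = np.hstack([X_num, X_cat])
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing values for the imputation step

num_cols = list(range(n_numeric))
cat_cols = list(range(n_numeric, n_numeric + n_categorical))

# Imputation plus one-hot encoding; 8 categorical columns with 3 levels expand to 24,
# so the 28 raw attributes become 44 encoded features.
preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="median"), num_cols),
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), cat_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

pipe = Pipeline([
    ("prep", preprocess),
    ("select", SelectKBest(f_classif, k=33)),  # ANOVA F-test, keep 33 features
    ("clf", AdaBoostClassifier(random_state=0)),
])

# 80/20 train-test split (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Grid search over learning rate and number of estimators (grid values are hypothetical).
search = GridSearchCV(
    pipe,
    {"clf__n_estimators": [50, 100], "clf__learning_rate": [0.1, 1.0]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

best = search.best_estimator_
test_acc = accuracy_score(y_test, best.predict(X_test))
test_auc = roc_auc_score(y_test, best.predict_proba(X_test)[:, 1])
cv_scores = cross_val_score(best, X, y, cv=5, scoring="accuracy")
n_selected = int(best.named_steps["select"].get_support().sum())

print("best params:", search.best_params_)
print("features kept:", n_selected)
print("test accuracy:", round(test_acc, 3), "test AUC:", round(test_auc, 3))
print("5-fold CV accuracy:", round(cv_scores.mean(), 3))
```

Placing imputation, encoding, and feature selection inside the pipeline means they are refit on each training fold, which keeps held-out data from influencing the selected features. Whether the study arranged its steps this way is not stated in the abstract. The numbers printed here come from synthetic data and say nothing about the reported 83% accuracy or 0.90 AUC.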

Keywords

Adaptive Boosting

Analysis of Variance

Lung Cancer

Machine Learning

Mutation

International Journal of Systematic Innovation, Electronic ISSN: 2077-8767 Print ISSN: 2077-7973, Published by AccScience Publishing
