AccScience Publishing / IJOSI / Volume 9 / Issue 4 / DOI: 10.6977/IJoSI.202508_9(4).0003
ARTICLE

Detection of lung cancer mutation based on clinical and morphological features using adaptive boosting method

Lailil Muflikhah 1*, Amira G. Nurfansepta 1, Edy Santoso 1, Agus Wahyu Widodo 1
1 Department of Informatics Engineering, Faculty of Computer Science, Brawijaya University, Malang, East Java, Indonesia
Submitted: 20 November 2024 | Revised: 19 April 2025 | Accepted: 17 June 2025 | Published: 14 August 2025
© 2025 by the Publisher. Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/).
Abstract

Lung cancer is a leading cause of cancer-related mortality worldwide, and accurate detection of epidermal growth factor receptor (EGFR) mutations is essential for personalized treatment. However, non-invasive identification of these mutations remains challenging because of the complexity of clinical and morphological patterns. This study develops an adaptive boosting (AdaBoost)-based machine learning model for detecting lung cancer mutations from clinical and morphological data. The dataset consists of clinical and morphological attributes from 80 patients, which were processed through preprocessing steps that included imputation, outlier removal, and feature selection. One-hot encoding increased the feature count beyond the original 28, and analysis of variance (ANOVA) was then used to retain the 33 most relevant features. The AdaBoost hyperparameters, including the learning rate and the number of estimators, were tuned using grid search. Model performance was evaluated using an 80/20 train-test split and k-fold cross-validation to assess generalization capability. In the experiments, AdaBoost outperformed the other models, achieving an accuracy of 83% and an area under the curve of 0.90 after feature selection. It also maintained higher cross-validation scores than Naive Bayes, decision tree, K-nearest neighbors, and support vector machine classifiers, which supports its reliability for mutation detection. The study highlights the contribution of the preprocessing steps to classification performance and suggests that AdaBoost can serve as an effective, non-invasive tool for assisting clinical decision-making in lung cancer mutation detection.
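
To make the boosting step named in the abstract concrete, the following is a minimal sketch of discrete AdaBoost over decision stumps, written with the Python standard library only. It is not the authors' implementation, which is not shown on this page. The function names, the way `learning_rate` scales each stump's weight, and the ten-point toy dataset are all illustrative assumptions, and the toy data has no connection to the study's 80-patient dataset.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] <= threshold, else `-sign`.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost_fit(X, y, n_estimators=10, learning_rate=1.0):
    """Fit discrete AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(n_estimators):
        stump = fit_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = learning_rate * 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


def accuracy(model, X, y):
    hits = sum(adaboost_predict(model, row) == yi for row, yi in zip(X, y))
    return hits / len(y)


# Hypothetical toy data: one feature, labels that no single threshold separates.
X_toy = [[float(i)] for i in range(10)]
y_toy = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost_fit(X_toy, y_toy, n_estimators=1)
boosted = adaboost_fit(X_toy, y_toy, n_estimators=3)
acc_single = accuracy(single, X_toy, y_toy)
acc_boosted = accuracy(boosted, X_toy, y_toy)
print(acc_single, acc_boosted)
```

On this toy data a single stump classifies 7 of the 10 training points correctly, while three boosted stumps classify all 10. These are training accuracies on synthetic data; the figures reported in the abstract come from a held-out test split and k-fold cross-validation. A grid search of the kind described there would repeat such a fit over candidate values of `n_estimators` and `learning_rate` and keep the combination with the best cross-validated score.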

Keywords
Adaptive Boosting
Analysis of Variance
Lung Cancer
Machine Learning
Mutation

International Journal of Systematic Innovation, Electronic ISSN: 2077-8767 Print ISSN: 2077-7973, Published by AccScience Publishing