Principal component analysis-enhanced ensemble learning models for proactive failure prediction in cloud-based systems

¹ Department of Computer Science and Engineering (Artificial Intelligence and Machine Learning), Pragati Engineering College (Autonomous), Andhra Pradesh, India

² Department of Computer Science and Engineering (Artificial Intelligence and Machine Learning), Godavari Global University, Rajamahendravaram, Andhra Pradesh, India

IJOSI 2026, 10(2), 025430055 https://doi.org/10.6977/IJoSI.202604_10(2).0001

Received: 25 October 2025 | Revised: 6 December 2025 | Accepted: 7 February 2026 | Published online: 30 April 2026

© 2026 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)

Abstract

Cloud computing environments require high availability and scalability, so proactive failure management is essential for system reliability, security, and consistent performance. Effective failure prediction reduces downtime, improves disaster recovery, and helps maintain uninterrupted service delivery. This paper presents an optimized machine learning framework for predicting failures in cloud infrastructures by integrating principal component analysis (PCA) with ensemble learning models. The study employs three models, namely random forest (RF), categorical boosting (CatBoost), and light gradient boosting machine (LightGBM), each combined with PCA to improve feature representation and predictive accuracy. Key operational metrics, including scheduling class, memory usage, central processing unit (CPU) utilization, event instances, and task priority, are used as features. The Google 2019 cluster dataset is used, and preprocessing involves handling missing data, scaling numerical attributes, and encoding categorical variables to ensure data quality. Experimental results show that PCA-enhanced RF, CatBoost, and LightGBM achieve accuracies of 94.31%, 97.17%, and 98.36%, respectively, outperforming their standard counterparts. These outcomes indicate the effectiveness of PCA-integrated ensemble learning and its potential for real-time cloud failure prediction and automated fault monitoring in large-scale distributed environments.
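The workflow summarized in the abstract (imputation, scaling, encoding, PCA, then an ensemble classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the Google 2019 cluster dataset, and the column layout, the 95% explained-variance threshold, and the random forest settings are all hypothetical choices made only for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the Google 2019 cluster trace).
rng = np.random.default_rng(0)
n = 600
cpu = rng.random(n)                                      # hypothetical CPU utilization
mem = rng.random(n)                                      # hypothetical memory usage
priority = rng.integers(0, 4, size=n).astype(float)      # hypothetical task priority
sched_class = rng.integers(0, 4, size=n).astype(float)   # hypothetical scheduling class

# Hypothetical failure label: heavily loaded tasks fail more often.
y = ((cpu + mem + 0.1 * rng.standard_normal(n)) > 1.2).astype(int)

# Inject some missing values so the imputation step has work to do.
cpu[rng.random(n) < 0.05] = np.nan

X = np.column_stack([cpu, mem, priority, sched_class])

# Numeric columns: median imputation, then standardization.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical column: most-frequent imputation, then one-hot encoding.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    transformers=[("num", numeric, [0, 1, 2]), ("cat", categorical, [3])],
    sparse_threshold=0.0,  # force dense output so PCA can consume it
)

# PCA keeps enough components to explain 95% of the variance (assumed value).
model = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=0.95)),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Because the classifier is the last pipeline step, a CatBoost or LightGBM estimator with a scikit-learn-compatible interface could be substituted for the random forest without changing the preprocessing or PCA stages. The accuracy printed here reflects only the synthetic data and says nothing about the figures reported in the paper.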

Keywords

Cloud-based systems

Failure prediction

Random forest

CatBoost

Light gradient boosting machine

Principal component analysis

Likelihood of failure

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

International Journal of Systematic Innovation, Electronic ISSN: 2077-8767 Print ISSN: 2077-7973, Published by AccScience Publishing