A graphics processing unit-based parallel simplified swarm optimization algorithm for enhanced performance and precision
Graphics processing units (GPUs) have emerged as powerful platforms for parallel computing, enabling personal computers to solve complex optimization tasks effectively. Although swarm intelligence algorithms naturally lend themselves to parallelization, a GPU-based implementation of the simplified swarm optimization (SSO) algorithm has not been reported in the literature. This paper introduces a compute CUDA-SSO algorithm on the CUDA platform, with a time complexity analysis of O (Ngen × Nsol × Nvar), where Ngen is the number of iterations, Nsol is the population size (i.e., number of fitness function evaluations), and Nvar represents the required pairwise comparisons. By eliminating resource preemption of personal best and global best updates, CUDA-SSO significantly reduces the overall complexity and prevents concurrency conflicts. Numerical experiments demonstrate that the proposed approach achieves an order-of-magnitude improvement in run time with superior solution precision relative to central processing unit-based SSO, making it a compelling methodology for large-scale, data-parallel optimization tasks.
Abbasi, M., Rafiee, M., Khosravi, M.R., Jolfaei, A., Menon, V.G., & Koushyar, J. M. (2020). An efficient parallel genetic algorithm solution for vehicle routing problem in cloud implementation of the intelligent transportation systems. Journal of Cloud Computing, 9(6), 6. https://doi.org/10.1186/s13677-020-0157-4
AlZubi, S., Shehab, M., Al-Ayyoub, M., Jararweh, Y., & Gupta, B. (2020). Parallel implementation for 3D medical volume fuzzy segmentation. Pattern Recognition Letters, 130, 312–318. https://doi.org/10.1016/j.patrec.2018.07.026
Chung, Y.Y., & Wahid, N. (2012). A hybrid network intrusion detection system using simplified swarm optimization (SSO). Applied Soft Computing, 12(9), 3014–3022. https://doi.org/10.1016/j.asoc.2012.04.020
Corley, H.W., Rosenberger, J., Yeh, W.C., & Sung, T.K. (2006). The cosine simplex algorithm. The International Journal of Advanced Manufacturing Technology, 27, 1047–1050. https://doi.org/10.1007/s00170-004-2278-1
Corral, J.M.R., Civit-Masot, J., Luna-Perejón, F., Díaz-Cano, I., Morgado-Estévez, A., & Domínguez-Morales, M. (2024). Energy efficiency in edge TPU vs. Embedded GPU for computer-aided medical imaging segmentation and classification. Engineering Applications of Artificial Intelligence, 127, 107298. https://doi.org/10.1016/j.engappai.2023.107298
Friedman, J.H. (1994). An overview of predictive learning and function approximation. In: From Statistics to Neural Networks. Vol. 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79119-2_1
Gordon, V.S., & Whitley, D. (1993). Serial and parallel genetic algorithms as function optimizers. In: Proceedings of the 5th International Conference on Genetic Algorithms (ICGA). Urbana-Champaign, IL, USA.
Hachaj, T., & Piekarczyk, M. (2023). High-level hessian-based image processing with the frangi neuron. Electronics, 12(19), 4159. https://doi.org/10.3390/electronics12194159
Hadley, G. (1964). A Nonlinear and Dynamics Programming. Addison-Wesley Professional, Reading, MA, USA.
Hager, G., Zeiser, T., & Wellein, G. (2008). Data Access Optimizations for Highly Threaded Multi-Core Cpus with Multiple Memory Controllers. In: Proceedings of 2008 IEEE International Symposium on Parallel and Distributed Processing, p1–7. https://doi.org/10.1109/IPDPS.2008.4536341
Hussain, M.M., Hattori, H., & Fujimoto, N. (2016). A CUDA implementation of the standard particle swarm optimization. In: Proceedings of 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) IEEE, Timisoara, Romania. p24–27. https://doi.org/10.1109/SYNASC.2016.043
Krömer, P., Platoš, J., Snášel, V., & Abraham, A. (2011). A comparison of many-threaded differential evolution and genetic algorithms on CUDA. In: Proceedings of 2011 Third World Congress on Nature and Biologically Inspired Computing: IEEE, Salamanca, Spain, p19–21. https://doi.org/10.1109/NaBIC.2011.6089641
Lee, W.C., Chuang, M.C., & Yeh, W.C. (2012). Uniform parallel-machine scheduling to minimize makespan with position-based learning curves. Computers and Industrial Engineering, 63(4), 813–818. https://doi.org/10.1016/j.cie.2012.05.003
Li, W., & Zhang, Z. (2011). A Cuda-Based Multi-Channel Particle Swarm Algorithm. In: Proceedings of 2011 International Conference on Control, Automation and Systems Engineering (CASE): IEEE, Singapore. https://doi.org/10.1109/ICCASE.2011.5997786
Luo, C., Sun, B., Yang, K., Lu, T., & Yeh, W.C. (2019). Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme. Infrared Physics and Technology, 99, 265–276. https://doi.org/10.1016/j.infrared.2019.04.017
Mittal, S., & Vetter, J.S. (2014). A survey of methods for analyzing and improving GPU energy efficiency. ACM Computing Surveys (CSUR), 47(2), 1–23. https://doi.org/10.1145/263634
Mortezazadeh, M., Wang, L.L., Albettar, M., & Yang, S. (2022). CityFED - city fast fluid dynamics for Urban microclimate simulations on graphics processing units. Urban Climate, 41, 101063. https://doi.org/10.1016/j.uclim.2021.101063
Mussi, L., Daolio, F., & Cagnoni, S. (2011). Evaluation of parallel particle swarm optimization algorithms within the CUDA™ architecture. Information Sciences, 181(20), 4642–4657. https://doi.org/10.1016/j.ins.2010.08.045
Navarro, C.A., Hitschfeld-Kahler, N., & Mateu, L. (2014). A survey on parallel computing and its applications in data-parallel problems using GPU architectures. Communications in Computational Physics, 15(2), 285–329. https://doi.org/10.4208/cicp.110113.010813a
NVIDIA Corporation. (2012). CUDA C Best Practices Guide. Ver. 4.1. [Technical Report]. United States: NVIDIA Corporation.
NVIDIA. (n.d.). CUDA C Programming Guide. NVIDIA Documentation. Available from: https://docs.nvidia.com/cuda/cuda-c-programming-guide [Last accessed on 2024 Nov 01].
Pllana, S., & Xhafa, F. (2017). Programming Multicore and Many-Core Computing Systems. John Wiley and Sons, United States. https://doi.org/10.1002/9781119332015
Ruder, S. (2016). An Overview of Gradient Descent Optimization Algorithms. [arXiv Preprint].
Tan, Y., & Ding, K. (2015). A survey on GPU-based implementation of swarm intelligence algorithms. IEEE Transactions on Cybernetics, 46(9), 2028–2041. https://doi.org/10.1109/TCYB.2015.2460261
Tsutsui, S., & Fujimoto, N. (2009). Solving quadratic assignment problems by genetic algorithms with GPU computation: A case study. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers. Montreal, Québec, Canada, 2009. p8–12. https://doi.org/10.1145/1570256.157035
Wolpert, D.H., & Macready, W.G. (1995). No Free Lunch Theorems for Search. Technical Report SFI-TR-95-02-010. Santa Fe Institute, United States.
Wong, M.L. (2009). Parallel multi-objective evolutionary algorithms on graphics processing units. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers. Montreal, QC, Canada. https://doi.org/10.1145/1570256.1570354
Yeh, W.C. (2009). A two-stage discrete particle swarmoptimization for the problem of multiple multi-level redundancy allocation in series systems. Expert Systems with Applications, 36(5), 9192–9200. https://doi.org/10.1016/j.eswa.2008.12.024
Yeh, W.C. (2012). Simplified swarm optimization in disassembly sequencing problems with learning effects. Computers and Operations Research, 39(9), 2168–2177. https://doi.org/10.1016/j.cor.2011.10.027
Yeh, W.C. (2013). New parameter-free simplified swarm optimization for artificial neural network training and its application in the prediction of time series. IEEE Transactions on Neural Networks and Learning Systems, 24(4), 661–665. https://doi.org/10.1109/TNNLS.2012.2232678
Yeh, W.C. (2014). Orthogonal simplified swarm optimization for the series-parallel redundancy allocation problem with a mix of components. Knowledge Based Systems, 64, 1–12. https://doi.org/10.1016/j.knosys.2014.03.011
Yeh, W.C. (2015). An improved simplified swarm optimization. Knowledge Based Systems, 82, 60–69. https://doi.org/10.1016/j.knosys.2015.02.022
Yeh, W.C. (2017). A new exact solution algorithm for a novel generalized redundancy allocation problem. Information Sciences, 408, 182–197. https://doi.org/10.1016/j.ins.2017.04.019
Yeh, W.C., & Wei, S.C. (2012). Economic-based resource allocation for reliable grid-computing service based on Grid Bank. Future Generation Computer Systems, 28(7), 989–1002. https://doi.org/10.1016/j.future.2012.03.005
Yildirim, E., Arslan, E., Kim, J., & Kosar, T. (2015). Application-level optimization of big data transfers through pipelining, parallelism and concurrency. IEEE Transactions on Cloud Computing, 4(1), 63–75. https://doi.org/10.1109/TCC.2015.2415804
Zhu, H., Guo, Y., Wu, J., Gu, J., & Eguchi, K. (2011). Paralleling euclidean particle swarm optimization in CUDA. In: Proceedings of 2011 4th International Conference on Intelligent Networks and Intelligent Systems: IEEE, Kuming, China. https://doi.org/10.1109/ICINIS.2011.35
Zhu, W. (2011). Massively parallel differential evolution-pattern search optimization with graphics hardware acceleration: An investigation on bound constrained optimization problems. Journal of Global Optimization, 50(3), 417–437. https://doi.org/10.1007/s10898-010-9590-0
