A graphics processing unit-based parallel simplified swarm optimization algorithm for enhanced performance and precision

² Integration and Collaboration Laboratory, Department of Industrial Engineering and Engineering Management, College of Engineering, National Tsing Hua University, Hsinchu, Taiwan

³ Department of Industrial and Systems Engineering, College of Electrical Engineering and Computer Science, Chung Yuan Christian University, Taoyuan, Taiwan

⁴ School of Economics and Management, Taizhou University, Taizhou, Jiangsu, China

⁵ Department of International Logistics and Transportation Management, School of Transportation and Tourism, Kainan University, Taoyuan, Taiwan

IJOSI 2025 , 9(5), 23–42; https://doi.org/10.6977/IJoSI.202510_9(5).0003

Submitted: 13 January 2025 | Revised: 2 August 2025 | Accepted: 4 August 2025 | Published: 16 October 2025

© 2025 by the Publisher. Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution -Noncommercial 4.0 International License (CC BY-NC 4.0) ( https://creativecommons.org/licenses/by-nc/4.0/ )

Abstract

Graphics processing units (GPUs) have emerged as powerful platforms for parallel computing, enabling personal computers to solve complex optimization tasks effectively. Although swarm intelligence algorithms naturally lend themselves to parallelization, a GPU-based implementation of the simplified swarm optimization (SSO) algorithm has not been reported in the literature. This paper introduces a compute CUDA-SSO algorithm on the CUDA platform, with a time complexity analysis of O (Ngen × Nsol × Nvar), where Ngen is the number of iterations, Nsol is the population size (i.e., number of fitness function evaluations), and Nvar represents the required pairwise comparisons. By eliminating resource preemption of personal best and global best updates, CUDA-SSO significantly reduces the overall complexity and prevents concurrency conflicts. Numerical experiments demonstrate that the proposed approach achieves an order-of-magnitude improvement in run time with superior solution precision relative to central processing unit-based SSO, making it a compelling methodology for large-scale, data-parallel optimization tasks.

Keywords

Compute Unified Device Architecture

Graphics Processing Unit

Parallelism

Simplified Swarm Optimization

Swarm Intelligence Algorithms

Funding

This research was supported in part by the Natural Science Foundation of China (Grant No. 62106048) and the Ministry of Science and Technology (MOST), China, under grants MOST 107-2221-E-007-072-MY3, MOST 110-2221-E-007-107-MY3, and NSTC 113-2221-E-007-117-MY3.

References

Abbasi, M., Rafiee, M., Khosravi, M.R., Jolfaei, A., Menon, V.G., & Koushyar, J. M. (2020). An efficient parallel genetic algorithm solution for vehicle routing problem in cloud implementation of the intelligent transportation systems. Journal of Cloud Computing, 9(6), 6. https://doi.org/10.1186/s13677-020-0157-4

AlZubi, S., Shehab, M., Al-Ayyoub, M., Jararweh, Y., & Gupta, B. (2020). Parallel implementation for 3D medical volume fuzzy segmentation. Pattern Recognition Letters, 130, 312–318. https://doi.org/10.1016/j.patrec.2018.07.026

Chung, Y.Y., & Wahid, N. (2012). A hybrid network intrusion detection system using simplified swarm optimization (SSO). Applied Soft Computing, 12(9), 3014–3022. https://doi.org/10.1016/j.asoc.2012.04.020

Corley, H.W., Rosenberger, J., Yeh, W.C., & Sung, T.K. (2006). The cosine simplex algorithm. The International Journal of Advanced Manufacturing Technology, 27, 1047–1050. https://doi.org/10.1007/s00170-004-2278-1

Corral, J.M.R., Civit-Masot, J., Luna-Perejón, F., Díaz-Cano, I., Morgado-Estévez, A., & Domínguez-Morales, M. (2024). Energy efficiency in edge TPU vs. Embedded GPU for computer-aided medical imaging segmentation and classification. Engineering Applications of Artificial Intelligence, 127, 107298. https://doi.org/10.1016/j.engappai.2023.107298

Friedman, J.H. (1994). An overview of predictive learning and function approximation. In: From Statistics to Neural Networks. Vol. 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79119-2_1

Gordon, V.S., & Whitley, D. (1993). Serial and parallel genetic algorithms as function optimizers. In: Proceedings of the 5th International Conference on Genetic Algorithms (ICGA). Urbana-Champaign, IL, USA.

Hachaj, T., & Piekarczyk, M. (2023). High-level hessian-based image processing with the frangi neuron. Electronics, 12(19), 4159. https://doi.org/10.3390/electronics12194159

Hadley, G. (1964). A Nonlinear and Dynamics Programming. Addison-Wesley Professional, Reading, MA, USA.

Hager, G., Zeiser, T., & Wellein, G. (2008). Data Access Optimizations for Highly Threaded Multi-Core Cpus with Multiple Memory Controllers. In: Proceedings of 2008 IEEE International Symposium on Parallel and Distributed Processing, p1–7. https://doi.org/10.1109/IPDPS.2008.4536341

Hussain, M.M., Hattori, H., & Fujimoto, N. (2016). A CUDA implementation of the standard particle swarm optimization. In: Proceedings of 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) IEEE, Timisoara, Romania. p24–27. https://doi.org/10.1109/SYNASC.2016.043

Krömer, P., Platoš, J., Snášel, V., & Abraham, A. (2011). A comparison of many-threaded differential evolution and genetic algorithms on CUDA. In: Proceedings of 2011 Third World Congress on Nature and Biologically Inspired Computing: IEEE, Salamanca, Spain, p19–21. https://doi.org/10.1109/NaBIC.2011.6089641

Lee, W.C., Chuang, M.C., & Yeh, W.C. (2012). Uniform parallel-machine scheduling to minimize makespan with position-based learning curves. Computers and Industrial Engineering, 63(4), 813–818. https://doi.org/10.1016/j.cie.2012.05.003

Li, W., & Zhang, Z. (2011). A Cuda-Based Multi-Channel Particle Swarm Algorithm. In: Proceedings of 2011 International Conference on Control, Automation and Systems Engineering (CASE): IEEE, Singapore. https://doi.org/10.1109/ICCASE.2011.5997786

Luo, C., Sun, B., Yang, K., Lu, T., & Yeh, W.C. (2019). Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme. Infrared Physics and Technology, 99, 265–276. https://doi.org/10.1016/j.infrared.2019.04.017

Mittal, S., & Vetter, J.S. (2014). A survey of methods for analyzing and improving GPU energy efficiency. ACM Computing Surveys (CSUR), 47(2), 1–23. https://doi.org/10.1145/263634

Mortezazadeh, M., Wang, L.L., Albettar, M., & Yang, S. (2022). CityFED - city fast fluid dynamics for Urban microclimate simulations on graphics processing units. Urban Climate, 41, 101063. https://doi.org/10.1016/j.uclim.2021.101063

Mussi, L., Daolio, F., & Cagnoni, S. (2011). Evaluation of parallel particle swarm optimization algorithms within the CUDA™ architecture. Information Sciences, 181(20), 4642–4657. https://doi.org/10.1016/j.ins.2010.08.045

Navarro, C.A., Hitschfeld-Kahler, N., & Mateu, L. (2014). A survey on parallel computing and its applications in data-parallel problems using GPU architectures. Communications in Computational Physics, 15(2), 285–329. https://doi.org/10.4208/cicp.110113.010813a

NVIDIA Corporation. (2012). CUDA C Best Practices Guide. Ver. 4.1. [Technical Report]. United States: NVIDIA Corporation.

NVIDIA. (n.d.). CUDA C Programming Guide. NVIDIA Documentation. Available from: https://docs.nvidia.com/cuda/cuda-c-programming-guide [Last accessed on 2024 Nov 01].

Pllana, S., & Xhafa, F. (2017). Programming Multicore and Many-Core Computing Systems. John Wiley and Sons, United States. https://doi.org/10.1002/9781119332015

Ruder, S. (2016). An Overview of Gradient Descent Optimization Algorithms. [arXiv Preprint].

Tan, Y., & Ding, K. (2015). A survey on GPU-based implementation of swarm intelligence algorithms. IEEE Transactions on Cybernetics, 46(9), 2028–2041. https://doi.org/10.1109/TCYB.2015.2460261

Tsutsui, S., & Fujimoto, N. (2009). Solving quadratic assignment problems by genetic algorithms with GPU computation: A case study. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers. Montreal, Québec, Canada, 2009. p8–12. https://doi.org/10.1145/1570256.157035

Wolpert, D.H., & Macready, W.G. (1995). No Free Lunch Theorems for Search. Technical Report SFI-TR-95-02-010. Santa Fe Institute, United States.

Wong, M.L. (2009). Parallel multi-objective evolutionary algorithms on graphics processing units. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers. Montreal, QC, Canada. https://doi.org/10.1145/1570256.1570354

Yeh, W.C. (2009). A two-stage discrete particle swarmoptimization for the problem of multiple multi-level redundancy allocation in series systems. Expert Systems with Applications, 36(5), 9192–9200. https://doi.org/10.1016/j.eswa.2008.12.024

Yeh, W.C. (2012). Simplified swarm optimization in disassembly sequencing problems with learning effects. Computers and Operations Research, 39(9), 2168–2177. https://doi.org/10.1016/j.cor.2011.10.027

Yeh, W.C. (2013). New parameter-free simplified swarm optimization for artificial neural network training and its application in the prediction of time series. IEEE Transactions on Neural Networks and Learning Systems, 24(4), 661–665. https://doi.org/10.1109/TNNLS.2012.2232678

Yeh, W.C. (2014). Orthogonal simplified swarm optimization for the series-parallel redundancy allocation problem with a mix of components. Knowledge Based Systems, 64, 1–12. https://doi.org/10.1016/j.knosys.2014.03.011

Yeh, W.C. (2015). An improved simplified swarm optimization. Knowledge Based Systems, 82, 60–69. https://doi.org/10.1016/j.knosys.2015.02.022

Yeh, W.C. (2017). A new exact solution algorithm for a novel generalized redundancy allocation problem. Information Sciences, 408, 182–197. https://doi.org/10.1016/j.ins.2017.04.019

Yeh, W.C., & Wei, S.C. (2012). Economic-based resource allocation for reliable grid-computing service based on Grid Bank. Future Generation Computer Systems, 28(7), 989–1002. https://doi.org/10.1016/j.future.2012.03.005

Yildirim, E., Arslan, E., Kim, J., & Kosar, T. (2015). Application-level optimization of big data transfers through pipelining, parallelism and concurrency. IEEE Transactions on Cloud Computing, 4(1), 63–75. https://doi.org/10.1109/TCC.2015.2415804

Zhu, H., Guo, Y., Wu, J., Gu, J., & Eguchi, K. (2011). Paralleling euclidean particle swarm optimization in CUDA. In: Proceedings of 2011 4th International Conference on Intelligent Networks and Intelligent Systems: IEEE, Kuming, China. https://doi.org/10.1109/ICINIS.2011.35

Zhu, W. (2011). Massively parallel differential evolution-pattern search optimization with graphics hardware acceleration: An investigation on bound constrained optimization problems. Journal of Global Optimization, 50(3), 417–437. https://doi.org/10.1007/s10898-010-9590-0

Conflict of interest

The authors declare that there are no conflicts of interest.

Previous article in this issue

Next article in this issue

International Journal of Systematic Innovation, Electronic ISSN: 2077-8767 Print ISSN: 2077-7973, Published by AccScience Publishing

Publisher's Core Philosophy

We are committed to support the scientific community by publishing impactful research and enhancing communication among scientists. At AccScience Publishing, we are continuously looking for ways to accelerate scientific progress and to strive for transparency and open communication, making knowledge freely accessible without barrier.

9 Raffles Place, Republic Plaza 1 #06-00 Singapore 048619

+65 8182 1586

editorial@accscience.com