A General Framework for Accelerating Swarm Intelligence Algorithms on FPGAs, GPUs and Multi-Core CPUs

Cited by: 8
|
Authors
Li, Dalin [1 ,2 ]
Huang, Lan [1 ]
Wang, Kangping [1 ]
Pang, Wei [3 ]
Zhou, You [1 ]
Zhang, Rui [1 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
[2] Zhuhai Coll Jilin Univ, Dept Comp Sci & Technol, Zhuhai Lab Key Lab Symbol Computat & Knowledge En, Minist Educ, Zhuhai 519041, Peoples R China
[3] Univ Aberdeen, Dept Comp Sci, Aberdeen AB24 3UE, Scotland
Source
IEEE ACCESS | 2018 / Vol. 6
Funding
National Natural Science Foundation of China;
Keywords
Field programmable gate arrays; multicore processing; parallel programming; particle swarm optimization; pipeline processing; OPTIMIZATION; ARCHITECTURE;
DOI
10.1109/ACCESS.2018.2882455
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Swarm intelligence algorithms (SIAs) have demonstrated excellent performance on optimization problems, including many real-world problems. However, because they are computationally expensive for some complex problems, SIAs need to be accelerated effectively. This paper presents FASI, a high-performance general framework for accelerating SIAs. Unlike previous work, which accelerates SIAs only by increasing parallelism, FASI considers both the memory architectures of hardware platforms and the dataflow of SIAs, and it reschedules the framework of SIAs as a converged dataflow to improve memory access efficiency. FASI achieves higher acceleration by matching the algorithm framework to the hardware architecture. We also design deeply optimized structures for the parallelization and convergence of FASI based on the characteristics of specific hardware platforms. We take the quantum-behaved particle swarm optimization algorithm as a case study to evaluate FASI. The results show that FASI improves the throughput of SIAs and provides better performance by optimizing the hardware implementations. In our experiments, FASI achieves a maximum throughput of 290.7 Mb/s, which is higher than that of several existing systems, and FASI on FPGAs achieves a better speedup than on GPUs and multi-core CPUs. In terms of optimization time, FASI on the Xilinx Kintex UltraScale xcku040 is at least 1.45 times and up to 123 times faster than on the Intel Core i7-6700 CPU / NVIDIA GTX1080 GPU. Finally, we compare the differences in deploying FASI on these hardware platforms and provide guidelines for improving acceleration performance according to the hardware architecture.
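The abstract names quantum-behaved particle swarm optimization (QPSO) as the evaluation case but does not give its update rule. As a rough illustration only, the sketch below uses the commonly published QPSO formulation (mean-best position, local attractor, logarithmic step). It is a sequential, single-threaded baseline and does not reproduce FASI's dataflow rescheduling or any hardware mapping. The names `qpso` and `sphere`, the linearly decreasing coefficient `beta`, and all default parameters are assumptions made for this example, not values taken from the paper.

```python
import math
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def qpso(objective, dim, n_particles=20, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal sequential QPSO sketch; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best

    for t in range(iters):
        # Contraction-expansion coefficient, decreased linearly from 1.0 to 0.5
        # (a common choice, assumed here).
        beta = 1.0 - 0.5 * t / iters
        # Mean of all personal bests, computed once per iteration.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]

        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                u = 1.0 - rng.random()               # in (0, 1], so log(1/u) is finite
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - pos[i][d]) * math.log(1.0 / u)
                x = attractor + step if rng.random() < 0.5 else attractor - step
                pos[i][d] = min(hi, max(lo, x))      # clamp to the search box

            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


if __name__ == "__main__":
    best, value = qpso(sphere, dim=3)
    print(best, value)
```

In this sketch every particle reads the shared `mbest` and `gbest` values on each update, which gives a sense of why memory access patterns, and not only parallelism, matter when such a loop is mapped to an FPGA, GPU, or multi-core CPU.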
Pages: 72327 - 72344
Number of pages: 18
Related papers
50 in total
  • [1] Population-Based MCMC on Multi-Core CPUs, GPUs and FPGAs
    Mingas, Grigorios
    Bouganis, Christos-Savvas
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2016, 65 (04) : 1283 - 1296
  • [2] Fast and Parallel Computation of the Discrete Periodic Radon Transform on GPUs, multi-core CPUs and FPGAs
    Carranza, Cesar
    Pattichis, Marios
    Llamocca, Daniel
    [J]. 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 4158 - 4162
  • [3] PARALLEL SPN ON MULTI-CORE CPUS AND MANY-CORE GPUS
    Kirschenmann, W.
    Plagne, L.
    Poncot, A.
    Vialle, S.
    [J]. TRANSPORT THEORY AND STATISTICAL PHYSICS, 2010, 39 (2-4) : 255 - 281
  • [4] Scalable Multi-coloring Preconditioning for Multi-core CPUs and GPUs
    Heuveline, Vincent
    Lukarski, Dimitar
    Weiss, Jan-Philipp
    [J]. EURO-PAR 2010 PARALLEL PROCESSING WORKSHOPS, 2011, 6586 : 389 - 397
  • [5] Accelerating subset sum and lattice based public-key cryptosystems with multi-core CPUs and GPUs
    Al Badawi, Ahmad
    Veeravalli, Bharadwaj
    Aung, Khin Mi Mi
    Hamadicharef, Brahim
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 119 : 179 - 190
  • [6] Parallelization of Transition Counting for Process Mining on Multi-core CPUs and GPUs
    Ferreira, Diogo R.
    Santos, Rui M.
    [J]. BUSINESS PROCESS MANAGEMENT WORKSHOPS, BPM 2016, 2017, 281 : 36 - 48
  • [7] A framework for accelerating local feature extraction with OpenCL on multi-core CPUs and co-processors
    Moren, Konrad
    Goehringer, Diana
    [J]. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2019, 16 (04) : 901 - 918
  • [8] A framework for accelerating local feature extraction with OpenCL on multi-core CPUs and co-processors
    Konrad Moren
    Diana Göhringer
    [J]. Journal of Real-Time Image Processing, 2019, 16 : 901 - 918
  • [9] Challenges and Opportunities of Obtaining Performance from Multi-Core CPUs and Many-Core GPUs
    Chen, Trista P.
    Chen, Yen-Kuang
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 613 - +
  • [10] Parallelization Strategies of the Canny Edge Detector for Multi-core CPUs and Many-core GPUs
    Ben Cheikh, Taieb Lamine
    Beltrame, Giovanni
    Nicolescu, Gabriela
    Cheriet, Farida
    Tahar, Sofiene
    [J]. 2012 IEEE 10TH INTERNATIONAL NEW CIRCUITS AND SYSTEMS CONFERENCE (NEWCAS), 2012, : 49 - 52