Evaluation of successive CPUs/APUs/GPUs based on an OpenCL finite difference stencil

Cited by: 4

Authors:

Calandra, Henri [2]

Dolbeau, Romain [3]

Fortin, Pierre [1]

Lamotte, Jean-Luc [1]

Said, Issam [1]

Affiliations:

[1] Univ Paris 06, UPMC, CNRS, LIP6, UMR7606, 4 Pl Jussieu, F-75252 Paris 05, France

[2] Total, F-64000 Pau, France

[3] CAPS Entreprise, F-35000 Rennes, France

Source:

PROCEEDINGS OF THE 2013 21ST EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING | 2013

Keywords:

APU; GPU; finite difference stencil; PCI Express bus; high performance scientific computing;

DOI:

10.1109/PDP.2013.65

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The AMD APU (Accelerated Processing Unit) architecture, which combines CPU and GPU cores on the same die, is promising for GPU applications whose performance is bottlenecked by the low PCI Express communication rate. However, the first APU generations still have different CPU and GPU memory partitions. Currently, the APU integrated GPUs are also less powerful than discrete GPUs. In this paper we therefore investigate the value of APUs for scientific computing by evaluating and comparing the performance of two successive AMD APUs (family codenames Llano and Trinity), two successive discrete GPUs (chip codenames Cayman and Tahiti) and one hexa-core AMD CPU. For this purpose, we rely on a 3D finite difference stencil that is optimized and tuned in OpenCL. We detail the most interesting optimizations for each architecture and show very good performance in OpenCL: up to 500 Gflops on Tahiti. Finally, our results show that APU integrated GPUs outperform CPUs, and that integrated GPUs of upcoming APUs may match discrete GPUs for problems with high communication requirements.
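
For readers unfamiliar with the term, the following is a minimal sequential sketch of what a 3D finite difference stencil computes. It is not the authors' kernel: the abstract does not state the stencil order or data layout, so the second-order 7-point Laplacian, the flat x-fastest layout, and the names `laplacian_3d` and `make_quadratic` are all assumptions made for illustration. The paper's version is written and tuned in OpenCL.

```python
def laplacian_3d(u, nx, ny, nz, h=1.0):
    """Apply a 7-point, second-order Laplacian to the interior of a 3D grid.

    u is a flat list of length nx*ny*nz with x as the fastest-varying index
    (an assumed layout). Boundary points of the result are left at 0.0.
    """
    def idx(x, y, z):
        return x + nx * (y + ny * z)

    out = [0.0] * (nx * ny * nz)
    inv_h2 = 1.0 / (h * h)
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                center = u[idx(x, y, z)]
                # Sum of the six axis-aligned neighbours minus 6x the center.
                out[idx(x, y, z)] = inv_h2 * (
                    u[idx(x - 1, y, z)] + u[idx(x + 1, y, z)]
                    + u[idx(x, y - 1, z)] + u[idx(x, y + 1, z)]
                    + u[idx(x, y, z - 1)] + u[idx(x, y, z + 1)]
                    - 6.0 * center
                )
    return out


def make_quadratic(nx, ny, nz):
    """Hypothetical test field f(x, y, z) = x^2 + y^2 + z^2 on an integer grid."""
    return [float(x * x + y * y + z * z)
            for z in range(nz) for y in range(ny) for x in range(nx)]
```

Each interior point reads six neighbours and itself, which is why stencil codes of this kind tend to be limited by memory traffic, and by host-device transfers on discrete GPUs, rather than by arithmetic.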


Pages: 405-409

Page count: 5

25 items in total

[1] Fractal Video Compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms
Chen, Doris
Singh, Deshanand
[J]. 2013 18TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2013, : 297 - 304
[2] Evaluation of Distributed Tasks in Stencil-based Application on GPUs
Raut, Eric
Anderson, Jonathon
Araya-Polo, Mauricio
Meng, Jie
[J]. PROCEEDINGS OF SIXTH INTERNATIONAL IEEE WORKSHOP ON EXTREME SCALE PROGRAMMING MODELS AND MIDDLEWARE (ESPM2 2021), 2021, : 45 - 52
[3] Source wavefield reconstruction based on a new finite-difference stencil and infinity norm
Bao, Qianzong
Dai, Xue
Liang, Xue
[J]. Shiyou Diqiu Wuli Kantan/Oil Geophysical Prospecting, 2022, 57 (06): : 1384 - 1394
[4] Octant-Based Stencil Selection for Meshless Finite Difference Methods in 3D
Davydov, Oleg
Dang Thi Oanh
Tuong Manh Ngo
[J]. VIETNAM JOURNAL OF MATHEMATICS, 2020, 48 (01) : 93 - 106
[5] Octant-Based Stencil Selection for Meshless Finite Difference Methods in 3D
Oleg Davydov
Dang Thi Oanh
Ngo Manh Tuong
[J]. Vietnam Journal of Mathematics, 2020, 48 : 93 - 106
[6] Optimized finite-difference time-domain methods based on the (2,4) stencil
Sun, GL
Trueman, CW
[J]. IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, 2005, 53 (03) : 832 - 842
[7] Performance evaluation of a 3D multi-view-based particle filter for visual object tracking using GPUs and multicore CPUs
David Concha
Raúl Cabido
Juan José Pantrigo
Antonio S. Montemayor
[J]. Journal of Real-Time Image Processing, 2018, 15 : 309 - 327
[8] Performance evaluation of a 3D multi-view-based particle filter for visual object tracking using GPUs and multicore CPUs
Concha, David
Cabido, Raul
Jose Pantrigo, Juan
Montemayor, Antonio S.
[J]. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2018, 15 (02) : 309 - 327
[9] Performance Evaluation of the Three-Dimensional Finite-Difference Time-Domain(FDTD) Method on Fermi Architecture GPUs
Hou, Kaixi
Zhao, Ying
Huang, Jiumei
Zhang, Lingjie
[J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PT I: ICA3PP 2011, 2011, 7916 : 460 - 469
[10] Accelerating simulations of light scattering based on Finite-Difference Time-Domain method with general purpose GPUs
Balevic, A.
Rockstroh, L.
Tausendfreund, A.
Patzelt, S.
Goch, G.
Simon, S.
[J]. CSE 2008:11TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING, PROCEEDINGS, 2008, : 327 - +
