Exploring data flow design and vectorization with oneAPI for streaming applications on CPU plus GPU

Authors
Campos, Cristian [1]
Asenjo, Rafael [1]
Navarro, Angeles [1]
Affiliations
[1] Univ Malaga, Dept Comp Architecture, Malaga 29071, Malaga, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / Issue 02
Keywords
Streaming applications; Heterogeneous computing; Analytical model; Queue theory; CPU plus GPU; oneAPI; SYCL
DOI
10.1007/s11227-024-06891-3
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In recent times, oneAPI has emerged as a competitive framework for optimizing streaming applications on heterogeneous CPU+GPU architectures, since it provides portability and performance thanks to the SYCL programming language and efficient parallel libraries such as oneTBB. However, this approach opens up a wealth of implementation alternatives for this type of application, from how to design the data flow to how to exploit data parallelism. Choosing the best alternative is not trivial, so in this paper we analyze them and contribute an analytical model based on queue theory that helps in the online selection of the alternative that maximizes the throughput and the occupancy of the CPU and GPU compute units. We explore the design space offered by: a) different APIs to define the data flow (parallel_pipeline and Flow Graph from oneTBB, and SYCL events from SYCL); b) alternative kernel implementations to express data parallelism (SYCL, AVX and std::simd); and c) the mapping of the kernels onto the available computing resources (CPU cores and GPU). The results show that the std::simd library can be 1.54x faster and 3% more energy efficient than AVX while requiring 7.36x less programming effort, and that implementations that enable asynchronous offloading of tasks to the devices, such as those based on the SYCL events and Flow Graph APIs, outperform the other APIs, being up to 1.10x faster and up to 1.18x more energy efficient.
Pages: 30