Exploring data flow design and vectorization with oneAPI for streaming applications on CPU plus GPU

Authors
Campos, Cristian [1]
Asenjo, Rafael [1]
Navarro, Angeles [1]
Affiliation
[1] Univ Malaga, Dept Comp Architecture, Malaga 29071, Malaga, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2025, Vol. 81, No. 2
Keywords
Streaming applications; Heterogeneous computing; Analytical model; Queue theory; CPU plus GPU; oneAPI; SYCL
DOI
10.1007/s11227-024-06891-3
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
oneAPI has recently emerged as a competitive framework for optimizing streaming applications on heterogeneous CPU+GPU architectures, since it provides portability and performance thanks to the SYCL programming language and efficient parallel libraries such as oneTBB. However, this approach opens up a wealth of implementation alternatives for this type of application, from how to design the data flow to how to exploit data parallelism. Choosing the best alternative is not trivial, so in this paper we analyze them and contribute an analytical model based on queue theory that helps in the on-line selection of the alternative that maximizes the throughput and the occupancy of the CPU and GPU compute units. We explore the design space offered by: a) different APIs to define the data flow (parallel_pipeline and Flow Graph from oneTBB, and SYCL events from SYCL); b) alternative kernel implementations to express data parallelism (SYCL, AVX and std::simd); and c) the mapping of the kernels onto the available computing resources (CPU cores and GPU). The results show that the std::simd library can be 1.54x faster and 3% more energy efficient than AVX while requiring 7.36x less programming effort, and that implementations that enable asynchronous offloading of tasks to the devices, such as those based on the SYCL events and Flow Graph APIs, outperform the other APIs, being up to 1.10x faster and up to 1.18x more energy efficient.
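The abstract does not give the equations of the analytical model, so the following is only a minimal sketch of the general idea of choosing, at run time, the alternative that maximizes throughput and then occupancy. It assumes a saturated-server approximation in which each CPU core and the GPU are independent servers with a fixed service rate. The names Config, capacity, throughput, occupancy and select, as well as all rates, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical design point: a kernel mapping onto CPU cores and GPU."""
    name: str
    cpu_workers: int   # CPU cores that execute the kernel
    cpu_rate: float    # items/s sustained by one core (assumed, measured offline)
    gpu_rate: float    # items/s sustained by the GPU; 0.0 if the GPU is unused


def capacity(cfg: Config) -> float:
    # Aggregate service rate if every server is kept busy (assumption:
    # servers do not interfere with each other).
    return cfg.cpu_workers * cfg.cpu_rate + cfg.gpu_rate


def throughput(cfg: Config, arrival_rate: float) -> float:
    # A stable system cannot deliver more than arrives, nor more than it can serve.
    return min(arrival_rate, capacity(cfg))


def occupancy(cfg: Config, arrival_rate: float) -> float:
    # Fraction of the aggregate service capacity that is actually used.
    cap = capacity(cfg)
    return throughput(cfg, arrival_rate) / cap if cap > 0 else 0.0


def select(configs, arrival_rate: float) -> Config:
    # Prefer the highest throughput; break ties with the highest occupancy.
    return max(
        configs,
        key=lambda c: (throughput(c, arrival_rate), occupancy(c, arrival_rate)),
    )


# Illustrative rates only; they are not measurements from the paper.
CONFIGS = [
    Config("cpu_only", cpu_workers=4, cpu_rate=25.0, gpu_rate=0.0),
    Config("gpu_only", cpu_workers=0, cpu_rate=25.0, gpu_rate=150.0),
    Config("cpu+gpu", cpu_workers=3, cpu_rate=25.0, gpu_rate=150.0),
]

if __name__ == "__main__":
    for rate in (120.0, 200.0):
        best = select(CONFIGS, rate)
        print(rate, best.name, throughput(best, rate), occupancy(best, rate))
```

With these made-up rates, a light input stream favors the GPU-only mapping because it reaches the same throughput as the combined mapping at higher occupancy, while a heavier stream favors CPU+GPU because only that mapping can absorb the load. A real model would also need to account for queueing delays, data transfers and contention, which this sketch ignores.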
Pages: 30