Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures

Cited by: 23
Authors
Huang, Sitao [1]
Chang, Li-Wen [2,7]
El Hajj, Izzat [1]
De Gonzalo, Simon Garcia [3]
Gomez-Luna, Juan [4]
Chalamalasetti, Sai Rahul [5]
El-Hadedy, Mohamed [6]
Milojicic, Dejan [5]
Mutlu, Onur [4]
Chen, Deming [1]
Hwu, Wen-mei [1]
Affiliations
[1] UIUC, ECE, Champaign, IL 61820 USA
[2] Microsoft, Albuquerque, NM USA
[3] UIUC, CS, Champaign, IL USA
[4] Swiss Fed Inst Technol, CS, Zurich, Switzerland
[5] Hewlett Packard Labs, Palo Alto, CA USA
[6] Cal Poly Pomona, ECE, Pomona, CA USA
[7] UIUC, Champaign, IL USA
Keywords
CPU-FPGA architectures; Heterogeneous systems; OpenCL; Performance analysis
DOI
10.1145/3297663.3310305
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs and FPGAs for improved performance and energy efficiency. At the same time, programmability is also improving with High Level Synthesis tools (e.g., OpenCL Software Development Kits), which allow programmers to express their designs with high-level programming languages, and avoid time-consuming and error-prone register-transfer level (RTL) programming. In the traditional loosely-coupled accelerator mode, FPGAs work as offload accelerators, where an entire kernel runs on the FPGA while the CPU thread waits for the result. However, tighter integration of the CPUs and the FPGAs enables the possibility of fine-grained collaborative execution, i.e., having both devices working concurrently on the same workload. Such collaborative execution makes better use of the overall system resources by employing both CPU threads and FPGA concurrency, thereby achieving higher performance. In this paper, we explore the potential of collaborative execution between CPUs and FPGAs using OpenCL High Level Synthesis. First, we compare various collaborative techniques (namely, data partitioning and task partitioning), and evaluate the tradeoffs between them. We observe that choosing the most suitable partitioning strategy can improve performance by up to 2x. Second, we study the impact of a common optimization technique, kernel duplication, in a collaborative CPU-FPGA context. We show that the general trend is that kernel duplication improves performance until the memory bandwidth saturates. Third, we provide new insights that application developers can use when designing CPU-FPGA collaborative applications to choose between different partitioning strategies.
We find that different partitioning strategies pose different tradeoffs (e.g., task partitioning enables more kernel duplication, while data partitioning has lower communication overhead and better load balance), but they generally outperform execution on conventional CPU-FPGA systems where no collaborative execution strategies are used. Therefore, we advocate even more integration in future heterogeneous CPU-FPGA systems (e.g., OpenCL 2.0 features, such as fine-grained shared virtual memory).
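The two effects named in the abstract (splitting one workload across both devices, and kernel duplication that stops helping once memory bandwidth saturates) can be illustrated with a toy model. The sketch below is not the model from the paper: the function names, the constant processing rates, and the hard bandwidth cap are all simplifying assumptions made here for illustration, and communication overhead is ignored.

```python
def data_partition_time(work, cpu_rate, fpga_rate, cpu_fraction):
    """Completion time when CPU and FPGA process disjoint slices concurrently.

    Assumption: each device has a constant rate (work items per time unit)
    and there is no communication or synchronization cost.
    """
    cpu_time = cpu_fraction * work / cpu_rate
    fpga_time = (1.0 - cpu_fraction) * work / fpga_rate
    # Both devices run at the same time, so the slower slice sets the total.
    return max(cpu_time, fpga_time)


def balanced_cpu_fraction(cpu_rate, fpga_rate):
    """CPU share of the work at which both devices finish together."""
    return cpu_rate / (cpu_rate + fpga_rate)


def duplicated_kernel_rate(copies, rate_per_copy, bandwidth_cap):
    """Aggregate rate of duplicated kernels under a hypothetical hard cap.

    Assumption: rate grows linearly with the number of copies until the
    memory system cannot feed them any faster.
    """
    return min(copies * rate_per_copy, bandwidth_cap)


if __name__ == "__main__":
    work, cpu_rate, fpga_rate = 120, 1.0, 3.0
    offload_only = data_partition_time(work, cpu_rate, fpga_rate, 0.0)
    share = balanced_cpu_fraction(cpu_rate, fpga_rate)
    collaborative = data_partition_time(work, cpu_rate, fpga_rate, share)
    print("offload only:", offload_only, "collaborative:", collaborative)
    for copies in range(1, 6):
        print(copies, "copies ->", duplicated_kernel_rate(copies, 3.0, 10.0))
```

In this toy setting the collaborative split finishes sooner than FPGA-only offload, and the duplicated-kernel rate flattens once the assumed cap is reached. Real systems also pay for communication and load imbalance, which the abstract identifies as the tradeoffs between partitioning strategies.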
Pages: 79-90
Page count: 12