Near-Data FPGA-Accelerated Processing of Collective and Inference Operations in Disaggregated Memory Systems

Cited by: 2
Authors
Heinz, Carsten [1]
Koch, Andreas [1]
Affiliation
[1] Tech Univ Darmstadt, Embedded Syst & Applicat Grp, Darmstadt, Germany
DOI
10.1109/H2RC54759.2021.00010
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With growing data set sizes, many scientific and data-center HPC workloads face an increasing scaling imbalance, for example between compute and memory capacities. Disaggregated system architectures address this by distributing the different resources spatially. They aim to scale each resource kind (e.g., compute, non-volatile storage, memory) independently and interconnect the resources with fast communication fabrics. However, some bulk operations, such as reductions and collective operations, still benefit from being performed close to the memories, which avoids moving large volumes of data over the fabric. This work realizes a disaggregated system capable of such near-data processing (NDP) by extending the distributed memory controllers with hardware-accelerated compute capabilities. The computations execute on FPGAs and are described abstractly in C/C++ that high-level synthesis (HLS) tools can compile. We aimed for high usability, including by HPC experts unfamiliar with hardware design: an automated toolflow encapsulates the creation and deployment of the accelerators in the disaggregated system. The NDP operations execute distributed across all memory nodes and are accessed through a simple MPI-based programming interface that requires only minimal effort to adopt in existing applications. We demonstrate our solution on a prototype disaggregated system that uses the low-latency EXTOLL fabric for communication, and evaluate both conventional reductions/collectives and complete machine-learning inference tasks.
Pages: 44-51
Number of pages: 8