Near-Data FPGA-Accelerated Processing of Collective and Inference Operations in Disaggregated Memory Systems

Cited by: 2

Authors:

Heinz, Carsten [1]

Koch, Andreas [1]

Affiliations:

[1] Tech Univ Darmstadt, Embedded Syst & Applicat Grp, Darmstadt, Germany

Source:

PROCEEDINGS OF SEVENTH INTERNATIONAL WORKSHOP ON HETEROGENEOUS HIGH-PERFORMANCE RECONFIGURABLE COMPUTING (H2RC 2021) | 2021

Keywords:

DOI:

10.1109/H2RC54759.2021.00010

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

With growing data set sizes, many scientific and data center HPC workloads observe an increasing scaling imbalance, e.g., between compute and memory capacities. As a solution, disaggregated system architectures employ spatial distribution of the different resources. They aim for independent scaling of the different resource kinds (e.g., compute, non-volatile storage, memory), and use fast communication fabrics for their interconnection. However, for some bulk operations, such as reductions and collections, it is still beneficial to perform them close to the memories, avoiding the need to move large volumes of data over the fabric. This work realizes a disaggregated system capable of performing such near-data processing (NDP) operations by extending the distributed memory controllers with hardware-accelerated compute capabilities. The actual computations execute on FPGAs and can be described abstractly in C/C++ that is compilable by high-level hardware synthesis (HLS) tools. We have aimed to make our technology highly usable even for HPC experts unfamiliar with hardware design. An automated toolflow encapsulates the creation and deployment of the actual accelerators in the disaggregated system. The NDP operations execute distributed across all memory nodes, and are easily accessed using a simple MPI-based programming interface that requires only minimal effort to use in existing applications. Our solution is demonstrated using a prototype disaggregated system based on the low-latency EXTOLL fabric for communication. We evaluate both conventional reductions/collectives and complete machine-learning inference tasks.
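
The abstract's core argument, that reducing data next to the memory avoids moving large volumes over the fabric, can be illustrated with a small software model. The sketch below is not the authors' implementation (which runs on FPGAs and is reached through an MPI-based interface over EXTOLL); it is a plain-Python analogy. The class MemoryNode, the functions host_side_sum and near_data_sum, the node count, the shard size, and the 8-byte value size are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

BYTES_PER_VALUE = 8  # assumption: each value is a 64-bit float


@dataclass
class MemoryNode:
    """Hypothetical stand-in for one memory node in a disaggregated system."""
    shard: List[float]

    def read_all(self) -> List[float]:
        # Conventional path: the entire shard crosses the fabric.
        return list(self.shard)

    def local_sum(self) -> float:
        # Near-data path: reduce beside the memory, ship a single value.
        return sum(self.shard)


def host_side_sum(nodes: List[MemoryNode]) -> Tuple[float, int]:
    """Pull every shard to the compute node, then reduce there."""
    total, moved = 0.0, 0
    for node in nodes:
        data = node.read_all()
        moved += len(data) * BYTES_PER_VALUE
        total += sum(data)
    return total, moved


def near_data_sum(nodes: List[MemoryNode]) -> Tuple[float, int]:
    """Each node reduces its own shard; only partial results are moved."""
    partials = [node.local_sum() for node in nodes]
    moved = len(partials) * BYTES_PER_VALUE
    return sum(partials), moved


# Four hypothetical memory nodes, each holding 1000 consecutive values.
nodes = [
    MemoryNode([float(i) for i in range(k * 1000, (k + 1) * 1000)])
    for k in range(4)
]

host_total, host_bytes = host_side_sum(nodes)
ndp_total, ndp_bytes = near_data_sum(nodes)

print("host-side reduction:", host_total, "bytes moved:", host_bytes)
print("near-data reduction:", ndp_total, "bytes moved:", ndp_bytes)
```

In this model both paths compute the same sum, but the near-data path moves one value per node rather than one value per element. Real systems add fabric latency, accelerator setup cost, and result-combination overhead that this sketch ignores.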


Pages: 44-51

Page count: 8

