DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing

Cited by: 8
Authors
Zhou, Zhe [1 ,2 ,3 ]
Li, Cong [1 ,3 ]
Yang, Fan [4 ]
Sun, Guangyu [1 ,3 ]
Affiliations
[1] Peking Univ, Sch Integrated Circuits, Beijing, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[3] Nankai Univ, Beijing Adv Innovat Ctr Integrated Circuits, Tianjin, Peoples R China
[4] Nankai Univ, Sch Comp Sci, Tianjin, Peoples R China
DOI
10.1109/HPCA56546.2023.10071005
CLC Number
TP3 [computing technology; computer technology]
Discipline Code
0812
Abstract
DIMM-based near-memory processing (DIMM-NMP) architectures have received growing interest from both academia and industry. They offer large memory capacity, low manufacturing cost, high flexibility, and a compatible form factor. However, inter-DIMM communication (IDC) has become a critical obstacle for generic DIMM-NMP architectures because it involves costly forwarding transactions through the host CPU. Recent research has demonstrated that, for many applications, the overhead induced by IDC may even offset the performance and energy benefits of near-memory processing. To tackle this problem, we propose DIMM-Link, which enables high-performance IDC in DIMM-NMP architectures and supports seamless integration with existing host memory systems. It adopts bidirectional external data links to connect DIMMs, via which point-to-point communication and inter-DIMM broadcast are efficiently supported in a packet-routing manner. We present the full-stack design of DIMM-Link, including the hardware architecture, interconnect protocol, system organization, routing mechanisms, and optimization strategies. Comprehensive experiments on typical data-intensive tasks demonstrate that the DIMM-Link-equipped NMP system achieves a 5.93× average speedup over a 16-core CPU baseline. Compared to other IDC methods, DIMM-Link outperforms MCN, AIM, and ABC-DIMM by 2.42×, 1.87×, and 1.77×, respectively. More importantly, DIMM-Link fully considers implementation feasibility and system-integration constraints, which are critical for designing NMP architectures based on modern DDR4/DDR5 DIMMs.
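The abstract contrasts host-forwarded IDC with direct packet routing over bidirectional inter-DIMM links. A minimal toy model illustrates why the link-based scheme saves hops, assuming (purely for illustration; the paper's actual topology, packet format, and protocol are not reproduced here) that the DIMMs form a bidirectional ring and packets take the shorter direction:

```python
# Toy hop-count model of DIMM-Link-style inter-DIMM communication.
# Assumption: DIMMs are connected in a bidirectional ring; function
# names and the 2-leg host-forwarding cost are illustrative only.

def p2p_hops(src: int, dst: int, n_dimms: int) -> int:
    """Hops for a point-to-point packet on a bidirectional ring,
    routed in whichever direction is shorter."""
    cw = (dst - src) % n_dimms    # clockwise distance
    ccw = (src - dst) % n_dimms   # counter-clockwise distance
    return min(cw, ccw)

def broadcast_hops(n_dimms: int) -> int:
    """Worst-case hops for an inter-DIMM broadcast that propagates
    along both directions of the ring simultaneously."""
    return n_dimms // 2

def host_forward_hops() -> int:
    """Baseline: every inter-DIMM transfer bounces through the host
    CPU (DIMM -> host -> DIMM, i.e. two bus transactions)."""
    return 2

if __name__ == "__main__":
    n = 8
    print(p2p_hops(0, 3, n))   # 3: clockwise is shorter
    print(p2p_hops(0, 6, n))   # 2: counter-clockwise is shorter
    print(broadcast_hops(n))   # 4: flooding both ways covers 8 DIMMs
```

The model only counts hops; it ignores link bandwidth, contention, and the protocol details that the paper's full-stack design addresses.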
Pages: 302-316 (15 pages)
Related Papers
11 items in total
  • [1] ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast
    Sun, Weiyi
    Li, Zhaoshi
    Yin, Shouyi
    Wei, Shaojun
    Liu, Leibo
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 237 - 250
  • [2] NMExplorer: An Efficient Exploration Framework for DIMM-based Near-Memory Tensor Reduction
    Li, Cong
    Zhou, Zhe
    Li, Xingchen
    Sun, Guangyu
    Niu, Dimin
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [3] G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing
    Tian, Teng
    Wang, Xiaotian
    Zhao, Letian
    Wu, Wei
    Zhang, Xuecang
    Lu, Fangmin
    Wang, Tianqi
    Jin, Xi
    JOURNAL OF SYSTEMS ARCHITECTURE, 2022, 129
  • [4] SADIMM: Accelerating Sparse Attention Using DIMM-Based Near-Memory Processing
    Li, Huize
    Chen, Dan
    Mitra, Tulika
    IEEE TRANSACTIONS ON COMPUTERS, 2025, 74 (02) : 542 - 554
  • [5] SuperCut: Communication-Aware Partitioning for Near-Memory Graph Processing
    Zhao, Chenfeng
    Chamberlain, Roger D.
    Zhang, Xuan
    PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2023, CF 2023, 2023, : 42 - 51
  • [6] Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV
    Zhang, Chi
    Scheffler, Paul
    Benz, Thomas
    Perotti, Matteo
    Benini, Luca
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [7] GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing
    Yi, Shiyan
    Qiu, Yudi
    Lu, Lingfei
    Xu, Guohao
    Gong, Yong
    Zeng, Xiaoyang
    Fan, Yibo
    IEEE COMPUTER ARCHITECTURE LETTERS, 2024, 23 (01) : 87 - 90
  • [8] Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing
    Liu, Haifeng
    Zheng, Long
    Huang, Yu
    Zhou, Jingyi
    Liu, Chaoqiang
    Wang, Runze
    Liao, Xiaofei
    Jin, Hai
    Xue, Jingling
    2024 ACM/IEEE 51ST ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2024, 2024, : 382 - 395
  • [9] Via-switch FPGA with transistor-free programmability enabling energy-efficient near-memory parallel computation
    Hashimoto, Masanori
    Bai, Xu
    Banno, Naoki
    Tada, Munehiro
    Sakamoto, Toshitsugu
    Yu, Jaehoon
    Doi, Ryutaro
    Onodera, Hidetoshi
    Imagawa, Takashi
    Ochi, Hiroyuki
    Wakabayashi, Kazutoshi
    Mitsuyama, Yukio
    Sugibayashi, Tadahiko
    JAPANESE JOURNAL OF APPLIED PHYSICS, 2022, 61 (SM)
  • [10] Enabling fast and energy-efficient FM-index exact matching using processing-near-memory
    Herruzo, Jose M.
    Fernandez, Ivan
    Gonzalez-Navarro, Sonia
    Plata, Oscar
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (09): : 10226 - 10251