MetaNMP: Leveraging Cartesian-Like Product to Accelerate HGNNs with Near-Memory Processing

Cited by: 8
Authors
Chen, Dan [1 ]
He, Haiheng [1 ]
Jin, Hai [1 ]
Zheng, Long [1 ]
Huang, Yu [1 ]
Shen, Xinyang [1 ]
Liao, Xiaofei [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Heterogeneous graph neural networks; Cartesian product; near-memory processing;
DOI
10.1145/3579371.3589091
CLC number
TP3 [Computing Technology, Computer Technology];
Subject classification code
0812;
Abstract
Heterogeneous graph neural networks (HGNNs) based on metapaths are powerful at capturing the rich structural and semantic information in heterogeneous graphs. HGNNs are highly memory-bound and can therefore be accelerated by near-memory processing. However, they also suffer from a significant memory footprint (from storing metapath instances as intermediate data) and severe redundant computation (when vertex features are aggregated over metapath instances). To address these issues, this paper proposes MetaNMP, the first DIMM-based near-memory processing accelerator for HGNNs, which achieves a reduced memory footprint and high performance. Specifically, we first propose a Cartesian-like product paradigm that generates all metapath instances on the fly for heterogeneous graphs. In this way, metapath instances no longer need to be stored as intermediate data, avoiding significant memory consumption. We then design a dataflow for aggregating vertex features over metapath instances: features are aggregated along the direction in which metapath instances disperse from the starting vertex, which exposes shareable aggregation computations and eliminates most of the redundant computation. Finally, we integrate specialized hardware units into the DIMMs to accelerate HGNNs with near-memory processing, and introduce a broadcast mechanism for edge data and vertex features to mitigate inter-DIMM communication. Our evaluation shows that MetaNMP reduces memory consumption by 51.9% on average and improves performance by 415.18x compared to an NVIDIA Tesla V100 GPU.
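To make the abstract's two core ideas concrete, the following Python sketch illustrates (1) generating metapath instances on the fly via a Cartesian-like product of per-hop neighbor lists, so instances are never materialized as intermediate data, and (2) sharing partial aggregations along the direction in which instances disperse from the starting vertex. This is an illustrative software analogue, not the authors' implementation: the toy graph, the edge-type encoding, the sum aggregator, and the memoization table are assumptions standing in for MetaNMP's DIMM-side hardware dataflow.

import numpy as np

# Toy heterogeneous graph: adjacency lists per edge type (assumed encoding).
# Metapath A-P-A: author -> paper -> author (e.g., co-authorship semantics).
adj = {
    ("A", "P"): {0: [0, 1], 1: [1]},     # author -> papers written
    ("P", "A"): {0: [0, 1], 1: [0, 1]},  # paper  -> authors
}
rng = np.random.default_rng(0)
feat = {"A": rng.random((2, 4)), "P": rng.random((2, 4))}  # 4-dim features

def instances(v, metapath):
    """Enumerate metapath instances rooted at v on the fly: a Cartesian-like
    product over the neighbor lists of each hop, yielded one at a time so the
    instance set is never stored as intermediate data."""
    if len(metapath) == 1:
        yield (v,)
        return
    for u in adj[(metapath[0], metapath[1])][v]:
        for rest in instances(u, metapath[1:]):
            yield (v,) + rest

memo = {}  # (vertex, remaining metapath) -> (summed features, instance count)

def aggregate(v, metapath):
    """Sum vertex features over all instances dispersing from v. Subtree
    results are memoized, so an aggregation shared by many instances (or by
    many starting vertices) is computed only once: the 'shareable aggregation
    computations' the abstract refers to."""
    key = (v, metapath)
    if key in memo:
        return memo[key]
    if len(metapath) == 1:
        result = (feat[metapath[0]][v], 1)
    else:
        total, count = np.zeros_like(feat[metapath[0]][v]), 0
        for u in adj[(metapath[0], metapath[1])][v]:
            sub, n = aggregate(u, metapath[1:])      # reused subtree aggregate
            total += sub + n * feat[metapath[0]][v]  # v counted once per instance
            count += n
        result = (total, count)
    memo[key] = result
    return result

paths = list(instances(0, ("A", "P", "A")))
vec, n = aggregate(0, ("A", "P", "A"))
assert n == len(paths)  # 4 instances: (0,0,0), (0,0,1), (0,1,0), (0,1,1)
print(f"{n} A-P-A instances from author 0; mean-aggregated feature:", vec / n)

In this sketch, the enumerator yields the four A-P-A instances without ever materializing them, and the memo table lets a second starting author reuse the paper-side partial aggregates. In MetaNMP these roles are played, respectively, by on-the-fly instance generation and the dispersion-ordered aggregation dataflow realized in the DIMM-side hardware units.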
Pages: 784-796
Number of pages: 13
Related papers
38 records in total
  • [31] GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing
    Zhou, Zhe
    Li, Cong
    Wei, Xuechao
    Wang, Xiaoyang
    Sun, Guangyu
    PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022, : 54 - 68
  • [32] G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing
    Tian, Teng
    Wang, Xiaotian
    Zhao, Letian
    Wu, Wei
    Zhang, Xuecang
    Lu, Fangmin
    Wang, Tianqi
    Jin, Xi
    JOURNAL OF SYSTEMS ARCHITECTURE, 2022, 129
  • [33] ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast
    Sun, Weiyi
    Li, Zhaoshi
    Yin, Shouyi
    Wei, Shaojun
    Liu, Leibo
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 237 - 250
  • [34] SADIMM: Accelerating Sparse Attention Using DIMM-Based Near-Memory Processing
    Li, Huize
    Chen, Dan
    Mitra, Tulika
    IEEE TRANSACTIONS ON COMPUTERS, 2025, 74 (02) : 542 - 554
  • [35] TiPU: A Spatial-Locality-Aware Near-Memory Tile Processing Unit for 3D Point Cloud Neural Network
    Zheng, Jiapei
    Jiang, Hao
    Nie, Xinkai
    Huang, Zhangcheng
    Chen, Chixiao
    Liu, Qi
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023
  • [36] Guest Editorial IEEE Transactions on Emerging Topics in Computing Thematic Section on Memory-Centric Designs: Processing-in-Memory, In-Memory Computing, and Near-Memory Computing for Real-World Applications
    Chang, Yuan-Hao
    Piuri, Vincenzo
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2023, 11 (02) : 278 - 280
  • [37] Instant-NeRF: Instant On-Device Neural Radiance Field Training via Algorithm-Accelerator Co-Designed Near-Memory Processing
    Zhao, Yang
    Wu, Shang
    Zhang, Jingqun
    Li, Sixu
    Li, Chaolian
    Lin, Yingyan
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023
  • [38] A 1-16b Reconfigurable 80Kb 7T SRAM-Based Digital Near-Memory Computing Macro for Processing Neural Networks
    Kim, Hyunjoon
    Mu, Junjie
    Yu, Chengshuo
    Kim, Tony Tae-Hyoung
    Kim, Bongjin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (04) : 1580 - 1590