MetaNMP: Leveraging Cartesian-Like Product to Accelerate HGNNs with Near-Memory Processing

Cited by: 8
Authors
Chen, Dan [1 ]
He, Haiheng [1 ]
Jin, Hai [1 ]
Zheng, Long [1 ]
Huang, Yu [1 ]
Shen, Xinyang [1 ]
Liao, Xiaofei [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab,Sch Comp Sci & Technol, Wuhan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Heterogeneous graph neural networks; cartesian product; near-memory processing;
DOI
10.1145/3579371.3589091
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Metapath-based heterogeneous graph neural networks (HGNNs) are powerful at capturing the rich structural and semantic information in heterogeneous graphs. HGNNs are highly memory-bound and can therefore be accelerated by near-memory processing. However, they also suffer from a significant memory footprint (from storing metapath instances as intermediate data) and severe redundant computation (when vertex features are aggregated over metapath instances). To address these issues, this paper proposes MetaNMP, the first DIMM-based near-memory processing accelerator for HGNNs, offering a reduced memory footprint and high performance. Specifically, we first propose a cartesian-like product paradigm that generates all metapath instances on the fly for heterogeneous graphs. Metapath instances therefore no longer need to be stored as intermediate data, avoiding significant memory consumption. We then design a dataflow for aggregating vertex features over metapath instances: features are aggregated along the direction in which metapath instances disperse from the starting vertex, exploiting shareable aggregation computations and eliminating most of the redundant computation. Finally, we integrate specialized hardware units into the DIMM to accelerate HGNNs with near-memory processing, and introduce a broadcast mechanism for edge data and vertex features to mitigate inter-DIMM communication. Our evaluation shows that MetaNMP reduces memory space by 51.9% on average and improves performance by 415.18x compared to an NVIDIA Tesla V100 GPU.
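The abstract's core idea, generating metapath instances on the fly rather than materializing them, can be illustrated with a small sketch. This is not the paper's hardware mechanism: the toy graph, vertex names, the Author-Paper-Author metapath, and the `instances` generator below are all invented for illustration.

```python
# Illustrative sketch: lazily enumerate metapath instances with a generator,
# instead of storing them all as intermediate data. The graph and metapath
# here are hypothetical; MetaNMP itself does this in near-memory hardware.

# Adjacency lists of a toy heterogeneous graph, keyed by
# (source vertex type, destination vertex type).
adj = {
    ("A", "P"): {"a0": ["p0", "p1"]},            # Author -> Paper edges
    ("P", "A"): {"p0": ["a1", "a2"], "p1": ["a1"]},  # Paper -> Author edges
}

def instances(metapath, v, path=()):
    """Lazily yield every instance of `metapath` rooted at vertex `v`.

    Instances that share a prefix (e.g. a0 -> p0) are expanded from that
    common prefix as the traversal disperses from the starting vertex, so
    an aggregation driven by this order can reuse partial results instead
    of recomputing them once per instance.
    """
    path = path + (v,)
    if len(path) == len(metapath):
        yield path
        return
    src_type, dst_type = metapath[len(path) - 1], metapath[len(path)]
    for u in adj.get((src_type, dst_type), {}).get(v, []):
        yield from instances(metapath, u, path)

for inst in instances(("A", "P", "A"), "a0"):
    print(inst)  # three A-P-A instances rooted at a0, never stored as a list
```

Running the loop visits `('a0', 'p0', 'a1')`, `('a0', 'p0', 'a2')`, and `('a0', 'p1', 'a1')` one at a time; the full set of instances is never held in memory, which is the footprint saving the abstract attributes to on-the-fly generation.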
Pages: 784-796
Page count: 13