HGNAS: Hardware-Aware Graph Neural Architecture Search for Edge Devices

Authors
Zhou, Ao [1]
Yang, Jianlei [1]
Qi, Yingjie [1]
Qiao, Tong [1]
Shi, Yumeng [1]
Duan, Cenlin [2]
Zhao, Weisheng [2]
Hu, Chunming [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Integrated Circuits & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Graph neural networks; Aggregates; Hardware; Accuracy; Performance evaluation; Point cloud compression; Computer architecture; hardware-aware neural architecture search; edge devices; hardware efficiency prediction
DOI
10.1109/TC.2024.3449108
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graph Neural Networks (GNNs) are becoming increasingly popular for graph-based learning tasks such as point cloud processing due to their state-of-the-art (SOTA) performance. Nevertheless, the research community has primarily focused on improving model expressiveness, with little consideration of how to design efficient GNN models for edge scenarios with real-time requirements and limited resources. Examining existing GNN models reveals varied execution across platforms and frequent Out-Of-Memory (OOM) problems, highlighting the need for hardware-aware GNN design. To address this challenge, this work proposes HGNAS, a novel hardware-aware graph neural architecture search framework tailored for resource-constrained edge devices. To achieve hardware awareness, HGNAS integrates an efficient GNN hardware performance predictor that evaluates the latency and peak memory usage of GNNs in milliseconds. Meanwhile, we study GNN memory usage during inference and offer a peak memory estimation method, which enhances the robustness of architecture evaluations when combined with predictor outcomes. Furthermore, HGNAS constructs a fine-grained design space that enables the exploration of extreme-performance architectures by decoupling the GNN paradigm. In addition, a multi-stage hierarchical search strategy is leveraged to navigate the large number of candidates, reducing the time for a single search to a few GPU hours. To the best of our knowledge, HGNAS is the first automated GNN design framework for edge devices, and also the first work to achieve hardware awareness of GNNs across different platforms. Extensive experiments across various applications and edge devices demonstrate the superiority of HGNAS. It can achieve up to a 10.6x speedup and an 82.5% peak memory reduction with negligible accuracy loss compared to DGCNN on ModelNet40.
Pages: 2693-2707
Page count: 15