Network Group Partition and Core Placement Optimization for Neuromorphic Multi-Core and Multi-Chip Systems

Authors
Yang, Yukuan [1, 2]
Fan, Qihang [3]
Yan, Tianyi [4]
Pei, Jing [3]
Li, Guoqi [5, 6]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrument, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Precis Instrument, Beijing 100084, Peoples R China
[4] Beijing Inst Technol, Sch Life Sci, Beijing 100081, Peoples R China
[5] Chinese Acad Sci, Inst Automat, Beijing 100045, Peoples R China
[6] Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
Keywords
Multicore processing; Optimization; System recovery; Throughput; Neuromorphics; Hardware; Costs; Network group partition; core placement optimization; neuromorphic chips; multi-core and multi-chip systems; CHIP
DOI
10.1109/TETCI.2024.3379165
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Neuromorphic chips with multi-core architectures are considered highly promising for the next generation of artificial intelligence (AI) chips because they avoid the memory wall effect. Deploying deep neural networks (DNNs) on these chips requires two stages: network partition and core placement. For network partition, existing schemes are mostly manual or focus only on single-layer, small-scale network partitions. For core placement, to the best of our knowledge, no existing work has completely solved the clock-level communication deadlock problem that commonly arises in applications of neuromorphic multi-core and multi-chip (NMCMC) systems. To address these issues, which affect the operating and deployment efficiency of NMCMC systems, we formulate the network group partition problem as an optimization problem for the first time and propose a search-based network group partition scheme to solve it. A clock-level multi-chip simulator is established to completely avoid the deadlock problem during core placement optimization. Furthermore, a region constrained simulated annealing (RCSA) algorithm is proposed to improve the efficiency of core placement optimization. Finally, an automated toolchain for efficiently deploying DNNs on NMCMC systems is developed by integrating the proposed network group partition and core placement schemes. Experiments show that, compared with existing manual schemes, the proposed group partition scheme uses 22.25%, 17.77%, and 14.80% fewer cores, improves memory utilization by 9.44%, 7.96%, and 5.16%, and yields more balanced communication and computation loads on ResNet-18, ResNet-34, and ResNet-50, respectively. In addition, the proposed RCSA-based core placement optimization is more efficient, requiring far fewer optimization steps, and achieves 9.52%, 11.91%, and 27.52% higher throughput than deadlock-free sequential core placement on ResNet-18, ResNet-34, and ResNet-50, respectively. This work paves the way for applying NMCMC systems to real-world scenarios to achieve more powerful machine intelligence.
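The abstract names a region constrained simulated annealing (RCSA) algorithm but does not give its details. The following is only a minimal illustrative sketch of the general idea of annealing a core placement with moves limited to a local region; it is not the authors' algorithm. Everything in it is an assumption made for illustration: the 2D mesh of cores, the cost function (traffic-weighted Manhattan hop count), the reading of "region constraint" as "swap only with cores within a fixed radius", the geometric cooling schedule, and all function and parameter names. Deadlock checking with a clock-level simulator, which the abstract describes, is not modeled here.

```python
import math
import random


def manhattan(a, b):
    """Hop count between two (row, col) mesh positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def cost(placement, traffic):
    """Assumed cost: traffic-weighted hop count.

    placement[i] is the (row, col) slot of logical core i;
    traffic maps (src, dst) logical-core pairs to a communication volume.
    """
    return sum(w * manhattan(placement[s], placement[d])
               for (s, d), w in traffic.items())


def region_constrained_sa(n_rows, n_cols, traffic, radius=1, steps=5000,
                          t_start=5.0, t_end=0.01, seed=0):
    """Hypothetical region-limited annealer; returns (best_placement, best_cost)."""
    rng = random.Random(seed)
    # Start from a sequential placement: logical core i sits in slot i.
    placement = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    current = cost(placement, traffic)
    best, best_cost = list(placement), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i = rng.randrange(len(placement))
        # Assumed region constraint: candidate swap partners lie within
        # `radius` of core i (Chebyshev distance on the mesh).
        near = [j for j in range(len(placement)) if j != i
                and max(abs(placement[i][0] - placement[j][0]),
                        abs(placement[i][1] - placement[j][1])) <= radius]
        if not near:
            continue
        j = rng.choice(near)
        placement[i], placement[j] = placement[j], placement[i]
        new = cost(placement, traffic)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new <= current or rng.random() < math.exp((current - new) / t):
            current = new
            if new < best_cost:
                best, best_cost = list(placement), new
        else:
            # Rejected: undo the swap.
            placement[i], placement[j] = placement[j], placement[i]
    return best, best_cost


if __name__ == "__main__":
    # Toy traffic between logical cores on a 3x3 mesh (made-up values).
    toy_traffic = {(0, 8): 5, (1, 7): 3, (2, 6): 2}
    placement, total = region_constrained_sa(3, 3, toy_traffic)
    print(total, placement)
```

Limiting swaps to a neighborhood shrinks the set of candidate moves per step, which is one plausible way a region constraint could reduce the number of optimization steps; whether this matches the paper's mechanism is not stated in the abstract.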
Pages: 1-16
Page count: 16