Network Group Partition and Core Placement Optimization for Neuromorphic Multi-Core and Multi-Chip Systems

Cited by: 0
Authors
Yang, Yukuan [1, 2]
Fan, Qihang [3 ]
Yan, Tianyi [4 ]
Pei, Jing [3 ]
Li, Guoqi [5, 6]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrument, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Precis Instrument, Beijing 100084, Peoples R China
[4] Beijing Inst Technol, Sch Life Sci, Beijing 100081, Peoples R China
[5] Chinese Acad Sci, Inst Automat, Beijing 100045, Peoples R China
[6] Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
Keywords
Multicore processing; Optimization; System recovery; Throughput; Neuromorphics; Hardware; Costs; Network group partition; core placement optimization; neuromorphic chips; multi-core and multi-chip systems; CHIP;
DOI
10.1109/TETCI.2024.3379165
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neuromorphic chips with multi-core architectures are considered to have great potential as next-generation artificial intelligence (AI) chips because they avoid the memory wall effect. Deploying deep neural networks (DNNs) on these chips requires two stages: network partition and core placement. For network partition, existing schemes are mostly manual or focus only on single-layer, small-scale network partitions. For core placement, to the best of our knowledge, no existing work has completely solved the clock-level communication deadlock problem that commonly arises in applications of neuromorphic multi-core and multi-chip (NMCMC) systems. To address these issues, which affect the operating and deployment efficiency of NMCMC systems, we formulate the network group partition problem as an optimization problem for the first time and propose a search-based network group partition scheme to solve it. A clock-level multi-chip simulator is established to completely avoid the deadlock problem during core placement optimization. In addition, a region constrained simulated annealing (RCSA) algorithm is proposed to improve the efficiency of core placement optimization. Finally, an automated toolchain for efficiently deploying DNNs on NMCMC systems is developed by integrating the proposed network group partition and core placement schemes. Experiments show that, compared with existing manual schemes, the proposed group partition scheme uses 22.25%, 17.77%, and 14.80% fewer cores, improves memory utilization by 9.44%, 7.96%, and 5.16%, and yields more balanced communication and computation loads on ResNet-18, ResNet-34, and ResNet-50, respectively. The proposed core placement optimization based on the RCSA algorithm is also more efficient, requiring far fewer optimization steps, and achieves 9.52%, 11.91%, and 27.52% higher throughput than sequential core placement without deadlock on ResNet-18, ResNet-34, and ResNet-50, respectively. This work paves the way for applying NMCMC systems to real-world scenarios to reach more powerful machine intelligence.
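The abstract names a region constrained simulated annealing (RCSA) algorithm for core placement but does not describe its details. The following is only a rough illustrative sketch of the general idea, not the authors' method. It assumes that "region constrained" means each move swaps two cores lying within a small Manhattan radius on a 2D mesh, and it replaces the paper's clock-level simulator with a simple proxy cost (traffic volume times hop distance). The names `comm_cost`, `region_constrained_sa`, `radius`, and the 4x4 pipeline example are all hypothetical.

```python
import math
import random


def comm_cost(placement, traffic):
    """Proxy cost: sum of traffic volume times Manhattan hop distance."""
    total = 0
    for (a, b), volume in traffic.items():
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += volume * (abs(ra - rb) + abs(ca - cb))
    return total


def region_constrained_sa(placement, traffic, radius=2, steps=5000,
                          t_start=5.0, t_end=0.01, seed=0):
    """Anneal a placement; each move swaps two cores at most `radius` hops apart."""
    rng = random.Random(seed)
    current = dict(placement)
    cost = comm_cost(current, traffic)
    best, best_cost = dict(current), cost
    cores = sorted(current)
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        a = rng.choice(cores)
        # Assumed region constraint: partners must lie within `radius` hops of `a`.
        near = [c for c in cores if c != a and
                abs(current[c][0] - current[a][0]) +
                abs(current[c][1] - current[a][1]) <= radius]
        if not near:
            continue
        b = rng.choice(near)
        current[a], current[b] = current[b], current[a]
        new_cost = comm_cost(current, traffic)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            # Rejected move: undo the swap.
            current[a], current[b] = current[b], current[a]
    return best, best_cost


# Hypothetical example: 16 logical cores on a 4x4 mesh, core i sends to core i + 1.
slots = [(r, c) for r in range(4) for c in range(4)]
shuffled = slots[:]
random.Random(1).shuffle(shuffled)
initial = {i: shuffled[i] for i in range(16)}
traffic = {(i, i + 1): 1 for i in range(15)}
best, best_cost = region_constrained_sa(initial, traffic)
print(comm_cost(initial, traffic), "->", best_cost)
```

Restricting swaps to a local region shrinks the set of candidate moves per step, which is one plausible reading of how a region constraint could reduce the number of optimization steps. A faithful version would score each placement with a deadlock-aware, clock-level simulation rather than hop distance.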
Pages: 1-16
Number of pages: 16