Network Group Partition and Core Placement Optimization for Neuromorphic Multi-Core and Multi-Chip Systems

Cited by: 0

Authors:

Yang, Yukuan [1,2]

Fan, Qihang [3]

Yan, Tianyi [4]

Pei, Jing [3]

Li, Guoqi [5,6]

Affiliations:

[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China

[2] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrument, Beijing 100084, Peoples R China

[3] Tsinghua Univ, Dept Precis Instrument, Beijing 100084, Peoples R China

[4] Beijing Inst Technol, Sch Life Sci, Beijing 100081, Peoples R China

[5] Chinese Acad Sci, Inst Automat, Beijing 100045, Peoples R China

[6] Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024, Vol. 8, No. 6

Keywords:

Multicore processing; Optimization; System recovery; Throughput; Neuromorphics; Hardware; Costs; Network group partition; core placement optimization; neuromorphic chips; multi-core and multi-chip systems; CHIP;

DOI:

10.1109/TETCI.2024.3379165

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Neuromorphic chips with multi-core architectures are considered to have great potential as the next generation of artificial intelligence (AI) chips because they avoid the memory wall effect. Deploying deep neural networks (DNNs) on these chips requires two stages: network partition and core placement. For network partition, existing schemes are mostly manual or focus only on single-layer, small-scale network partitions. For core placement, to the best of our knowledge, no prior work has completely solved the clock-level communication deadlock problem, which commonly arises in applications of neuromorphic multi-core and multi-chip (NMCMC) systems. To address these issues, which affect the operating and deployment efficiency of NMCMC systems, we formulate the network group partition problem as an optimization problem for the first time and propose a search-based network group partition scheme to solve it. A clock-level multi-chip simulator is established to completely avoid the deadlock problem during core placement optimization. In addition, a region constrained simulated annealing (RCSA) algorithm is proposed to improve the efficiency of core placement optimization. Finally, an automated toolchain for the efficient deployment of DNNs on NMCMC systems is developed by integrating the proposed network group partition and core placement schemes. Experiments show that the proposed group partition scheme achieves 22.25%, 17.77%, and 14.80% reductions in core number, 9.44%, 7.96%, and 5.16% improvements in memory utilization, and more balanced communication and computation loads compared with existing manual schemes on ResNet-18, ResNet-34, and ResNet-50, respectively. In addition, the proposed core placement optimization based on the RCSA algorithm is more efficient, requiring far fewer optimization steps, and achieves 9.52%, 11.91%, and 27.52% higher throughput compared with sequential core placement without deadlock on the ResNet-18, ResNet-34, and ResNet-50 networks. This work paves the way for applying NMCMC systems to real-world scenarios to reach more powerful machine intelligence.
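
The abstract does not spell out how RCSA works, so the following is only an illustrative sketch of the general idea of simulated annealing with a region constraint on candidate moves. It is not the authors' algorithm. Everything here is an assumption made for illustration: the names `comm_cost` and `region_constrained_sa`, the `radius` parameter, the swap-only move, and the cost function (traffic volume times Manhattan distance), which stands in for the clock-level simulator and throughput objective described in the abstract.

```python
import math
import random


def comm_cost(placement, traffic):
    """Proxy cost (assumption): traffic volume times Manhattan distance."""
    total = 0
    for (a, b), volume in traffic.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += volume * (abs(xa - xb) + abs(ya - yb))
    return total


def region_constrained_sa(placement, traffic, radius=2, steps=2000,
                          t0=5.0, alpha=0.995, seed=0):
    """Simulated annealing where swaps are limited to nearby cores.

    placement: dict mapping core id -> (x, y) grid position.
    traffic:   dict mapping (src core, dst core) -> traffic volume.
    radius:    hypothetical region size; a core may only swap positions
               with cores at most `radius` away (Manhattan distance).
    """
    rng = random.Random(seed)
    current = dict(placement)
    cost = comm_cost(current, traffic)
    best, best_cost = dict(current), cost
    cores = sorted(current)
    temp = t0
    for _ in range(steps):
        a = rng.choice(cores)
        xa, ya = current[a]
        # Region constraint: candidate partners are restricted to a local window.
        nearby = [c for c in cores if c != a and
                  abs(current[c][0] - xa) + abs(current[c][1] - ya) <= radius]
        if nearby:
            b = rng.choice(nearby)
            current[a], current[b] = current[b], current[a]
            new_cost = comm_cost(current, traffic)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(current), cost
            else:
                current[a], current[b] = current[b], current[a]  # undo the swap
        temp *= alpha  # geometric cooling schedule
    return best, best_cost


# Toy example: 16 cores on a 4x4 grid with chain-like traffic (made-up data).
grid = [(x, y) for y in range(4) for x in range(4)]
shuffled = grid[:]
random.Random(1).shuffle(shuffled)
initial = dict(zip(range(16), shuffled))
traffic = {(i, i + 1): 1 for i in range(15)}
best, best_cost = region_constrained_sa(initial, traffic)
print(comm_cost(initial, traffic), "->", best_cost)
```

Restricting swaps to a local window shrinks the set of candidate moves at each step, which is one plausible way a region constraint could cut the number of optimization steps. A deployment flow like the one in the abstract would replace the proxy cost with simulator-measured throughput and reject placements that deadlock.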


Pages: 1 / 16

Number of pages: 16
