Cambricon: An Instruction Set Architecture for Neural Networks

Cited by: 172
Authors
Liu, Shaoli [1 ,4 ]
Du, Zidong [1 ,4 ]
Tao, Jinhua [1 ,4 ]
Han, Dong [1 ,4 ]
Luo, Tao [1 ,4 ]
Xie, Yuan [2 ]
Chen, Yunji [1 ,3 ]
Chen, Tianshi [1 ,3 ,4 ]
Affiliations
[1] Chinese Acad Sci, ICT, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] UCSB, Dept Elect & Comp Engn, Santa Barbara, CA USA
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, Peoples R China
[4] Cambricon Ltd, Beijing, Peoples R China
Keywords
DOI
10.1109/ISCA.2016.42
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Neural Networks (NN) are a family of models for a broad range of emerging machine learning and pattern recognition applications. NN techniques are conventionally executed on general-purpose processors (such as CPU and GPGPU), which are usually not energy-efficient since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators for neural networks have been proposed recently to improve the energy-efficiency. However, such accelerators were designed for a small set of NN techniques sharing similar computational patterns, and they adopt complex and informative instructions (control signals) directly corresponding to high-level functional blocks of an NN (such as layers), or even an NN as a whole. Although straightforward and easy to implement for a limited set of similar NN techniques, the lack of agility in the instruction set prevents such accelerator designs from supporting a variety of different NN techniques with sufficient flexibility and efficiency. In this paper, we propose a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. Our evaluation over a total of ten representative yet distinct NN techniques has demonstrated that Cambricon exhibits strong descriptive capacity over a broad range of NN techniques, and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU. Compared to the latest state-of-the-art NN accelerator design DaDianNao [5] (which can only accommodate 3 types of NN techniques), our Cambricon-based accelerator prototype implemented in TSMC 65nm technology incurs only negligible latency/power/area overheads, with a versatile coverage of 10 different NN benchmarks.
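
Illustrative sketch (not part of the original record): the abstract describes a load-store ISA whose scalar, vector, matrix, data-transfer, and control instructions together express whole NN layers. Assuming that decomposition, the short Python sketch below shows how a fully connected layer y = sigmoid(W·x + b) maps onto the kinds of primitives the abstract names; the mnemonics in the comments (VLOAD, MLOAD, MMV, VAV, VSTORE) are hypothetical placeholders, not the paper's actual instruction encoding.

import numpy as np

def run_fc_layer(W, x, b):
    # Data-transfer instructions: bring operands into an on-chip scratchpad
    # (hypothetical VLOAD / MLOAD).
    scratch_x = np.asarray(x, dtype=np.float32)
    scratch_W = np.asarray(W, dtype=np.float32)
    scratch_b = np.asarray(b, dtype=np.float32)

    # Matrix instruction: matrix-vector multiply (hypothetical MMV).
    acc = scratch_W @ scratch_x

    # Vector instructions: elementwise add (hypothetical VAV) and a sigmoid
    # activation built from elementwise exponential and division.
    acc = acc + scratch_b
    y = 1.0 / (1.0 + np.exp(-acc))

    # Data-transfer instruction: write the result back (hypothetical VSTORE).
    return y

# Usage: a 4-input, 3-output fully connected layer with random weights.
W = np.random.rand(3, 4).astype(np.float32)
x = np.random.rand(4).astype(np.float32)
b = np.zeros(3, dtype=np.float32)
print(run_fc_layer(W, x, b))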
Pages: 393 - 405
Number of pages: 13