Cambricon: An Instruction Set Architecture for Neural Networks

Cited by: 172
Authors
Liu, Shaoli [1 ,4 ]
Du, Zidong [1 ,4 ]
Tao, Jinhua [1 ,4 ]
Han, Dong [1 ,4 ]
Luo, Tao [1 ,4 ]
Xie, Yuan [2 ]
Chen, Yunji [1 ,3 ]
Chen, Tianshi [1 ,3 ,4 ]
Affiliations
[1] Chinese Acad Sci, ICT, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] UCSB, Dept Elect & Comp Engn, Santa Barbara, CA USA
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, Peoples R China
[4] Cambricon Ltd, Beijing, Peoples R China
Keywords
DOI
10.1109/ISCA.2016.42
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Neural Networks (NN) are a family of models for a broad range of emerging machine learning and pattern recognition applications. NN techniques are conventionally executed on general-purpose processors (such as CPU and GPGPU), which are usually not energy-efficient since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators for neural networks have been proposed recently to improve the energy-efficiency. However, such accelerators were designed for a small set of NN techniques sharing similar computational patterns, and they adopt complex and informative instructions (control signals) directly corresponding to high-level functional blocks of an NN (such as layers), or even an NN as a whole. Although straightforward and easy to implement for a limited set of similar NN techniques, the lack of agility in the instruction set prevents such accelerator designs from supporting a variety of different NN techniques with sufficient flexibility and efficiency. In this paper, we propose a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. Our evaluation over a total of ten representative yet distinct NN techniques has demonstrated that Cambricon exhibits strong descriptive capacity over a broad range of NN techniques, and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU. Compared to the latest state-of-the-art NN accelerator design DaDianNao [5] (which can only accommodate 3 types of NN techniques), our Cambricon-based accelerator prototype implemented in TSMC 65nm technology incurs only negligible latency/power/area overheads, with a versatile coverage of 10 different NN benchmarks.
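
Illustrative sketch (not part of the original record): the abstract describes a load-store ISA whose scalar, vector, matrix, data-transfer, and control instructions together express whole NN layers. Assuming that decomposition, the short Python sketch below shows how a fully connected layer y = sigmoid(W·x + b) maps onto the kinds of primitives the abstract names; the mnemonics in the comments (VLOAD, MLOAD, MMV, VAV, VSTORE) are hypothetical placeholders, not the paper's actual instruction encoding.

import numpy as np

def run_fc_layer(W, x, b):
    # Data-transfer instructions: bring operands into an on-chip scratchpad
    # (hypothetical VLOAD / MLOAD).
    scratch_x = np.asarray(x, dtype=np.float32)
    scratch_W = np.asarray(W, dtype=np.float32)
    scratch_b = np.asarray(b, dtype=np.float32)

    # Matrix instruction: matrix-vector multiply (hypothetical MMV).
    acc = scratch_W @ scratch_x

    # Vector instructions: elementwise add (hypothetical VAV) and a sigmoid
    # activation built from elementwise exponential and division.
    acc = acc + scratch_b
    y = 1.0 / (1.0 + np.exp(-acc))

    # Data-transfer instruction: write the result back (hypothetical VSTORE).
    return y

# Usage: a 4-input, 3-output fully connected layer with random weights.
W = np.random.rand(3, 4).astype(np.float32)
x = np.random.rand(4).astype(np.float32)
b = np.zeros(3, dtype=np.float32)
print(run_fc_layer(W, x, b))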
Pages: 393 - 405
Number of pages: 13