An Instruction Set Architecture for Machine Learning

Cited by: 7
Authors
Chen, Yunji [1 ,2 ,3 ,4 ]
Lan, Huiying [1 ]
Du, Zidong [1 ]
Liu, Shaoli [1 ]
Tao, Jinhua [1 ]
Han, Dong [1 ]
Luo, Tao [1 ]
Guo, Qi [1 ]
Li, Ling [2 ,5 ]
Xie, Yuan [6 ]
Chen, Tianshi [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] BIT, ZJLab, Inst Brain Intelligence Technol, Zhanjiang Lab, Beijing, Peoples R China
[4] Shanghai Res Ctr Brain Sci & Brain Inspired Intel, Shanghai, Peoples R China
[5] Chinese Acad Sci, Inst Software, Beijing, Peoples R China
[6] UCSB, Dept Elect & Comp Engn, Santa Barbara, CA USA
Source
ACM TRANSACTIONS ON COMPUTER SYSTEMS | 2019, Vol. 36, No. 3
Funding
Beijing Natural Science Foundation;
Keywords
NETWORK;
DOI
10.1145/3331469
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Machine Learning (ML) techniques are a family of models that learn from data to improve performance on a given task. ML techniques, especially the recently revived neural networks (deep neural networks), have proven to be efficient for a broad range of applications. ML techniques are conventionally executed on general-purpose processors (such as CPUs and GPGPUs), which usually are not energy efficient, since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators have been proposed recently to improve energy efficiency. However, such accelerators were designed for a small set of ML techniques sharing similar computational patterns, and they adopt complex and informative instructions (control signals) that directly correspond to high-level functional blocks of an ML technique (such as layers in neural networks) or even to an ML technique as a whole. Although straightforward and easy to implement for a limited set of similar ML techniques, such inflexible instruction sets prevent these accelerator designs from supporting a variety of different ML techniques with sufficient flexibility and efficiency. In this article, we first propose a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data-transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. We then extend the application scope of Cambricon from NNs to ML techniques in general. We also propose an assembly language, an assembler, and a runtime to support programming with Cambricon, especially targeting large-scale ML problems. Our evaluation over a total of 16 representative yet distinct ML techniques demonstrates that Cambricon exhibits strong descriptive capacity over a broad range of ML techniques and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU.
Compared to the latest state-of-the-art NN accelerator design DaDianNao [7] (which can only accommodate three types of NN techniques), our Cambricon-based accelerator prototype implemented in TSMC 65nm technology incurs only negligible latency/power/area overheads, with versatile coverage of 10 different NN benchmarks and 7 other ML benchmarks. Compared to the recent prevalent ML accelerator PuDianNao, our Cambricon-based accelerator is able to support all of its ML techniques as well as the 10 NNs, with only approximately 5.1% performance loss.
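The abstract's core argument is that fine-grained scalar/vector/matrix instructions compose into any layer, whereas a monolithic per-layer control signal does not generalize. The sketch below illustrates that decomposition for one fully connected layer; the mnemonics (MMV, VAV, VGTM) and their semantics are illustrative assumptions for this example only, not the actual Cambricon encodings.

```python
# Hypothetical Cambricon-style primitives; each maps to one fine-grained
# instruction rather than a whole-layer control signal.

def mmv(W, x):
    # MMV: matrix-multiply-vector, y = W @ x
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def vav(a, b):
    # VAV: vector-add-vector (here, the bias add)
    return [ai + bi for ai, bi in zip(a, b)]

def vgtm(a, s):
    # VGTM: elementwise max of a vector with a scalar (ReLU when s == 0)
    return [max(ai, s) for ai in a]

def fc_layer(W, x, b):
    """One fully connected layer expressed as three composable
    'instructions' instead of a single monolithic LAYER operation."""
    t0 = mmv(W, x)       # MMV  t0, W, x
    t1 = vav(t0, b)      # VAV  t1, t0, b
    return vgtm(t1, 0)   # VGTM t2, t1, 0

out = fc_layer([[1, 2], [3, -4]], [1, 1], [0, 1])  # [3, 0]
```

Because the primitives are independent, the same three-instruction pattern rearranges to express other techniques (e.g., dropping VGTM gives a linear layer), which is the flexibility argument the abstract makes against layer-granularity instruction sets.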
Pages: 35