TCX: A Programmable Tensor Processor

Authors
Liang, Tailin [1, 2]
Wang, Lei [1]
Shi, Shaobo [1, 2]
Glossner, John [1, 3]
Zhang, Xiaotong [1]
Affiliations
[1] Univ Sci & Technol, Sch Comp Sci & Commun Engn, Beijing 100083, Peoples R China
[2] Hua Xia Gen Processor Technol, Beijing 100080, Peoples R China
[3] Gen Processor Technol, Tarrytown, NY 10591 USA
Funding
National Key Research and Development Program
Keywords
Neural Network Accelerator; Convolutional Neural Network; ASIC Design; EFFICIENT; ACCELERATOR;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Neural network processors and accelerators are domain-specific architectures deployed to meet the high computational requirements of deep learning algorithms. This paper proposes TCX, a new instruction set extension for tensor computing with RISC-style instructions and variable-length tensor extensions. It features a multidimensional register file, dimension registers, and fully generic tensor instructions. It can be seamlessly integrated into existing RISC ISAs and provides software compatibility across scalable hardware implementations. We present an implementation of the TCX tensor computing accelerator based on an out-of-order microarchitecture. The tensor accelerator scales from several hundred to tens of thousands of computation units. We describe an optimized register renaming mechanism that allows for many physical tensor registers without requiring architectural support for large tensor register names. We also describe new tensor load and store instructions that reduce bandwidth requirements based on tensor dimensions. Implementations may balance data bandwidth and computation utilization for different types of tensor computations, such as element-wise, depth-wise, and matrix-multiplication operations. We characterize the computation precision of tensor operations to balance area, generality, and accuracy loss for several well-known neural networks. The TCX processor runs at 1 GHz and sustains 8.2 tera operations per second (TOPS) using a compute unit of 4096 multiply-accumulate (MAC) units, with up to 98.83% MAC utilization. It occupies 12.8 square millimeters and dissipates 0.46 watts per TOPS in TSMC 28 nm technology.
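The throughput figures in the abstract can be cross-checked with a short calculation. The sketch below is not from the paper: it assumes the common convention that one MAC counts as two operations (a multiply and an add), and the function name `peak_tops` is hypothetical. The total power and per-area figures it prints are inferences from the quoted numbers, not values reported in the abstract.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption (not stated in the abstract): 1 MAC = 2 operations.

MAC_UNITS = 4096           # from the abstract
CLOCK_HZ = 1e9             # 1 GHz, from the abstract
OPS_PER_MAC = 2            # assumption: one multiply plus one add
MAX_UTILIZATION = 0.9883   # "up to 98.83%", from the abstract
WATTS_PER_TOPS = 0.46      # from the abstract
AREA_MM2 = 12.8            # from the abstract


def peak_tops(mac_units, clock_hz, ops_per_mac=OPS_PER_MAC):
    """Peak throughput in tera operations per second."""
    return mac_units * clock_hz * ops_per_mac / 1e12


peak = peak_tops(MAC_UNITS, CLOCK_HZ)
best_case = peak * MAX_UTILIZATION     # throughput at the best quoted utilization
implied_power = peak * WATTS_PER_TOPS  # inferred, not reported in the abstract
density = peak / AREA_MM2              # inferred TOPS per square millimeter

print(f"peak throughput:      {peak:.3f} TOPS")
print(f"at 98.83% MAC use:    {best_case:.3f} TOPS")
print(f"implied total power:  {implied_power:.2f} W")
print(f"implied density:      {density:.2f} TOPS/mm^2")
```

Under that assumption the peak works out to 8.192 TOPS, which is consistent with the rounded 8.2 TOPS quoted above.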
Pages: 1023-1028
Number of pages: 6