TCX: A Programmable Tensor Processor

Cited by: 0
Authors
Liang, Tailin [1, 2]
Wang, Lei [1]
Shi, Shaobo [1, 2]
Glossner, John [1, 3]
Zhang, Xiaotong [1]
Affiliations
[1] Univ Sci & Technol, Sch Comp Sci & Commun Engn, Beijing 100083, Peoples R China
[2] Hua Xia Gen Processor Technol, Beijing 100080, Peoples R China
[3] Gen Processor Technol, Tarrytown, NY 10591 USA
Funding
National Key Research and Development Program;
Keywords
Neural Network Accelerator; Convolutional Neural Network; ASIC Design; EFFICIENT; ACCELERATOR;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Neural network processors and accelerators are domain-specific architectures deployed to meet the high computational requirements of deep learning algorithms. This paper proposes TCX, a new instruction set extension for tensor computing with RISC-style instructions and variable-length tensor extensions. It features a multidimensional register file, dimension registers, and fully generic tensor instructions. It can be seamlessly integrated into existing RISC ISAs and provides software compatibility across scalable hardware implementations. We present an implementation of the TCX tensor computing accelerator based on an out-of-order microarchitecture. The tensor accelerator scales from several hundred to tens of thousands of computation units. We describe an optimized register renaming mechanism that allows for many physical tensor registers without requiring architectural support for large tensor register names. We also describe new tensor load and store instructions that reduce bandwidth requirements based on tensor dimensions. Implementations may balance data bandwidth and computation utilization for different types of tensor computations, such as element-wise, depth-wise, and matrix-multiplication operations. We characterize the computation precision of tensor operations to balance area, generality, and accuracy loss for several well-known neural networks. The TCX processor runs at 1 GHz and sustains 8.2 tera-operations per second (TOPS) using a 4096-MAC (multiply-accumulate) compute unit, with up to 98.83% MAC utilization. It occupies 12.8 mm² and dissipates 0.46 W per TOPS in TSMC 28 nm technology.
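The headline throughput and power figures in the abstract can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check, not a model from the paper. It assumes each multiply-accumulate counts as two operations (one multiply, one add), a common convention that the abstract does not state. The names `peak_tops`, `OPS_PER_MAC`, and the derived power estimate are illustrative and do not come from the source.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
# Figures taken from the abstract:
CLOCK_HZ = 1e9            # 1 GHz clock
MAC_UNITS = 4096          # MAC units in the compute unit
MAC_UTILIZATION = 0.9883  # "up to 98.83% MAC utilization"
WATTS_PER_TOPS = 0.46     # reported energy efficiency

# ASSUMPTION (not stated in the abstract): one MAC = 2 operations.
OPS_PER_MAC = 2


def peak_tops(clock_hz: float, mac_units: int, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Peak throughput in tera-operations per second."""
    return clock_hz * mac_units * ops_per_mac / 1e12


peak = peak_tops(CLOCK_HZ, MAC_UNITS)

# Throughput if the best-case utilization were applied to the peak figure.
effective = peak * MAC_UTILIZATION

# Rough power implied by the efficiency figure at peak throughput.
# This is a derived estimate, not a number reported in the abstract.
power_estimate_w = peak * WATTS_PER_TOPS

print(f"peak throughput:      {peak:.3f} TOPS")
print(f"at 98.83% MAC usage:  {effective:.3f} TOPS")
print(f"implied power (peak): {power_estimate_w:.2f} W")
```

Under the two-operations-per-MAC assumption, the peak works out to 8.192 TOPS, which rounds to the 8.2 TOPS quoted in the abstract. Whether the quoted figure is the peak or already accounts for utilization is not clear from the abstract alone.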
Pages: 1023-1028
Page count: 6