TCX: A Programmable Tensor Processor

Authors
Liang, Tailin [1, 2]
Wang, Lei [1]
Shi, Shaobo [1, 2]
Glossner, John [1, 3]
Zhang, Xiaotong [1]
Affiliations
[1] Univ Sci & Technol, Sch Comp Sci & Commun Engn, Beijing 100083, Peoples R China
[2] Hua Xia Gen Processor Technol, Beijing 100080, Peoples R China
[3] Gen Processor Technol, Tarrytown, NY 10591 USA
Funding
National Key Research and Development Program;
Keywords
Neural Network Accelerator; Convolutional Neural Network; ASIC Design; EFFICIENT; ACCELERATOR;
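DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract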
Neural network processors and accelerators are domain-specific architectures deployed to meet the high computational demands of deep learning algorithms. This paper proposes a new instruction set extension for tensor computing, TCX, with RISC-style instructions and variable-length tensor extensions. It features a multidimensional register file, dimension registers, and fully generic tensor instructions. It can be seamlessly integrated into existing RISC ISAs and provides software compatibility across scalable hardware implementations. We present an implementation of the TCX tensor computing accelerator based on an out-of-order microarchitecture. The tensor accelerator scales from several hundred to tens of thousands of computation units. We describe an optimized register renaming mechanism that allows for many physical tensor registers without requiring architectural support for large tensor register names. We also describe new tensor load and store instructions that reduce bandwidth requirements based on tensor dimensions. Implementations may balance data bandwidth and computation utilization for different types of tensor computations, such as element-wise, depth-wise, and matrix multiplication. We characterize the computation precision of tensor operations to balance area, generality, and accuracy loss for several well-known neural networks. The TCX processor runs at 1 GHz and sustains 8.2 tera-operations per second (TOPS) using a 4096-unit multiply-accumulate (MAC) compute unit with up to 98.83% MAC utilization. It occupies 12.8 square millimeters and dissipates 0.46 watts per TOPS in TSMC 28 nm technology.
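The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is not from the paper: it assumes the common convention that one multiply-accumulate counts as two operations (one multiply, one add), and it reads the power figure as watts per tera-operation per second. The constant and function names are hypothetical, chosen only for this illustration.

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
# Assumption (not stated in the abstract): one MAC counts as 2 operations.

MAC_UNITS = 4096          # multiply-accumulate units, from the abstract
CLOCK_HZ = 1e9            # 1 GHz, from the abstract
OPS_PER_MAC = 2           # assumed convention: multiply + add
MAX_UTILIZATION = 0.9883  # "up to 98.83% MAC utilization", from the abstract
WATTS_PER_TOPS = 0.46     # abstract's power figure, read as W per TOPS (assumption)
AREA_MM2 = 12.8           # from the abstract


def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Peak throughput in tera-operations per second."""
    return mac_units * clock_hz * ops_per_mac / 1e12


peak = peak_tops(MAC_UNITS, CLOCK_HZ)      # theoretical peak
effective = peak * MAX_UTILIZATION         # throughput at best-case utilization
power_w = peak * WATTS_PER_TOPS            # implied total power at peak
density = peak / AREA_MM2                  # implied TOPS per square millimeter

print(f"peak throughput : {peak:.3f} TOPS")
print(f"at 98.83% util  : {effective:.3f} TOPS")
print(f"implied power   : {power_w:.2f} W")
print(f"implied density : {density:.2f} TOPS/mm^2")
```

Under these assumptions the peak works out to 8.192 TOPS, which rounds to the 8.2 TOPS quoted in the abstract. The implied power and density are derived values, not figures reported by the authors.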
Pages: 1023-1028
Number of pages: 6