An Integer-Floating-Point Dual-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Chips

Times cited: 1
Authors
Wu, Ping-Chun [1]
Khwa, Win-San [2]
Wu, Jui-Jen [2]
Su, Jian-Wei [3]
Jhang, Chuan-Jia [4]
Chen, Ho-Yu [1]
Ke, Zhao-En [4]
Chiu, Ting-Chien [4]
Hsu, Jun-Ming [4]
Cheng, Chiao-Yen [5]
Chen, Yu-Chen [1]
Lo, Chung-Chuan [6]
Liu, Ren-Shuo [4]
Hsieh, Chih-Cheng [4]
Tang, Kea-Tiong [4]
Chang, Meng-Fan [2, 4, 7]
Affiliations
[1] Natl Tsing Hua Univ NTHU, Inst Elect Engn, Hsinchu 30013, Taiwan
[2] Taiwan Semicond Mfg Co TSMC, Hsinchu 30075, Taiwan
[3] Ind Technol Res Inst ITRI, Hsinchu 310401, Taiwan
[4] Natl Tsing Hua Univ NTHU, Dept Elect Engn, Hsinchu 30013, Taiwan
[5] Natl Tsing Hua Univ NTHU, Coll Semicond Res, Hsinchu 30013, Taiwan
[6] Natl Tsing Hua Univ NTHU, Inst Syst Neurosci, Hsinchu 30013, Taiwan
[7] Natl Tsing Hua Univ NTHU, Hsinchu 30013, Taiwan
Keywords
Artificial intelligence (AI); computing-in-memory (CIM); gain cell (GC); inference; multiply-and-accumulate (MAC); SRAM macro; unit-macro; computation
DOI
10.1109/JSSC.2024.3470215
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline codes
0808; 0809
Abstract
This article presents an integer-floating-point (INT-FP) gain-cell (GC) computing-in-memory (CIM) structure for high-precision multiply-and-accumulate (MAC) operations that offers high computational flexibility, energy efficiency, and inference accuracy. The proposed macro employs: 1) a dual-mode zone-based input processing scheme (ZB-IPS) that eliminates exponent subtraction to improve energy efficiency and area efficiency (AEF); 2) a dual-mode local computing cell (DM-LCC) that reuses the exponent-addition stage as an adder-tree stage for INT-MAC, improving AEF in both INT and floating-point (FP) modes; and 3) a stationary-based two-port GC array (SB-TP-GCA) that enables concurrent data updates and computation while reducing system-to-CIM and internal data accesses, improving energy efficiency. A 16-nm FinFET 108-kb GC-CIM macro built with 4T GCs achieved an energy efficiency of 99.5 TOPS/W for INT-MAC operations with 128 accumulations of 8b inputs and 8b weights producing 23b outputs, and 46.4 TFLOPS/W for FP-MAC operations with 64 accumulations of BF16 inputs and BF16 weights producing FP32 outputs.
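The bit widths quoted above and the role of exponent subtraction can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' circuit. `int_mac_output_bits` checks the 23b INT-MAC output width, assuming signed two's-complement operands: two 8b operands give a 16b product, and 128 = 2^7 accumulations add 7 bits of headroom. `bf16_dot_conventional` models a conventional FP-MAC baseline in which each product needs an exponent addition and, before accumulation, an exponent subtraction against the largest product exponent to align significands. That per-product subtraction is the step the ZB-IPS is described as eliminating; how the macro actually achieves this is not modeled here. The sketch assumes finite inputs, truncates (rather than rounds) to BF16, flushes subnormals to zero, and returns a Python float instead of FP32. All function names and the `guard_bits` parameter are assumptions.

```python
import math
import struct

BF16_BIAS = 127      # BF16 shares FP32's 8-bit exponent and bias
BF16_MAN_BITS = 7    # explicit mantissa bits in BF16


def int_mac_output_bits(in_bits: int, w_bits: int, n_acc: int) -> int:
    """Worst-case signed output width for n_acc products of signed operands."""
    # A signed a-bit x b-bit two's-complement product fits in a + b bits.
    product_bits = in_bits + w_bits
    # Summing n_acc products needs ceil(log2(n_acc)) extra bits of headroom.
    return product_bits + math.ceil(math.log2(n_acc))


def to_bf16_fields(x: float) -> tuple[int, int, int]:
    """Truncate x to BF16 and return (sign, biased exponent, significand).

    The significand includes the hidden leading 1; zeros and subnormals
    are flushed to (sign, 0, 0). Inf/NaN are not handled (assumption).
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16  # BF16 = upper half of FP32 (round toward zero)
    sign = bits16 >> 15
    exp = (bits16 >> BF16_MAN_BITS) & 0xFF
    man = bits16 & 0x7F
    if exp == 0:
        return sign, 0, 0
    return sign, exp, (1 << BF16_MAN_BITS) | man


def bf16_dot_conventional(xs, ws, guard_bits: int = 24) -> float:
    """Conventional (baseline) BF16 dot product with explicit alignment."""
    # Step 1: per product, XOR signs, ADD exponents, multiply significands.
    prods = []
    for x, w in zip(xs, ws):
        sx, ex, mx = to_bf16_fields(x)
        sw, ew, mw = to_bf16_fields(w)
        if mx == 0 or mw == 0:
            continue  # a zero product contributes nothing
        prods.append((sx ^ sw, ex + ew, mx * mw))
    if not prods:
        return 0.0
    # Step 2: find the largest product exponent.
    e_max = max(e for _, e, _ in prods)
    # Step 3: SUBTRACT exponents to get each product's right-shift amount,
    # align the significands, and accumulate as integers.
    acc = 0
    for s, e, m in prods:
        aligned = (m << guard_bits) >> (e_max - e)
        acc += -aligned if s else aligned
    # Step 4: rescale the integer sum by the shared exponent.
    scale = e_max - 2 * BF16_BIAS - 2 * BF16_MAN_BITS - guard_bits
    return math.ldexp(acc, scale)
```

For example, `int_mac_output_bits(8, 8, 128)` returns 23, and `bf16_dot_conventional([1.5, -2.0, 0.25], [2.0, 0.5, 4.0])` returns 3.0; all operands are exactly representable in BF16, so no bits are lost during alignment.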
Pages: 158–170
Page count: 13