Reconfigurable Bit-Serial Operation Using Toggle SOT-MRAM for High-Performance Computing in Memory Architecture

Cited by: 15
Authors
Wang, Jinkai [1, 2]
Bai, Yining [3 ]
Wang, Hongyu [3 ]
Hao, Zuolei [3 ]
Wang, Guanda [3 ]
Zhang, Kun [3 ]
Zhang, Youguang [3 ]
Lv, Weifeng [4, 5]
Zhang, Yue [3, 6]
Affiliations
[1] Beihang Univ, Fert Beijing Inst, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Fert Beijing Inst, Sch Comp Sci & Engn, MIIT Key Lab Spintron, Beijing 100191, Peoples R China
[3] Beihang Univ, Fert Beijing Inst, Sch Integrated Circuit Sci & Engn, MIIT Key Lab Spintron, Beijing 100191, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[5] Beihang Univ, Res Inst, Shenzhen Key Lab Data Vitalizat Smart City, Shenzhen 518057, Peoples R China
[6] Beihang Univ, Hefei Innovat Res Inst, Nanoelect Sci & Technol Ctr, Hefei 230013, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computing in memory; bit-serial operation; toggle spin-orbit torque MRAM; convolution operation; digital CIM architectures; UNIT-MACRO; SRAM; EFFICIENT; ENERGY; COMPUTATION; ENGINE;
DOI
10.1109/TCSI.2022.3192165
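Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract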
Computing in memory (CIM) is a promising candidate for high-throughput, energy-efficient data-driven applications because it mitigates the well-known memory bottleneck of the von Neumann architecture. In this paper, we present a reconfigurable bit-serial operation using toggle spin-orbit torque magnetic random access memory (TSOT-MRAM) that performs the computation entirely in the bit-cell array instead of in a peripheral circuit. This bit-serial CIM (BSCIM) scheme achieves higher throughput and energy efficiency in CIM. First, basic Boolean logic operations are realized by exploiting the features of the TSOT device. A bit-cell array that implements the bit-serial operation is then built to provide the communication between columns and rows that arithmetic operations require, such as the carry propagation in addition and multiplication. Finally, we analyze the reliability of the BSCIM scheme and demonstrate its performance advantage by performing convolution operations on 28 × 28 handwritten digit images in a BSCIM architecture. The results show that the delay and energy of the BSCIM architecture are reduced by 1.16-5.49 times and 1.12-1.43 times, respectively, compared with existing digital CIM architectures. In addition, its throughput and energy efficiency reach 51.2 GOPS and 9.9 TOPS/W, respectively.
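To illustrate what "bit-serial" arithmetic with carry propagation means, here is a minimal software sketch in Python. It is not the paper's TSOT-MRAM circuit: the function names (`full_adder`, `bit_serial_add`, `bit_serial_multiply`) and the 8-bit default width are assumptions made for illustration. It only shows the general idea that addition and multiplication can be built from Boolean operations applied one bit position at a time, with a carry passed between steps.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built only from Boolean operations (XOR, AND, OR)."""
    partial = a ^ b
    sum_bit = partial ^ carry_in
    carry_out = (a & b) | (carry_in & partial)
    return sum_bit, carry_out


def bit_serial_add(x, y, width=8):
    """Add two unsigned integers one bit per step, least significant bit first.

    Returns (result, carry_out), where result is truncated to `width` bits.
    The carry variable models the carry propagation between bit positions.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << i
    return result, carry


def bit_serial_multiply(x, y, width=8):
    """Shift-and-add multiplication that reuses the bit-serial adder.

    For each set bit of y, the shifted multiplicand is accumulated.
    The accumulator is 2 * width bits wide so the product cannot overflow.
    """
    acc = 0
    for i in range(width):
        if (y >> i) & 1:
            acc, _ = bit_serial_add(acc, x << i, 2 * width)
    return acc


if __name__ == "__main__":
    print(bit_serial_add(5, 3))
    print(bit_serial_multiply(13, 11))
```

In this model each loop iteration corresponds to one serial step, so an n-bit addition takes n steps. A multiply-accumulate of this kind is the basic operation behind the convolution workload the abstract mentions.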
Pages: 4535-4545
Number of pages: 11
Related papers
50 in total
  • [21] High-performance Sum Operation with Charge Saving and Sharing Circuit for MRAM-based In-memory Computing
    Yu, Jangseok
    Lee, Geonwoo
    Na, Taehui
    [J]. JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, 2024, 24 (02)
  • [22] High-Performance Computing-in-Memory Architecture Based on Single-Level and Multilevel Cell Differential Spin Hall MRAM
    Prajapati, Sanjay
    Nehra, Vikas
    Kaushik, Brajesh Kumar
    [J]. IEEE TRANSACTIONS ON MAGNETICS, 2021, 57 (09)
  • [23] A New Application-Tuned Processor Architecture for High-Performance Reconfigurable Computing
    Shang, Li-Hong
    Zhou, Mi
    Zhang, Jiong
    Li, Hong-Bin
    [J]. PROCEEDINGS OF THE 2009 NASA/ESA CONFERENCE ON ADAPTIVE HARDWARE AND SYSTEMS, 2009, : 138 - 143
  • [24] High-Performance Architecture Using Fast Dynamic Reconfigurable Accelerators
    Yang, Ping-Lin
    Marek-Sadowska, Malgorzata
    [J]. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2018, 26 (07) : 1209 - 1222
  • [25] 250 MHz correlation using high-performance reconfigurable computing engines
    VonHerzen, B
    [J]. HIGH-SPEED COMPUTING, DIGITAL SIGNAL PROCESSING, AND FILTERING USING RECONFIGURABLE LOGIC, 1996, 2914 : 34 - 43
  • [26] A Multicore Architecture for High-Performance Scientific Computing using FPGAs
    Cobos Carrascosa, J. P.
    Aparicio del Moral, B.
    Ramos, J. L.
    Lopez Jimenez, A. C.
    del Toro Iniesta, J. C.
    [J]. 2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC), 2014, : 223 - 228
  • [27] A high-performance VLSI architecture for reconfigurable FIR using distributed arithmetic
    Mohanty, Basant Kumar
    Meher, Pramod Kumar
    Singhal, Subodh Kumar
    Swamy, M. N. S.
    [J]. INTEGRATION-THE VLSI JOURNAL, 2016, 54 : 37 - 46
  • [28] High-Performance STT-MRAM-Based Computing-in-Memory Scheme Utilizing Data Read Feature
    Wu, Bi
    Liu, Kai
    Yu, Tianyang
    Zhu, Haonan
    Chen, Ke
    Yan, Chenggang
    Deng, Erya
    Liu, Weiqiang
    [J]. IEEE TRANSACTIONS ON NANOTECHNOLOGY, 2023, 22 : 817 - 826
  • [29] High-Performance Multiclass Classification Framework Using Cloud Computing Architecture
    Lin, Feng-Sheng
    Shen, Chia-Ping
    Liu, Chia-Hung
    Lin, Han
    Huang, Chi-Ying F.
    Kao, Cheng-Yan
    Lai, Feipei
    Lin, Jeng-Wei
    [J]. JOURNAL OF MEDICAL AND BIOLOGICAL ENGINEERING, 2015, 35 (06) : 795 - 802
  • [30] High-Performance Multiclass Classification Framework Using Cloud Computing Architecture
    Feng-Sheng Lin
    Chia-Ping Shen
    Chia-Hung Liu
    Han Lin
    Chi-Ying F. Huang
    Cheng-Yan Kao
    Feipei Lai
    Jeng-Wei Lin
    [J]. Journal of Medical and Biological Engineering, 2015, 35 : 795 - 802