FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator
Cited by: 35
|
Authors:
Yuan, Geng [1]; Behnam, Payman [2]; Li, Zhengang [1]; Shafiee, Ali [3]; Lin, Sheng [1]; Ma, Xiaolong [1]; Liu, Hang [4]; Qian, Xuehai [5]; Bojnordi, Mahdi Nazm [6]; Wang, Yanzhi [1]; Ding, Caiwen [7]
Affiliations:
[1] Northeastern Univ, Boston, MA 02115 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
[3] Samsung, Seoul, South Korea
[4] Stevens Inst Technol, Hoboken, NJ 07030 USA
[5] Univ Southern Calif, Los Angeles, CA 90089 USA
[6] Univ Utah, Salt Lake City, UT 84112 USA
[7] Univ Connecticut, Storrs, CT USA
Source:
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021)
|
2021
Funding:
US National Science Foundation;
Keywords:
DOI:
10.1109/ISCA52012.2021.00029
CLC Number:
TP3 [Computing technology and computer technology];
Discipline Code:
0812;
Abstract:
Recent work demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog-domain in-situ matrix-vector multiplication, the intensive and key computation in deep neural networks (DNNs). One key problem is that DNN weights are signed values. However, in a ReRAM crossbar, weights are stored as conductances of the crossbar cells, and the in-situ computation assumes all cells on each crossbar column are of the same sign. Current architectures either use two ReRAM crossbars for positive and negative weights (PRIME), or add an offset to weights so that all values become positive (ISAAC). Neither solution is ideal: they either double the cost of crossbars, or incur extra offset circuitry. To better address this problem, we propose FORMS, a fine-grained ReRAM-based DNN accelerator with algorithm/hardware co-design. Instead of trying to represent positive/negative weights, our key design principle is to enforce exactly what is assumed by the in-situ computation: all weights in the same column of a crossbar have the same sign. This naturally avoids the cost of an additional crossbar. Such polarized weights can be generated using alternating direction method of multipliers (ADMM) regularized optimization during DNN training, which can exactly enforce certain patterns in DNN weights. To achieve high accuracy, we divide the crossbar into logical sub-arrays and only enforce this property within the fine-grained sub-array columns. Crucially, the small sub-arrays provide a unique opportunity for input zero-skipping, which avoids unnecessary computations and significantly reduces computation time. At the same time, it also makes the hardware much easier to implement and less susceptible to non-idealities and noise than coarse-grained architectures. Putting it all together, with the same optimized DNN models, FORMS achieves 1.50x and 1.93x throughput improvement in terms of GOPs/(s x mm^2) and GOPs/W compared to ISAAC, and 1.12x~2.4x speedup in terms of frames per second over optimized ISAAC with almost the same power/area cost. Interestingly, the FORMS optimization framework can even speed up the original ISAAC from 10.7x up to 377.9x, reflecting the importance of software/hardware co-design optimizations.
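To make the polarization constraint concrete, below is a minimal NumPy sketch (not the authors' implementation) of the projection step that an ADMM-regularized training loop could apply: for each column of each logical sub-array, it keeps the sign group with the larger energy and zeroes out the minority-sign weights, which is the Euclidean projection onto the set of sub-array columns whose weights all share one sign. The function name polarize_columns and the sub_array_height parameter are illustrative assumptions, not names from the paper.

import numpy as np

def polarize_columns(weights: np.ndarray, sub_array_height: int) -> np.ndarray:
    """Project each sub-array column onto the nearest all-nonnegative or
    all-nonpositive vector (illustrative sketch, not the authors' code)."""
    w = weights.copy()
    rows, cols = w.shape
    for r0 in range(0, rows, sub_array_height):
        for c in range(cols):
            col = w[r0:r0 + sub_array_height, c]    # one logical sub-array column (view)
            pos_energy = np.sum(col[col > 0] ** 2)  # cost of zeroing the positive weights
            neg_energy = np.sum(col[col < 0] ** 2)  # cost of zeroing the negative weights
            if pos_energy >= neg_energy:
                col[col < 0] = 0.0                  # keep positives, drop negatives
            else:
                col[col > 0] = 0.0                  # keep negatives, drop positives
    return w

# Example: a 4x4 weight block mapped onto logical sub-arrays of height 2.
w = np.array([[ 0.5, -0.1,  0.3, -0.4],
              [ 0.2,  0.6, -0.7,  0.1],
              [-0.3,  0.4,  0.2, -0.5],
              [ 0.1, -0.8, -0.2, -0.6]])
print(polarize_columns(w, sub_array_height=2))

In an actual ADMM formulation, such a projection would be applied to the auxiliary variable at each iteration while gradient updates continue on the unconstrained weights, so accuracy lost to polarization can be recovered during training; the sketch shows only the constraint itself.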