O3BNN: An Out-Of-Order Architecture for High-Performance Binarized Neural Network Inference with Fine-Grained Pruning

Cited by: 14

Authors:

Geng, Tong [1,2]

Wang, Tianqi [1]

Wu, Chunshu [1]

Yang, Chen [1]

Wu, Wei [3]

Li, Ang [2]

Herbordt, Martin C. [1]

Affiliations:

[1] Boston Univ, Boston, MA 02215 USA

[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA

[3] Los Alamos Natl Lab, Los Alamos, NM USA

Source:

INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2019) | 2019

Keywords:

Machine Learning; BNN; High-Performance Computing; Pruning; Out-of-Order Architecture

DOI:

10.1145/3330345.3330386

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Binarized Neural Networks (BNN) have drawn tremendous attention due to significantly reduced computational complexity and memory demand. They have especially shown great potential in cost- and power-restricted domains, such as IoT and smart edge-devices, where reaching a certain accuracy bar is often sufficient, and real-time is highly desired. In this work, we demonstrate that the highly-condensed BNN model can be shrunk significantly further by dynamically pruning irregular redundant edges. Based on two new observations on BNN-specific properties, an out-of-order (OoO) architecture - O3BNN, can curtail edge evaluation in cases where the binary output of a neuron can be determined early. Similar to Instruction-Level-Parallelism (ILP), these fine-grained, irregular, runtime pruning opportunities are traditionally presumed to be difficult to exploit. We evaluate our design on an FPGA platform using three well-known networks, including VggNet-16, AlexNet for ImageNet, and a VGG-like network for Cifar-10. Results show that the out-of-order approach can prune 27%, 16%, and 42% of the operations for the three networks respectively, without any accuracy loss, leading to at least 1.7x, 1.5x, and 2.1x speedups over state-of-the-art BNN implementations on FPGA/GPU/CPU. Since the approach is inference runtime pruning, no retraining or fine-tuning is needed. We demonstrate the design on an FPGA platform; however, this is only for showcasing the method: the approach does not rely on any FPGA-specific features and can thus be adopted by other devices as well.
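The abstract does not spell out how a neuron's binary output is "determined early", so the following is only an illustrative sketch under stated assumptions, not the authors' design. It assumes a neuron outputs 1 when the number of matching input/weight bits (an XNOR popcount) reaches a threshold, and it stops evaluating edges as soon as the result can no longer change. The function name `binary_neuron_early_exit` and its parameters are hypothetical.

```python
def binary_neuron_early_exit(inputs, weights, threshold):
    """Evaluate one binarized neuron, stopping once its output is fixed.

    Assumption (not taken from the abstract): the neuron outputs 1 when the
    count of positions where input bit == weight bit is >= threshold.
    Returns (output_bit, edges_evaluated).
    """
    n = len(inputs)
    matches = 0
    for i, (x, w) in enumerate(zip(inputs, weights), start=1):
        if x == w:  # XNOR of two bits is 1 exactly when they are equal
            matches += 1
        remaining = n - i
        if matches >= threshold:
            # Threshold already reached: later edges cannot lower the count.
            return 1, i
        if matches + remaining < threshold:
            # Even if every remaining edge matched, the threshold is unreachable.
            return 0, i
    # All edges evaluated (only reached when the input is empty).
    return (1 if matches >= threshold else 0), n


def binary_neuron_full(inputs, weights, threshold):
    """Reference evaluation that always visits every edge."""
    matches = sum(1 for x, w in zip(inputs, weights) if x == w)
    return 1 if matches >= threshold else 0
```

For example, with `inputs = [1, 1, 1, 1, 0, 0, 0, 0]`, all-ones weights, and a threshold of 3, the sketch returns after three edges instead of eight. Because the skipped edges cannot change the thresholded result, the output matches the full evaluation, which is consistent with the abstract's claim of pruning without accuracy loss.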

Citation

Pages: 461-472

Page count: 12

