O3BNN: An Out-Of-Order Architecture for High-Performance Binarized Neural Network Inference with Fine-Grained Pruning

Times cited: 14
Authors
Geng, Tong [1, 2]
Wang, Tianqi [1]
Wu, Chunshu [1]
Yang, Chen [1]
Wu, Wei [3]
Li, Ang [2]
Herbordt, Martin C. [1]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA
[3] Los Alamos Natl Lab, Los Alamos, NM USA
Keywords
Machine Learning; BNN; High-Performance Computing; Pruning; Out-of-Order Architecture
DOI
10.1145/3330345.3330386
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Binarized Neural Networks (BNNs) have drawn tremendous attention because of their significantly reduced computational complexity and memory demand. They have shown particular promise in cost- and power-restricted domains, such as IoT and smart edge devices, where reaching a certain accuracy bar is often sufficient and real-time performance is highly desired. In this work, we demonstrate that the highly condensed BNN model can be shrunk significantly further by dynamically pruning irregular redundant edges. Based on two new observations of BNN-specific properties, an out-of-order (OoO) architecture, O3BNN, can curtail edge evaluation in cases where the binary output of a neuron can be determined early. As with instruction-level parallelism (ILP), such fine-grained, irregular, runtime pruning opportunities have traditionally been presumed difficult to exploit. We evaluate our design on an FPGA platform using three well-known networks: VGGNet-16 and AlexNet for ImageNet, and a VGG-like network for CIFAR-10. Results show that the out-of-order approach prunes 27%, 16%, and 42% of the operations for the three networks, respectively, without any accuracy loss, leading to at least 1.7x, 1.5x, and 2.1x speedups over state-of-the-art BNN implementations on FPGA/GPU/CPU. Because the pruning is performed at inference runtime, no retraining or fine-tuning is needed. Although we demonstrate the design on an FPGA platform, this only showcases the method: the approach does not rely on any FPGA-specific features and can thus be adopted by other devices as well.
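The core idea of curtailing edge evaluation once a neuron's binary output is already determined can be illustrated with a minimal Python sketch. It assumes a common BNN formulation that this record does not spell out: activations and weights are stored as bits (1 for +1, 0 for -1), each edge contributes an XNOR match, and batch normalization plus the sign activation are folded into a single integer threshold, so the neuron outputs 1 when the match count reaches that threshold. The function names and the `threshold` parameter are hypothetical. The sketch models only the arithmetic bound, not the out-of-order hardware scheduling of O3BNN or the specific observations the paper relies on.

```python
from typing import Sequence, Tuple


def binary_neuron_full(activations: Sequence[int], weights: Sequence[int], threshold: int) -> int:
    """Reference: evaluate every edge, then compare the XNOR-popcount with the threshold."""
    matches = sum(1 for a, w in zip(activations, weights) if a == w)
    return 1 if matches >= threshold else 0


def binary_neuron_early_exit(
    activations: Sequence[int], weights: Sequence[int], threshold: int
) -> Tuple[int, int]:
    """Hypothetical early-exit evaluation; returns (output_bit, edges_evaluated)."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    n = len(activations)
    matches = 0
    for k, (a, w) in enumerate(zip(activations, weights)):
        # XNOR of two bits is 1 when they agree (+1 * +1 or -1 * -1).
        if a == w:
            matches += 1
        evaluated = k + 1
        remaining = n - evaluated
        # Threshold already reached: the remaining edges cannot lower the count.
        if matches >= threshold:
            return 1, evaluated
        # Threshold unreachable even if every remaining edge matches.
        if matches + remaining < threshold:
            return 0, evaluated
    # Only reached for an empty input (n == 0).
    return (1 if matches >= threshold else 0), n
```

For example, `binary_neuron_early_exit([1, 1, 1, 1, 0, 0, 0, 0], [1] * 8, 4)` returns `(1, 4)`: after four matching edges the threshold is met, so the last four edges need not be evaluated. With all-zero activations the same call returns `(0, 5)`, because after five mismatches even three remaining matches cannot reach the threshold. In both cases the output agrees with `binary_neuron_full`; only the number of evaluated edges differs.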
Pages: 461-472
Number of pages: 12