An efficient GPU-accelerated inference engine for binary neural network on mobile phones

Cited by: 4
Authors
He, Shengyu [1]
Meng, Haitao [1,3]
Zhou, Zhaoheng [1]
Liu, Yongjun [2]
Huang, Kai [1]
Chen, Gang [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[2] Changshu Inst Technol, Suzhou, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
STEREO ESTIMATION;
DOI
10.1016/j.sysarc.2021.102156
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Over the last few years, deep neural networks (DNNs) have become increasingly powerful and have risen in popularity, especially in mobile computing. Applications running on edge AI devices such as smartphones could benefit substantially from the new opportunities enabled by deep learning techniques. However, DNNs are by nature computationally and memory intensive, which makes them challenging to deploy on mobile devices. Binary neural networks (BNNs) are considered a promising solution that can significantly reduce the memory and computational requirements of DNNs while offering capabilities similar to those of full-precision DNN models. Currently, existing GPU-accelerated implementations of BNNs are tailored only to desktop platforms. Due to architectural differences, merely porting such implementations to mobile devices yields suboptimal performance or is impossible in some cases. GPU-accelerated implementations of BNNs on mobile devices therefore remain a missing piece in the literature. In this paper, we propose PhoneBit, a GPU-accelerated BNN inference engine for mobile devices that fully exploits the computing power of BNNs on mobile GPUs. PhoneBit provides a set of operator-level optimizations, including a locality-friendly data layout, bit packing with vectorization, and layer integration for efficient binary convolution. We also provide a detailed implementation and parallelization optimization for PhoneBit to make optimal use of the memory bandwidth and computing power of mobile GPUs. Our experimental results show that PhoneBit achieves significant speedup and energy efficiency compared with state-of-the-art frameworks for mobile devices. The PhoneBit open-source library is available for download at https://code.ihub.org.cn/projects/915/repository/PhoneBit.
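For readers unfamiliar with the bit-packing and XNOR-popcount scheme that underlies efficient binary convolution, the plain C++ sketch below illustrates the general idea: weights and activations binarized to +1/-1 are packed one bit per value into 32-bit words, and a dot product is then computed with a single XOR and a population count per word. This is only an illustrative CPU-side sketch under assumed names (bit_pack, binary_dot) and a +1/-1 encoding; it is not PhoneBit's actual mobile-GPU kernel, which the abstract describes as additionally using a locality-friendly data layout, vectorization, and layer integration.

    #include <cstdint>
    #include <cstdio>
    #include <vector>
    #include <bitset>

    // Pack a +1/-1 vector into 32-bit words, one bit per value (+1 -> 1, -1 -> 0).
    std::vector<uint32_t> bit_pack(const std::vector<int>& v) {
        std::vector<uint32_t> packed((v.size() + 31) / 32, 0u);
        for (size_t i = 0; i < v.size(); ++i)
            if (v[i] > 0) packed[i / 32] |= 1u << (i % 32);
        return packed;
    }

    // Binary dot product: with the +1/-1 encoding, dot = n_bits - 2 * popcount(a XOR w).
    int binary_dot(const std::vector<uint32_t>& a,
                   const std::vector<uint32_t>& w, int n_bits) {
        int mismatches = 0;
        for (size_t i = 0; i < a.size(); ++i)
            mismatches += static_cast<int>(std::bitset<32>(a[i] ^ w[i]).count());
        return n_bits - 2 * mismatches;
    }

    int main() {
        std::vector<int> act = {+1, -1, +1, +1, -1, -1, +1, -1};  // binarized activations
        std::vector<int> wgt = {+1, -1, -1, +1, +1, -1, +1, +1};  // binarized weights
        int dot = binary_dot(bit_pack(act), bit_pack(wgt), static_cast<int>(act.size()));
        std::printf("binary dot product = %d\n", dot);  // prints 2, matching the +1/-1 dot product
        return 0;
    }

A GPU engine would apply the same arithmetic to many packed words at once, using vector loads and hardware popcount, and fuse the surrounding binarization and scaling steps into the convolution kernel to save memory traffic.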
Pages: 10
Related Papers
50 results in total
  • [1] PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones
    Chen, Gang
He, Shengyu
    Meng, Haitao
    Huang, Kai
    [J]. PROCEEDINGS OF THE 2020 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2020), 2020, : 786 - 791
  • [2] GPU-Accelerated Real-Time Stereo Estimation With Binary Neural Network
    Chen, Gang
    Meng, Haitao
    Liang, Yucheng
    Huang, Kai
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (12) : 2896 - 2907
  • [3] A GPU-accelerated real-time human voice separation framework for mobile phones
    Chen, Gang
    Zheng, Yi
    Zhou, Zhaoheng
    He, Shengyu
    Yi, Wang
    [J]. JOURNAL OF SYSTEMS ARCHITECTURE, 2023, 145
  • [4] GPU-accelerated artificial neural network potential for molecular dynamics simulation
    Zhang, Meng
    Hibi, Koki
    Inoue, Junya
    [J]. COMPUTER PHYSICS COMMUNICATIONS, 2023, 285
  • [5] Efficient Intranode Communication in GPU-Accelerated Systems
    Ji, Feng
    Aji, Ashwin M.
    Dinan, James
    Buntinas, Darius
    Balaji, Pavan
    Feng, Wu-chun
    Ma, Xiaosong
    [J]. 2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 1838 - 1847
  • [6] A GPU-accelerated cortical neural network model for visually guided robot navigation
    Beyeler, Michael
    Oros, Nicolas
    Dutt, Nikil
    Krichmar, Jeffrey L.
    [J]. NEURAL NETWORKS, 2015, 72 : 75 - 87
  • [7] GPU-accelerated differential dependency network analysis
    Speyer, Gil
    Rodriguez, Juan J.
    Bencomo, Tomas
    Kim, Seungchan
    [J]. 2018 26TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2018), 2018, : 410 - 414
  • [8] GPU-Accelerated Neural Network Potential Energy Surfaces for Diffusion Monte Carlo
    DiRisio, Ryan J.
    Lu, Fenris
    McCoy, Anne B.
    [J]. JOURNAL OF PHYSICAL CHEMISTRY A, 2021, 125 (26): : 5849 - 5859
  • [9] Dynamic parallelism for synaptic updating in GPU-accelerated spiking neural network simulations
    Kasap, Bahadir
    van Opstal, A. John
    [J]. NEUROCOMPUTING, 2018, 302 : 55 - 65
  • [10] Fast and Low-Precision Learning in GPU-Accelerated Spiking Neural Network
    She, Xueyuan
    Long, Yun
    Mukhopadhyay, Saibal
    [J]. 2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2019, : 450 - 455