An efficient GPU-accelerated inference engine for binary neural network on mobile phones

Cited by: 4
Authors
He, Shengyu [1]
Meng, Haitao [1, 3]
Zhou, Zhaoheng [1]
Liu, Yongjun [2]
Huang, Kai [1]
Chen, Gang [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[2] Changshu Inst Technol, Suzhou, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
STEREO ESTIMATION
DOI
10.1016/j.sysarc.2021.102156
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In recent years, deep neural networks (DNNs) have become more powerful and more popular, especially in mobile computing. Applications running on edge AI devices such as smartphones could benefit from the new opportunities that deep learning techniques enable. However, DNNs are by nature computationally and memory intensive, which makes them challenging to deploy on mobile devices. Binary neural networks (BNNs) are considered a promising solution: they significantly reduce the memory and computational requirements of DNNs while still offering capabilities similar to those of full-precision DNN models. Existing GPU-accelerated implementations of BNNs are tailored only for desktop platforms. Because of architectural differences, merely porting such implementations to mobile devices yields suboptimal performance or is impossible in some cases. GPU-accelerated implementations of BNNs on mobile devices have therefore been missing from the literature. In this paper, we propose PhoneBit, a GPU-accelerated BNN inference engine for mobile devices that fully exploits the computing power of BNNs on mobile GPUs. PhoneBit provides a set of operator-level optimizations for efficient binary convolution, including a locality-friendly data layout, bit packing with vectorization, and layer integration. We also provide a detailed implementation and parallelization optimization for PhoneBit that makes effective use of the memory bandwidth and computing power of mobile GPUs. Our experimental results show that PhoneBit achieves significant speedups and better energy efficiency than state-of-the-art frameworks for mobile devices. The PhoneBit open source library is available for download at https://code.ihub.org.cn/projects/915/repository/PhoneBit.
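The abstract names bit packing and binary convolution without giving kernel details. As an illustration only, the sketch below shows the identity that BNN engines commonly build on: when +1/-1 values are packed one per bit, a dot product reduces to an XNOR followed by a population count. This is not PhoneBit's implementation; the function names `pack_bits` and `binary_dot` are hypothetical, and plain Python integers stand in for the vectorized GPU words a real engine would use.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1                  # keep only the n valid bits
    xnor = ~(a_word ^ b_word) & mask     # a bit is 1 where the two signs agree
    matches = bin(xnor).count("1")       # popcount
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


# Compare the packed result against an ordinary integer dot product.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
packed = binary_dot(pack_bits(a), pack_bits(b), len(a))
assert packed == reference
```

Packing is what makes the memory savings possible: each weight or activation occupies one bit, so a single machine word carries many values and one bitwise instruction processes all of them at once.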
Pages: 10