An efficient GPU-accelerated inference engine for binary neural network on mobile phones

Times cited: 4

Authors:

He, Shengyu [1]

Meng, Haitao [1,3]

Zhou, Zhaoheng [1]

Liu, Yongjun [2]

Huang, Kai [1]

Chen, Gang [1]

Affiliations:

[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China

[2] Changshu Inst Technol, Suzhou, Peoples R China

[3] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

JOURNAL OF SYSTEMS ARCHITECTURE | 2021 / Vol. 117

Funding:

National Natural Science Foundation of China;

Keywords:

STEREO ESTIMATION;

DOI:

10.1016/j.sysarc.2021.102156

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

Over recent years, deep neural networks (DNNs) have become more powerful and have risen in popularity, especially in mobile computing. Applications running on edge AI devices such as smartphones could benefit from the new opportunities enabled by deep learning techniques. However, DNNs are by nature computationally and memory intensive, which makes them challenging to deploy on mobile devices. Binary neural networks (BNNs) have been considered a promising solution that can significantly reduce the memory and computational requirements of DNNs while offering capabilities similar to those of full-precision DNN models. Existing GPU-accelerated implementations of BNNs, however, are tailored only to desktop platforms. Because of architectural differences, simply porting these implementations to mobile devices yields suboptimal performance or, in some cases, is impossible. GPU-accelerated BNN implementations for mobile devices therefore remain a missing piece in the literature. In this paper, we propose PhoneBit, a GPU-accelerated BNN inference engine for mobile devices that fully exploits the computing power of BNNs on mobile GPUs. PhoneBit provides a set of operator-level optimizations for efficient binary convolution, including a locality-friendly data layout, bit packing with vectorization, and layer integration. We also provide a detailed implementation and parallelization optimizations that let PhoneBit make full use of the memory bandwidth and computing power of mobile GPUs. Our experimental results show that PhoneBit can achieve significant speedups and higher energy efficiency compared with state-of-the-art frameworks for mobile devices. The PhoneBit open source library is available for download at https://code.ihub.org.cn/projects/915/repository/PhoneBit.
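The abstract lists bit packing as one of PhoneBit's operator-level optimizations for binary convolution. The sketch below illustrates only the general, widely used idea behind bit-packed binary convolution, not PhoneBit's actual kernels: values in {-1, +1} are stored as single bits (1 for +1, 0 for -1) inside 32-bit words, and the dot product of two binarized vectors of length n reduces to an XNOR followed by a population count, because dot = matches - mismatches = 2 * popcount(XNOR) - n. The 32-bit word width, the bit encoding, and the function names `pack_bits` and `binary_dot` are assumptions chosen for illustration.

```python
# Illustrative sketch only (hypothetical helpers, not PhoneBit's API):
# pack {-1, +1} vectors into 32-bit words and compute their dot product
# with XNOR + popcount instead of multiply-accumulate.

WORD_BITS = 32  # assumed word width


def pack_bits(values):
    """Pack a sequence of +1/-1 values into WORD_BITS-bit words.

    Bit i of a word is 1 when the corresponding value is +1 and 0 when it is -1.
    """
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for i, v in enumerate(values[start:start + WORD_BITS]):
            if v > 0:
                word |= 1 << i
        words.append(word)
    return words


def binary_dot(a_words, b_words, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    matches = 0
    for idx, (a, b) in enumerate(zip(a_words, b_words)):
        # The last word may be only partially filled; mask off unused bits.
        valid = min(WORD_BITS, n - idx * WORD_BITS)
        mask = (1 << valid) - 1
        xnor = ~(a ^ b) & mask  # 1 where the two signs agree
        matches += bin(xnor).count("1")  # popcount
    # matches - mismatches, where mismatches = n - matches
    return 2 * matches - n
```

For example, `binary_dot(pack_bits([1, -1, 1, 1]), pack_bits([1, 1, -1, 1]), 4)` returns 0, matching the full-precision dot product 1 - 1 - 1 + 1. On a GPU, each word-level XNOR and popcount maps to a few integer instructions, which is the usual source of BNN speedups over floating-point multiply-accumulate; how PhoneBit lays out and vectorizes this work on mobile GPUs is described in the paper and is not reproduced here.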

Pages: 10