An efficient GPU-accelerated inference engine for binary neural network on mobile phones

Cited by: 4
Authors
He, Shengyu [1]
Meng, Haitao [1,3]
Zhou, Zhaoheng [1]
Liu, Yongjun [2]
Huang, Kai [1]
Chen, Gang [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[2] Changshu Inst Technol, Suzhou, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
STEREO ESTIMATION;
DOI
10.1016/j.sysarc.2021.102156
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Over the last few years, deep neural networks (DNNs) have become increasingly powerful and have risen in popularity, especially in mobile computing. Applications running on edge AI devices such as smartphones could benefit substantially from the new opportunities enabled by deep learning techniques. However, DNNs are by nature computationally and memory intensive, which makes them challenging to deploy on mobile devices. Binary neural networks (BNNs) are considered a promising solution that can significantly reduce the memory and computational requirements of DNNs while offering capabilities similar to those of full-precision DNN models. Currently, existing GPU-accelerated implementations of BNNs are tailored only to desktop platforms. Due to architectural differences, merely porting such implementations to mobile devices yields suboptimal performance or is impossible in some cases. GPU-accelerated implementations of BNNs on mobile devices therefore remain a missing piece in the literature. In this paper, we propose PhoneBit, a GPU-accelerated BNN inference engine for mobile devices that fully exploits the computing power of BNNs on mobile GPUs. PhoneBit provides a set of operator-level optimizations, including a locality-friendly data layout, bit packing with vectorization, and layer integration for efficient binary convolution. We also provide a detailed implementation and parallelization optimization for PhoneBit to make optimal use of the memory bandwidth and computing power of mobile GPUs. Our experimental results show that PhoneBit achieves significant speedup and energy efficiency compared with state-of-the-art frameworks for mobile devices. The PhoneBit open-source library is available for download at https://code.ihub.org.cn/projects/915/repository/PhoneBit.
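For readers unfamiliar with the bit-packing and XNOR-popcount scheme that underlies efficient binary convolution, the plain C++ sketch below illustrates the general idea: weights and activations binarized to +1/-1 are packed one bit per value into 32-bit words, and a dot product is then computed with a single XOR and a population count per word. This is only an illustrative CPU-side sketch under assumed names (bit_pack, binary_dot) and a +1/-1 encoding; it is not PhoneBit's actual mobile-GPU kernel, which the abstract describes as additionally using a locality-friendly data layout, vectorization, and layer integration.

    #include <cstdint>
    #include <cstdio>
    #include <vector>
    #include <bitset>

    // Pack a +1/-1 vector into 32-bit words, one bit per value (+1 -> 1, -1 -> 0).
    std::vector<uint32_t> bit_pack(const std::vector<int>& v) {
        std::vector<uint32_t> packed((v.size() + 31) / 32, 0u);
        for (size_t i = 0; i < v.size(); ++i)
            if (v[i] > 0) packed[i / 32] |= 1u << (i % 32);
        return packed;
    }

    // Binary dot product: with the +1/-1 encoding, dot = n_bits - 2 * popcount(a XOR w).
    int binary_dot(const std::vector<uint32_t>& a,
                   const std::vector<uint32_t>& w, int n_bits) {
        int mismatches = 0;
        for (size_t i = 0; i < a.size(); ++i)
            mismatches += static_cast<int>(std::bitset<32>(a[i] ^ w[i]).count());
        return n_bits - 2 * mismatches;
    }

    int main() {
        std::vector<int> act = {+1, -1, +1, +1, -1, -1, +1, -1};  // binarized activations
        std::vector<int> wgt = {+1, -1, -1, +1, +1, -1, +1, +1};  // binarized weights
        int dot = binary_dot(bit_pack(act), bit_pack(wgt), static_cast<int>(act.size()));
        std::printf("binary dot product = %d\n", dot);  // prints 2, matching the +1/-1 dot product
        return 0;
    }

A GPU engine would apply the same arithmetic to many packed words at once, using vector loads and hardware popcount, and fuse the surrounding binarization and scaling steps into the convolution kernel to save memory traffic.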
Pages: 10
Related Papers
50 results in total
  • [1] PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones
    Chen, Gang
He, Shengyu
    Meng, Haitao
    Huang, Kai
    [J]. PROCEEDINGS OF THE 2020 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2020), 2020, : 786 - 791
  • [2] GPU-Accelerated Real-Time Stereo Estimation With Binary Neural Network
    Chen, Gang
    Meng, Haitao
    Liang, Yucheng
    Huang, Kai
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (12) : 2896 - 2907
  • [3] A GPU-accelerated real-time human voice separation framework for mobile phones
    Chen, Gang
    Zheng, Yi
    Zhou, Zhaoheng
    He, Shengyu
    Yi, Wang
    [J]. JOURNAL OF SYSTEMS ARCHITECTURE, 2023, 145
  • [4] GPU-accelerated artificial neural network potential for molecular dynamics simulation
    Zhang, Meng
    Hibi, Koki
    Inoue, Junya
    [J]. COMPUTER PHYSICS COMMUNICATIONS, 2023, 285
  • [5] Efficient Intranode Communication in GPU-Accelerated Systems
    Ji, Feng
    Aji, Ashwin M.
    Dinan, James
    Buntinas, Darius
    Balaji, Pavan
    Feng, Wu-chun
    Ma, Xiaosong
    [J]. 2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 1838 - 1847
  • [6] A GPU-accelerated cortical neural network model for visually guided robot navigation
    Beyeler, Michael
    Oros, Nicolas
    Dutt, Nikil
    Krichmar, Jeffrey L.
    [J]. NEURAL NETWORKS, 2015, 72 : 75 - 87
  • [7] GPU-accelerated differential dependency network analysis
    Speyer, Gil
    Rodriguez, Juan J.
    Bencomo, Tomas
    Kim, Seungchan
    [J]. 2018 26TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2018), 2018, : 410 - 414
  • [8] GPU-Accelerated Neural Network Potential Energy Surfaces for Diffusion Monte Carlo
    DiRisio, Ryan J.
    Lu, Fenris
    McCoy, Anne B.
    [J]. JOURNAL OF PHYSICAL CHEMISTRY A, 2021, 125 (26): : 5849 - 5859
  • [9] Dynamic parallelism for synaptic updating in GPU-accelerated spiking neural network simulations
    Kasap, Bahadir
    van Opstal, A. John
    [J]. NEUROCOMPUTING, 2018, 302 : 55 - 65
  • [10] Fast and Low-Precision Learning in GPU-Accelerated Spiking Neural Network
    She, Xueyuan
    Long, Yun
    Mukhopadhyay, Saibal
    [J]. 2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2019, : 450 - 455