Research on Parallel Acceleration for Deep Learning Inference Based on Many-Core ARM Platform

Times cited: 0
Authors
Zhu, Keqian [1]
Jiang, Jingfei [1]
Affiliation
[1] National University of Defense Technology, National Laboratory for Parallel and Distributed Processing, Changsha, Hunan, People's Republic of China
Keywords
Parallel acceleration; Deep learning inference; Many-core ARM
DOI
10.1007/978-981-13-2423-9_3
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Deep learning is one of the most active research directions in artificial intelligence, and it has achieved results that surpass those of traditional methods. At the same time, it places ever-increasing demands on the computing power of hardware platforms. Academia and industry mainly use heterogeneous GPUs to accelerate computation, whereas ARM is a relatively more open platform than GPUs. This paper studies the performance of ThunderX high-performance many-core ARM chips on large-scale inference tasks, together with the techniques that accelerate them. To evaluate the computational performance of the target platform objectively, several deep models are adapted for acceleration. By selecting computational libraries, tuning the parallel strategy, and applying various performance optimization techniques, we exploit the computing power of the many-core ARM platform in depth. The experimental results show that a single ThunderX chip performs on par with an Intel Core i7-7700K, and a dual-chip ThunderX system reaches 1.77 times the overall performance of the i7-7700K. In terms of energy efficiency, however, ThunderX is inferior to the i7-7700K; a more powerful cooling system or poor power management may account for the higher power consumption. Overall, high-performance ARM chips can be deployed in the cloud to handle large-scale deep learning inference tasks that require high throughput.
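The energy-efficiency result can be read as a constraint on power draw. The short derivation below is a sketch that assumes energy efficiency means inference throughput per watt, E = T/P; the abstract does not define the metric, and the symbols T (throughput), P (average power), E, and r_T (throughput ratio of the ThunderX configuration to the i7-7700K) are introduced here for illustration only.

```latex
\begin{aligned}
E &= \frac{T}{P}, \qquad r_T = \frac{T_{\mathrm{ARM}}}{T_{\mathrm{i7}}}, \\
E_{\mathrm{ARM}} < E_{\mathrm{i7}}
  &\iff \frac{T_{\mathrm{ARM}}}{P_{\mathrm{ARM}}} < \frac{T_{\mathrm{i7}}}{P_{\mathrm{i7}}}
  \iff \frac{P_{\mathrm{ARM}}}{P_{\mathrm{i7}}} > \frac{T_{\mathrm{ARM}}}{T_{\mathrm{i7}}} = r_T .
\end{aligned}
```

Under this assumption, a single ThunderX chip with r_T of about 1 must draw more power than the i7-7700K, and if the comparison refers to the dual-chip system with r_T = 1.77, it must draw more than 1.77 times the i7-7700K's power, which is consistent with the abstract's suggestion that cooling or power management raises consumption.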
Pages: 30-41
Number of pages: 12