End-to-End Optimization of Deep Learning Applications

Cited by: 27

Authors:

Sohrabizadeh, Atefeh [1]

Wang, Jie [1]

Cong, Jason [1]

Affiliations:

[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA

Source:

2020 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA '20) | 2020

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

FPGA; CNN; OpenPose; TensorFlow; tiling; integration;

DOI:

10.1145/3373087.3375321

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

The irregularity of recent Convolutional Neural Network (CNN) models, such as reduced data reuse and parallelism caused by extensive network pruning and simplification, creates new challenges for FPGA acceleration. Furthermore, without proper optimization, there can be significant overheads when integrating FPGAs into existing machine learning frameworks like TensorFlow. This problem is mostly overlooked by previous studies, yet our study shows that a naive FPGA integration into TensorFlow can lead to up to 8.45x performance degradation. To address these challenges, we propose several SW/HW co-design approaches to perform the end-to-end optimization of deep learning applications. We present a flexible and composable architecture called FlexCNN. It delivers high computation efficiency for different types of convolution layers using techniques including dynamic tiling and data layout optimization. FlexCNN is further integrated into the TensorFlow framework with a fully-pipelined software-hardware integration flow. This alleviates the high overheads of the TensorFlow-FPGA handshake and other non-CNN processing stages. We use OpenPose, a popular CNN-based application for human pose recognition, as a case study. Experimental results show that the FlexCNN architecture optimizations achieve a 2.3x performance improvement. The pipelined integration stack leads to a further 5x speedup. Overall, the SW/HW co-optimization produces a speedup of 11.5x and results in an end-to-end performance of 23.8 FPS for OpenPose with floating-point precision, which is the highest performance reported for this application on FPGA in the literature.


Pages: 133-139

Page count: 7

