DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

Citations: 67
Authors
Niu, Wei [1 ]
Guan, Jiexiong [1 ]
Wang, Yanzhi [2 ]
Agrawal, Gagan [3 ]
Ren, Bin [1 ]
Affiliations
[1] William & Mary, Williamsburg, VA 23185 USA
[2] Northeastern Univ, Boston, MA 02115 USA
[3] Augusta Univ, Augusta, GA USA
Source
PROCEEDINGS OF THE 42ND ACM SIGPLAN INTERNATIONAL CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '21) | 2021
Funding
U.S. National Science Foundation;
Keywords
Compiler Optimization; Operator Fusion; Deep Neural Network; Mobile Devices; TRANSFORMATIONS; OPTIMIZATION; LOCALITY; LOOP;
DOI
10.1145/3453483.3454083
Chinese Library Classification (CLC)
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, that aims to improve the efficiency of DNN inference. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but to expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate light-weight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8x higher fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with up to 9.3x speedup. The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
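For readers unfamiliar with operator fusion, the short Python sketch below illustrates the general idea the abstract describes: operators are grouped by how their outputs map to their inputs, and a chain of element-wise ("one-to-one") operators can legally be collapsed into a single pass over the data, so intermediate tensors are never written to memory. This is a minimal illustration under assumed names (ONE_TO_ONE, OPS, fuse_one_to_one_chain, and so on), not DNNFusion's actual classification, graph-rewriting, or code-generation machinery.

# Minimal sketch (not the DNNFusion implementation): classify operators by their
# output-to-input mapping and fuse a chain of element-wise ("one-to-one") ops
# into a single pass so intermediate values stay in registers instead of memory.
import numpy as np

ONE_TO_ONE = "one-to-one"      # e.g. relu, bias-add, scaling
MANY_TO_MANY = "many-to-many"  # e.g. conv, matmul (not fused in this sketch)

# Each (hypothetical) operator: mapping class + an element-wise scalar function.
OPS = {
    "scale": (ONE_TO_ONE, lambda v: v * 0.5),
    "bias":  (ONE_TO_ONE, lambda v: v + 1.0),
    "relu":  (ONE_TO_ONE, lambda v: v if v > 0.0 else 0.0),
}

def run_unfused(x, chain):
    # Baseline: one "kernel" per operator, materializing every intermediate tensor.
    for name in chain:
        _, fn = OPS[name]
        x = np.vectorize(fn)(x)
    return x

def fuse_one_to_one_chain(chain):
    # Fusion is legal here because every op in the chain is one-to-one; compose
    # them into a single kernel that touches each element exactly once.
    assert all(OPS[n][0] == ONE_TO_ONE for n in chain), "chain is not fusable"
    fns = [OPS[n][1] for n in chain]
    def fused_kernel(x):
        out = np.empty_like(x)
        flat_in, flat_out = x.ravel(), out.ravel()
        for i in range(flat_in.size):      # one loop instead of len(chain) loops
            v = flat_in[i]
            for fn in fns:
                v = fn(v)                  # intermediate values never hit memory
            flat_out[i] = v
        return out
    return fused_kernel

if __name__ == "__main__":
    x = np.random.randn(4, 8).astype(np.float32)
    chain = ["scale", "bias", "relu"]
    fused = fuse_one_to_one_chain(chain)
    assert np.allclose(run_unfused(x, chain), fused(x))
    print("fused and unfused outputs match")

In the actual system described by the abstract, fusion decisions also weigh profitability using high-level analysis and lightweight profiling, and the fused loop nests are generated at compile time rather than interpreted as in this toy example.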
Pages: 883-898
Page count: 16