DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

Cited by: 67
Authors
Niu, Wei [1 ]
Guan, Jiexiong [1 ]
Wang, Yanzhi [2 ]
Agrawal, Gagan [3 ]
Ren, Bin [1 ]
Affiliations
[1] William & Mary, Williamsburg, VA 23185 USA
[2] Northeastern Univ, Boston, MA 02115 USA
[3] Augusta Univ, Augusta, GA USA
Source
PROCEEDINGS OF THE 42ND ACM SIGPLAN INTERNATIONAL CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '21) | 2021
Funding
US National Science Foundation;
Keywords
Compiler Optimization; Operator Fusion; Deep Neural Network; Mobile Devices; Transformations; Optimization; Locality; Loop
DOI
10.1145/3453483.3454083
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline codes
081202; 0835
Abstract
Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, which aim to improve the efficiency of DNN inference. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea is to work at an operator view of DNNs while expanding fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate, lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8x more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with a 9.3x speedup. The memory-requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
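To make the classification idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes a simplified three-class taxonomy of operator input-to-output mappings and a small lookup table over (producer, consumer) class pairs that decides whether two adjacent operators may share one loop nest. The class names, table entries, and the greedy_fuse helper are illustrative assumptions, not DNNFusion's actual rules.

    # Minimal sketch of classification-driven operator fusion (hypothetical;
    # not DNNFusion's actual taxonomy or rules). Each operator is labeled by
    # how its inputs map to its outputs, and a table over (producer, consumer)
    # label pairs decides whether two adjacent operators can share a loop nest.

    from dataclasses import dataclass

    ONE_TO_ONE = "one-to-one"      # elementwise ops, e.g., ReLU, BiasAdd
    ONE_TO_MANY = "one-to-many"    # broadcast-style ops, e.g., Expand
    MANY_TO_MANY = "many-to-many"  # reduction-style ops, e.g., Conv2D, MatMul

    @dataclass
    class Op:
        name: str
        mapping: str

    # Hypothetical legality/profitability table. True means the intermediate
    # tensor between the two ops never needs to be written to memory.
    FUSIBLE = {
        (ONE_TO_ONE, ONE_TO_ONE): True,      # chain of elementwise ops
        (ONE_TO_ONE, ONE_TO_MANY): True,     # apply elementwise op before broadcast
        (MANY_TO_MANY, ONE_TO_ONE): True,    # fold elementwise op into producer's stores
        (ONE_TO_ONE, MANY_TO_MANY): False,   # simplification: reductions start a group
        (MANY_TO_MANY, MANY_TO_MANY): False, # two reductions rarely fuse profitably
    }

    def greedy_fuse(chain):
        """Greedily merge adjacent ops of a linear operator chain into fusion groups."""
        groups = [[chain[0]]]
        for op in chain[1:]:
            prev = groups[-1][-1]
            if FUSIBLE.get((prev.mapping, op.mapping), False):
                groups[-1].append(op)   # extend the current fused kernel
            else:
                groups.append([op])     # start a new kernel
        return groups

    chain = [Op("Conv2D", MANY_TO_MANY), Op("BiasAdd", ONE_TO_ONE),
             Op("ReLU", ONE_TO_ONE), Op("MatMul", MANY_TO_MANY)]
    for group in greedy_fuse(chain):
        print("+".join(op.name for op in group))
    # Prints two fused kernels: "Conv2D+BiasAdd+ReLU" and "MatMul".

A real implementation would operate on a full dataflow graph rather than a linear chain, and would combine such a table with the paper's graph rewriting and profiling; the sketch only illustrates how a mapping-type classification can generalize beyond fixed fusion patterns.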
Pages: 883-898
Page count: 16