Multilevel Neural Network for Reducing Expected Inference Time

Cited by: 10
Authors
Putra, Tryan Aditya [1 ]
Leu, Jenq-Shiou [1 ]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Elect & Comp Engn, Taipei 10607, Taiwan
Keywords
Edge computing; mobile computing; network compression and acceleration
DOI
10.1109/ACCESS.2019.2952577
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
It is widely known that deep neural networks (DNNs) perform well in many applications and can sometimes exceed human ability. However, their computational cost limits their use in a variety of real-world settings, such as IoT and mobile computing. Recently, many DNN compression and acceleration methods have been employed to overcome this problem. Most succeed in reducing the number of parameters and FLOPs, but only a few reduce the expected inference time, owing either to the overhead these methods introduce or to deficiencies in DNN frameworks. Edge-cloud computing has recently emerged and presents an opportunity for new model acceleration and compression techniques. To address the aforementioned problem, we propose a novel technique that speeds up the expected inference time by using several networks that perform the exact same task with different strengths. Although our method is based on edge-cloud computing, it is suitable for any other hierarchical computing paradigm. Using a simple yet sufficiently strong estimator, the system predicts whether the data should be passed to a larger network. Extensive experimental results demonstrate that the proposed technique speeds up expected inference times and beats almost all state-of-the-art compression techniques, including pruning, low-rank approximation, knowledge distillation, and branchy-type networks, on both CPUs and GPUs.
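The abstract describes a multilevel cascade in which an estimator decides whether a cheap network's answer suffices or the input must be escalated to a larger network. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's implementation: the two model stubs are hypothetical placeholders, and a softmax-confidence threshold stands in for the paper's unspecified estimator.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def small_edge_model(x):
    # Hypothetical stand-in for the lightweight on-device network:
    # returns deterministic pseudo-logits derived from the input.
    seed = abs(hash(x.tobytes())) % (2 ** 32)
    return np.random.default_rng(seed).normal(size=10)

def large_cloud_model(x):
    # Hypothetical stand-in for the stronger, slower remote network.
    seed = (abs(hash(x.tobytes())) + 1) % (2 ** 32)
    return np.random.default_rng(seed).normal(size=10)

def multilevel_predict(x, confidence_threshold=0.9):
    # Level 1: always run the cheap model at the edge.
    probs = softmax(small_edge_model(x))
    if probs.max() >= confidence_threshold:
        # The estimator deems the cheap prediction trustworthy,
        # so no round-trip to the larger network is needed.
        return int(probs.argmax()), "edge"
    # Level 2: escalate only hard inputs to the larger network,
    # paying its higher latency for a minority of samples.
    probs = softmax(large_cloud_model(x))
    return int(probs.argmax()), "cloud"

if __name__ == "__main__":
    x = np.random.rand(3, 32, 32).astype(np.float32)
    label, level = multilevel_predict(x)
    print(f"predicted class {label}, answered at the {level} level")
```

Under this scheme the expected inference time is roughly t_small + p_escalate * t_large, so the cascade wins over running the large network alone whenever the estimator keeps the escalation rate p_escalate low without sacrificing accuracy.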
Pages
174129-174138 (10 pages)