Multilevel Neural Network for Reducing Expected Inference Time

Cited by: 10
Authors
Putra, Tryan Aditya [1 ]
Leu, Jenq-Shiou [1 ]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Elect & Comp Engn, Taipei 10607, Taiwan
Keywords
Edge computing; mobile computing; network compression and acceleration
DOI
10.1109/ACCESS.2019.2952577
CLC number
TP [Automation & Computer Technology]
Discipline code
0812
Abstract
It is widely known that deep neural networks (DNNs) perform well in many applications and can sometimes exceed human ability. However, their computational cost limits their use in many real-world settings, such as IoT and mobile computing. Recently, many DNN compression and acceleration methods have been employed to overcome this problem. Most succeed in reducing the number of parameters and FLOPs, but only a few speed up expected inference times, because of either the overhead such methods introduce or deficiencies in DNN frameworks. Edge-cloud computing has recently emerged and presents an opportunity for new model acceleration and compression techniques. To address the aforementioned problem, we propose a novel technique that speeds up expected inference times by using several networks that perform the exact same task with different strengths. Although our method is based on edge-cloud computing, it is suitable for any other hierarchical computing paradigm. Using a simple yet sufficiently strong estimator, the system predicts whether the data should be passed to a larger network. Extensive experimental results demonstrate that the proposed technique speeds up expected inference times and beats almost all state-of-the-art compression techniques, including pruning, low-rank approximation, knowledge distillation, and branchy-type networks, on both CPUs and GPUs.
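The routing idea the abstract describes, a small edge model that forwards only uncertain inputs to a larger cloud model, can be sketched as below. This is a minimal illustration, not the paper's actual design: the confidence estimator (max softmax probability), the random linear "models", the threshold, and the per-level costs are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-ins for the small (edge) and large (cloud) networks:
# plain linear classifiers over 16 features and 10 classes.
W_small = rng.normal(size=(16, 10))
W_large = rng.normal(size=(16, 10))

def cascade_predict(x, threshold=0.5):
    """Run the small model first; escalate to the large model only when
    the confidence estimate (max softmax probability) is below threshold."""
    p = softmax(x @ W_small)
    if p.max() >= threshold:
        return int(p.argmax()), "edge"
    p = softmax(x @ W_large)
    return int(p.argmax()), "cloud"

# Expected inference time is then a weighted average of the two levels'
# costs, weighted by how often each level is reached.
X = rng.normal(size=(200, 16))
routes = [cascade_predict(x)[1] for x in X]
frac_cloud = routes.count("cloud") / len(routes)
t_edge, t_cloud = 1.0, 10.0  # illustrative per-sample costs
expected_time = t_edge + frac_cloud * t_cloud
```

With this framing, tuning the threshold trades accuracy against expected latency: a lower threshold keeps more traffic on the cheap edge model, which is the lever the paper exploits.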
Pages: 174129 - 174138 (10 pages)
Related Papers
50 items total
  • [21] On-Line Real Time Realization and Application of Adaptive Fuzzy Inference Neural Network
    Han Jianguo, Guo Junchao (Beijing University of Chemical Technology)
    Journal of Systems Engineering and Electronics, 2000, (01) : 67 - 74
  • [22] An improved algorithm for reducing Bayesian network inference complexity
    Zhang, Xiaodan
    Zhao, Hai
    Sun, Peigang
    Xu, Ye
    2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, : 3264 - +
  • [23] A MULTILEVEL NEURAL-NETWORK FOR A/D CONVERSION
    YUH, JD
    NEWCOMB, RW
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 1993, 4 (03) : 470 - 483
  • [24] EVOLUTIONAL DEVELOPMENT OF A MULTILEVEL NEURAL-NETWORK
    ODRI, SV
    PETROVACKI, DP
    KRSTONOSIC, GA
    NEURAL NETWORKS, 1993, 6 (04) : 583 - 595
  • [25] A Heterogeneous Inference Framework for a Deep Neural Network
    Gadea-Girones, Rafael
    Rocabado-Rocha, Jose Luis
    Fe, Jorge
    Monzo, Jose M.
    ELECTRONICS, 2024, 13 (02)
  • [26] LINNA: Likelihood Inference Neural Network Accelerator
    To, Chun-Hao
    Rozo, Eduardo
    Krause, Elisabeth
    Wu, Hao-Yi
    Wechsler, Risa H.
    Salcedo, Andres N.
    JOURNAL OF COSMOLOGY AND ASTROPARTICLE PHYSICS, 2023, (01)
  • [27] Temperature control with a neural fuzzy inference network
    Lin, CT
    Juang, CF
    Li, CP
    SOFT COMPUTING IN INTELLIGENT SYSTEMS AND INFORMATION PROCESSING, 1996, : 91 - 96
  • [28] Causal Network Inference for Neural Ensemble Activity
    Rong Chen
    Neuroinformatics, 2021, 19 : 515 - 527
  • [29] Temperature control with a neural fuzzy inference network
    Lin, CT
    Juang, CF
    Li, CP
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 1999, 29 (03) : 440 - 451
  • [30] Hierarchical neural network with efficient selection inference
    Mi, Jian-Xun
    Li, Nuo
    Huang, Ke-Yang
    Li, Weisheng
    Zhou, Lifang
    NEURAL NETWORKS, 2023, 161 : 535 - 549