Multilevel Neural Network for Reducing Expected Inference Time

Cited by: 10
Authors
Putra, Tryan Aditya [1 ]
Leu, Jenq-Shiou [1 ]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Elect & Comp Engn, Taipei 10607, Taiwan
Keywords
Edge computing; mobile computing; network compression and acceleration
DOI
10.1109/ACCESS.2019.2952577
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
It is widely known that deep neural networks (DNNs) perform well in many applications and can sometimes exceed human ability. However, their computational cost limits their impact in a variety of real-world settings, such as IoT and mobile computing. Recently, many DNN compression and acceleration methods have been proposed to overcome this problem. Most succeed in reducing the number of parameters and FLOPs, but only a few speed up expected inference times, because of either the overhead these methods introduce or deficiencies in DNN frameworks. Edge-cloud computing has recently emerged and presents an opportunity for new model acceleration and compression techniques. To address the aforementioned problem, we propose a novel technique that speeds up expected inference time by using several networks that perform the exact same task with different strengths. Although our method is designed around edge-cloud computing, it is suitable for any other hierarchical computing paradigm. Using a simple yet sufficiently strong estimator, the system predicts whether an input should be passed on to a larger network. Extensive experimental results demonstrate that the proposed technique speeds up expected inference times and outperforms almost all state-of-the-art compression techniques, including pruning, low-rank approximation, knowledge distillation, and branchy-type networks, on both CPUs and GPUs.
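
The abstract describes a cascade-style mechanism: a lightweight estimator judges whether a small network's output can be trusted or whether the input should be forwarded to a larger network. This record does not include the authors' code, so the sketch below is only an illustration under assumed conditions: a PyTorch setting, a softmax-confidence threshold standing in for the paper's estimator, and hypothetical small_model / large_model classifiers.

# Minimal sketch (not the authors' implementation) of two-level inference:
# a small model answers first, and low-confidence inputs are escalated to a
# larger model. The softmax-confidence threshold is an assumed stand-in for
# the paper's estimator.
import torch
import torch.nn.functional as F


def multilevel_predict(x, small_model, large_model, threshold=0.9):
    """Predict labels for a batch, escalating uncertain samples."""
    small_model.eval()
    large_model.eval()
    with torch.no_grad():
        logits = small_model(x)                     # cheap first-level pass
        probs = F.softmax(logits, dim=1)
        conf, preds = probs.max(dim=1)              # per-sample confidence
        uncertain = conf < threshold                # which samples to escalate
        if uncertain.any():
            big_logits = large_model(x[uncertain])  # expensive second-level pass
            preds[uncertain] = big_logits.argmax(dim=1)
    return preds

In an edge-cloud deployment of this kind, small_model would run on the edge device and large_model in the cloud, so only the inputs the estimator flags as uncertain incur the transfer and the larger model's cost.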
Pages: 174129-174138
Page count: 10
Related Papers
50 records in total
  • [1] Predicting the Execution Time of Secure Neural Network Inference
    Zhang, Eloise
    Mann, Zoltan Adam
    ICT SYSTEMS SECURITY AND PRIVACY PROTECTION, SEC 2024, 2024, 710 : 481 - 494
  • [2] Real-time inference in a VLSI spiking neural network
    Corneil, Dane
    Sonnleithner, Daniel
    Neftci, Emre
    Chicca, Elisabetta
    Cook, Matthew
    Indiveri, Giacomo
    Douglas, Rodney
    2012 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 2012), 2012, : 2425 - 2428
  • [3] Lightweight Inference by Neural Network Pruning: Accuracy, Time and Comparison
    Paralikas, Ilias
    Spantideas, Sotiris
    Giannopoulos, Anastasios
    Trakadas, Panagiotis
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, PT III, AIAI 2024, 2024, 713 : 248 - 257
  • [4] SNICIT: Accelerating Sparse Neural Network Inference via Compression at Inference Time on GPU
    Jiang, Shui
    Huang, Tsung-Wei
    Yu, Bei
    Ho, Tsung-Yi
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 51 - 61
  • [5] B-LNN: Inference-time linear model for secure neural network inference
    Wang, Qizheng
    Ma, Wenping
    Wang, Weiwei
    INFORMATION SCIENCES, 2023, 638
  • [6] Accelerating Toeplitz Neural Network with Constant-time Inference Complexity
    Qin, Zhen
    Zhong, Yiran
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 12206 - 12215
  • [7] Fuzzy inference neural network
    Nishina, T
    Hagiwara, M
    NEUROCOMPUTING, 1997, 14 (03) : 223 - 239
  • [8] Dependent Multilevel Interaction Network for Natural Language Inference
    Li, Yun
    Yang, Yan
    Deng, Yong
    Hu, Qinmin Vivian
    Chen, Chengcai
    He, Liang
    Yu, Zhou
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV, 2019, 11730 : 9 - 21
  • [9] Reducing Response Time by Tilt Sensor using Neural Network Comparator
    Shimoo, Kosei
    Nanbu, Yukihisa
    Teramura, Masahiro
    IEEJ Transactions on Sensors and Micromachines, 2019, 139 (09) : 310 - 316
  • [10] ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications
    Ackva, Valentin
    Schulz, Fares
    2024 IEEE 5TH INTERNATIONAL SYMPOSIUM ON THE INTERNET OF SOUNDS, IS2 2024, 2024, : 193 - 202