CoExE: An Efficient Co-execution Architecture for Real-Time Neural Network Services

Cited by: 2
Authors
Liu, Chubo [1]
Li, Kenli [1]
Song, Mingcong [2]
Zhao, Jiechen [2]
Li, Keqin [3]
Li, Tao [2]
Zeng, Zihao [1]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha, Peoples R China
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[3] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY USA
Keywords
AI-cloud acceleration; Co-execution architecture; Real-time NN service; Sparsity-driven multi-context
DOI
10.1109/dac18072.2020.9218740
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
User-interactive neural network (NN) services on clouds are sensitive to end-to-end latency. During periods of high request load, co-locating multiple NN requests has the potential to reduce end-to-end latency. However, current batch-based accelerators lack support for request-level parallelism, leaving queuing time unoptimized. Meanwhile, naively partitioning resources among simultaneous requests leads to longer execution time and lower resource efficiency, because different applications use separate resources without sharing them. To effectively reduce the end-to-end latency of real-time NN requests, we propose the CoExE architecture, equipped with a pipelined implementation of a sparsity-driven real-time co-execution model. By leveraging the non-trivial amount of sparse operations that arise during concurrent NN execution, CoExE reduces end-to-end latency by up to 12.3x and 2.4x compared with an Eyeriss-like accelerator and SCNN, respectively, in peak workload mode. In addition, we propose a row cross (RC) dataflow to reduce data movement cost and avoid memory duplication.
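To make the trade-off in the abstract more concrete (queuing under batch-style execution versus longer execution under naive partitioning), the following is a minimal toy cycle model in Python. It rests on our own simplifying assumptions, not on the paper: every multiply-accumulate (MAC) takes one cycle on one processing element (PE), a MAC with a zero operand can be skipped, and all requests arrive at once. The `Request` type and the functions `serial_cycles`, `partitioned_cycles`, and `coexec_cycles` are hypothetical illustrations. In particular, `coexec_cycles` is a simple round-robin, zero-skipping shared-pool scheduler standing in for the idea of sparsity-driven co-execution; it does not reproduce CoExE's pipeline or its RC dataflow.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical NN request: a flat list of operand values, one per MAC.
    A zero value means that MAC can be skipped (toy assumption)."""
    name: str
    ops: list = field(default_factory=list)


def serial_cycles(requests, num_pes):
    """Batch-style baseline: requests run one after another on all PEs,
    densely (zeros are executed); later requests queue behind earlier ones."""
    t = 0
    finish = {}
    for r in requests:
        t += math.ceil(len(r.ops) / num_pes)
        finish[r.name] = t
    return finish


def partitioned_cycles(requests, num_pes):
    """Naive static partitioning: each request gets an equal, private slice
    of the PEs and executes every op densely; slices are never shared."""
    if num_pes < len(requests):
        raise ValueError("need at least one PE per request")
    share = num_pes // len(requests)
    return {r.name: math.ceil(len(r.ops) / share) for r in requests}


def coexec_cycles(requests, num_pes):
    """Toy sparsity-aware co-execution (an assumption, not CoExE's design):
    zero ops are skipped and all requests share one PE pool. Each cycle the
    free PEs are handed out round-robin to requests with remaining work."""
    pending = {r.name: sum(1 for v in r.ops if v != 0) for r in requests}
    finish = {name: 0 for name, n in pending.items() if n == 0}
    cycle = 0
    while any(n > 0 for n in pending.values()):
        cycle += 1
        free = num_pes
        active = [name for name, n in pending.items() if n > 0]
        while free > 0 and active:
            for name in list(active):  # iterate over a copy; we remove below
                if free == 0:
                    break
                pending[name] -= 1
                free -= 1
                if pending[name] == 0:
                    active.remove(name)
                    finish[name] = cycle
    return finish


if __name__ == "__main__":
    reqs = [
        Request("A", [1, 0, 2, 0, 0, 3, 0, 0]),
        Request("B", [0, 4, 0, 5, 6, 0, 7, 0]),
    ]
    print(serial_cycles(reqs, 4))       # {'A': 2, 'B': 4}
    print(partitioned_cycles(reqs, 4))  # {'A': 4, 'B': 4}
    print(coexec_cycles(reqs, 4))       # {'A': 2, 'B': 2}
```

With 4 PEs and two 8-MAC requests containing 3 and 4 nonzero operands, the serial baseline finishes the requests at cycles 2 and 4 (request B queues behind A), static partitioning finishes both at cycle 4 (each request runs on half the PEs and still executes its zeros), and the shared zero-skipping pool finishes both at cycle 2. These toy numbers only illustrate the mechanism and are not the paper's measurements.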
Pages: 6