CoExE: An Efficient Co-execution Architecture for Real-Time Neural Network Services

Cited by: 2
Authors
Liu, Chubo [1]
Li, Kenli [1]
Song, Mingcong [2]
Zhao, Jiechen [2]
Li, Keqin [3]
Li, Tao [2]
Zeng, Zihao [1]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha, Peoples R China
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[3] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY USA
Keywords
AI-cloud acceleration; Co-execution architecture; Real-time NN service; Sparsity-driven multi-context
DOI
10.1109/dac18072.2020.9218740
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
User-interactive neural network (NN) services on clouds are sensitive to end-to-end latency. During periods of high request load, co-locating multiple NN requests has the potential to reduce end-to-end latency. However, current batch-based accelerators lack support for request-level parallelism, leaving queuing time unoptimized. Meanwhile, naively partitioning resources among simultaneous requests leads to longer execution time and lower resource efficiency, because different applications use separate resources without sharing. To effectively reduce the end-to-end latency of real-time NN requests, we propose the CoExE architecture, equipped with a pipelined implementation of a sparsity-driven real-time co-execution model. By leveraging the non-trivial amount of sparse operations that arise during concurrent NN execution, CoExE decreases end-to-end latency by up to 12.3x and 2.4x over Eyeriss-like and SCNN accelerators, respectively, in peak workload mode. In addition, we propose a row cross (RC) dataflow to reduce data movement cost and avoid memory duplication.
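The trade-off the abstract describes, between queuing time and per-request execution time, can be illustrated with a toy model. The sketch below is not the CoExE model or its evaluation method. It assumes all requests arrive at once (peak load), that every request takes the same time `t` when run alone, and that running `k` requests concurrently stretches each one by a hypothetical factor `slowdown`. The function names and all parameters are illustrative assumptions.

```python
def serial_latencies(n, t):
    """End-to-end latency of n requests served one at a time.

    Request i waits for the i requests ahead of it, then runs for t.
    """
    return [(i + 1) * t for i in range(n)]


def coexec_latencies(n, t, k, slowdown):
    """End-to-end latency when k requests run concurrently.

    Assumption (hypothetical): each co-executing request takes
    t * slowdown, and a new group of k starts when the previous
    group finishes.
    """
    group_time = t * slowdown
    return [(i // k + 1) * group_time for i in range(n)]


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    n, t, k = 8, 1.0, 4
    print("serial mean latency:", mean(serial_latencies(n, t)))
    # slowdown == k models naive partitioning with no sharing benefit
    print("naive partitioning:", mean(coexec_latencies(n, t, k, slowdown=4.0)))
    # slowdown < k models co-execution that recovers some efficiency
    print("shared co-execution:", mean(coexec_latencies(n, t, k, slowdown=2.0)))
```

In this toy setting, co-execution lowers mean latency only when the slowdown stays well below the number of co-located requests, which is consistent with the abstract's point that naive partitioning without sharing can lengthen execution time.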
Pages: 6