Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos

Times cited: 9
Authors
Zhang, Zhaoyang [1, 2]
Kuang, Zhanghui [2]
Luo, Ping [3]
Feng, Litong [2]
Zhang, Wei [2]
Affiliations
[1] Wuhan University, Wuhan, People's Republic of China
[2] SenseTime Research, Beijing, People's Republic of China
[3] The Chinese University of Hong Kong, Hong Kong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Video Action Recognition; Temporal Sequence Distillation
DOI
10.1145/3240508.3240534
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Video Analytics Software as a Service (VA SaaS) has grown rapidly in recent years. VA SaaS is typically accessed through a lightweight client. Because the transmission bandwidth between the client and the cloud is usually limited and expensive, cloud video analysis algorithms with low data-transmission requirements are highly desirable. Although considerable research has been devoted to video analysis, to the best of our knowledge, little of it has addressed the bandwidth limitation in SaaS. As a first attempt in this direction, this work introduces the problem of few-frame action recognition, which aims to maintain high recognition accuracy while accessing only a few frames during both training and testing. Unlike previous work that processes dense frames, we present Temporal Sequence Distillation (TSD), which distills a long video sequence into a very short one for transmission. Trained end-to-end with 3D CNNs for video action recognition, TSD learns a compact and discriminative temporal and spatial representation of video frames. On the Kinetics dataset, TSD+I3D typically requires only 50% of the frames used by I3D [1], a state-of-the-art video action recognition algorithm, to achieve almost the same accuracy. The proposed TSD has three appealing advantages. First, TSD has a lightweight architecture and can be deployed on the client, e.g., mobile devices, to produce compressed representative frames that save transmission bandwidth. Second, TSD significantly reduces the computation needed to run video action recognition on the cloud with the compressed frames, while maintaining high recognition accuracy. Third, TSD can be plugged in as a preprocessing module for any existing 3D CNN. Extensive experiments demonstrate the effectiveness and characteristics of TSD.
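To make the idea of distilling a long sequence into a short one more concrete, here is a minimal NumPy sketch, assuming that each distilled frame is a learned convex combination (softmax-weighted sum) of the input frames. This is an illustrative assumption, not the authors' published architecture: the function names `distill_sequence` and `softmax`, the projection matrix `proj` (a stand-in for a trained lightweight network), and the global-average-pooling frame descriptor are all hypothetical. The output length T_out = T/2 simply mirrors the "50% of the frames" figure in the abstract.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_sequence(frames, proj):
    """Distill T input frames into T_out frames (hypothetical sketch).

    frames: array of shape (T, H, W, C)
    proj:   array of shape (C, T_out); an untrained stand-in for a learned
            lightweight scoring network (assumption, not the TSD design)
    Returns (distilled, weights) with shapes (T_out, H, W, C) and (T_out, T).
    """
    # Per-frame descriptor: global average pooling over space -> (T, C)
    desc = frames.mean(axis=(1, 2))
    # Score every input frame for every output slot -> (T, T_out)
    logits = desc @ proj
    # Normalize over the input time axis so each output frame is a convex
    # combination of the input frames -> (T_out, T)
    weights = softmax(logits.T, axis=1)
    # Weighted temporal aggregation -> (T_out, H, W, C)
    distilled = np.einsum("ot,thwc->ohwc", weights, frames)
    return distilled, weights

rng = np.random.default_rng(0)
T, T_out, H, W, C = 64, 32, 56, 56, 3
video = rng.random((T, H, W, C), dtype=np.float32)
proj = rng.standard_normal((C, T_out)).astype(np.float32)
short_clip, weights = distill_sequence(video, proj)
print(short_clip.shape)  # (32, 56, 56, 3)
```

Under this assumption, only the short clip would need to be sent to the cloud and fed to a 3D CNN such as I3D. Because softmax and the weighted sum are differentiable, the same construction written in an autodiff framework could in principle be trained jointly with the recognition network, which is the kind of end-to-end training the abstract describes.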
Pages: 257-264
Page count: 8