SMART Frame Selection for Action Recognition

Cited by: 0
Authors
Gowda, Shreyank N. [1]
Rohrbach, Marcus [2]
Sevilla-Lara, Laura [1]
Affiliations
[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[2] Facebook AI Res, Menlo Pk, CA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Action recognition is computationally expensive. In this paper, we address the problem of frame selection to improve the accuracy of action recognition. In particular, we show that selecting good frames helps in action recognition performance even in the trimmed videos domain. Recent work has successfully leveraged frame selection for long, untrimmed videos, where much of the content is not relevant, and easy to discard. In this work, however, we focus on the more standard short, trimmed action recognition problem. We argue that good frame selection can not only reduce the computational cost of action recognition but also increase the accuracy by getting rid of frames that are hard to classify. In contrast to previous work, we propose a method that instead of selecting frames by considering one at a time, considers them jointly. This results in a more efficient selection, where "good" frames are more effectively distributed over the video, like snapshots that tell a story. We call the proposed frame selection SMART and we test it in combination with different backbone architectures and on multiple benchmarks (Kinetics, Something-something, UCF101). We show that the SMART frame selection consistently improves the accuracy compared to other frame selection strategies while reducing the computational cost by a factor of 4 to 10. We also show that when the primary goal is recognition performance, our selection strategy can improve over recent state-of-the-art models and frame selection strategies on various benchmarks (UCF101, HMDB51, FCVID, and ActivityNet).
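To make the idea of joint selection concrete, the following is a minimal, hypothetical Python sketch. It is not the SMART model described in the paper; it only illustrates the general principle the abstract contrasts with per-frame selection: given per-frame relevance scores (assumed here to come from some lightweight scorer), frames are chosen as a set, trading off score against temporal spread so the selected frames are distributed over the clip rather than clustered. The function name, the greedy objective, and the diversity_weight parameter are all assumptions made for this example.

import numpy as np

def select_frames_jointly(frame_scores, num_select, diversity_weight=0.5):
    """Greedily pick a set of frames that balances per-frame relevance
    scores with temporal spread, so selections are not clustered.

    NOTE: illustrative sketch only, not the SMART method from the paper.

    frame_scores: 1-D array of relevance scores, one per frame
                  (hypothetical scores, e.g. from a lightweight scorer).
    num_select:   number of frames to keep.
    diversity_weight: trade-off between score and temporal diversity.
    """
    scores = np.asarray(frame_scores, dtype=float)
    num_frames = len(scores)
    num_select = min(num_select, num_frames)
    # Frame positions normalized to [0, 1] so diversity is scale-free.
    positions = np.arange(num_frames) / max(num_frames - 1, 1)

    selected = [int(np.argmax(scores))]  # start with the highest-scoring frame
    while len(selected) < num_select:
        best_idx, best_val = None, -np.inf
        for i in range(num_frames):
            if i in selected:
                continue
            # Distance to the closest already-selected frame (normalized time).
            min_dist = min(abs(positions[i] - positions[j]) for j in selected)
            val = scores[i] + diversity_weight * min_dist
            if val > best_val:
                best_idx, best_val = i, val
        selected.append(best_idx)
    return sorted(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random(60)                  # toy per-frame scores for a 60-frame clip
    print(select_frames_jointly(scores, 8))  # indices of the 8 chosen frames

Because the diversity term depends on which frames are already in the set, the selection is made jointly rather than frame by frame, which is the behavior the abstract attributes to SMART at a high level; the actual model in the paper learns this selection rather than using a fixed greedy rule.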
Pages: 1451 - 1459
Number of pages: 9
Related Papers
50 records in total
  • [21] FRAME-SKIP CONVOLUTIONAL NEURAL NETWORKS FOR ACTION RECOGNITION
    Liu, Yinan
    Wu, Qingbo
    Tang, Liangzhi
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2017,
  • [22] Action Recognition With Motion Diversification and Dynamic Selection
    Zhuang, Peiqin
    Guo, Yu
    Yu, Zhipeng
    Zhou, Luping
    Bai, Lei
    Liang, Ding
    Wang, Zhiyong
    Wang, Yali
    Ouyang, Wanli
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4884 - 4896
  • [23] Research on lightweight action recognition method based on key frame
    Zhou Y.
    Bai H.
    Li W.
    Guo H.
    Xu X.
Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2020, 41 (07): 196 - 204
  • [24] Submodular Attribute Selection for Action Recognition in Video
Zheng, Jingjing
    Jiang, Zhuolin
    Chellappa, Rama
    Phillips, P. Jonathon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [25] Hybrid embedding for multimodal few-frame action recognition
    Shafizadegan, Fatemeh
    Naghsh-Nilchi, Ahmad Reza
    Shabaninia, Elham
    MULTIMEDIA SYSTEMS, 2025, 31 (02)
  • [26] Discriminative Part Selection for Human Action Recognition
    Zhang, Shiwei
    Gao, Changxin
    Zhang, Jing
    Chen, Feifei
    Sang, Nong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (04) : 769 - 780
  • [27] Dynamic Feature Selection for Online Action Recognition
    Bloom, Victoria
    Argyriou, Vasileios
    Makris, Dimitrios
    HUMAN BEHAVIOR UNDERSTANDING (HBU 2013), 2013, 8212 : 64 - 76
  • [28] EFFICIENT OBJECT FEATURE SELECTION FOR ACTION RECOGNITION
    Zhang, Tianyi
    Zhang, Yu
    Cai, Jianfei
    Kot, Alex C.
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 2707 - 2711
  • [29] An improved smart key frame extraction algorithm for vehicle target recognition
    Wang, Jianguo
    Zeng, Cheng
    Wang, Zhongsheng
    Jiang, Kun
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 97
  • [30] Action description logic for smart home agent recognition
    Bouzouane, Abdenour
    Bouchard, Bruno
    Giroux, Sylvain
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION, 2005, : 185 - 190