SMART Frame Selection for Action Recognition

Cited by: 0
Authors
Gowda, Shreyank N. [1 ]
Rohrbach, Marcus [2 ]
Sevilla-Lara, Laura [1 ]
Affiliations
[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[2] Facebook AI Res, Menlo Pk, CA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Action recognition is computationally expensive. In this paper, we address the problem of frame selection to improve the accuracy of action recognition. In particular, we show that selecting good frames helps action recognition performance even in the trimmed-video domain. Recent work has successfully leveraged frame selection for long, untrimmed videos, where much of the content is not relevant and is easy to discard. In this work, however, we focus on the more standard short, trimmed action recognition problem. We argue that good frame selection can not only reduce the computational cost of action recognition but also increase accuracy by discarding frames that are hard to classify. In contrast to previous work, we propose a method that, instead of selecting frames one at a time, considers them jointly. This results in a more efficient selection, where "good" frames are more effectively distributed over the video, like snapshots that tell a story. We call the proposed frame selection SMART and we test it in combination with different backbone architectures and on multiple benchmarks (Kinetics, Something-Something, UCF101). We show that SMART frame selection consistently improves accuracy compared to other frame selection strategies while reducing the computational cost by a factor of 4 to 10. We also show that when the primary goal is recognition performance, our selection strategy can improve over recent state-of-the-art models and frame selection strategies on various benchmarks (UCF101, HMDB51, FCVID, and ActivityNet).
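To make the contrast in the abstract concrete, the following minimal Python sketch compares independent top-k frame selection with a toy joint selection that also rewards temporal spread. It is only an illustration of the idea of selecting frames jointly; the scoring function, the spread bonus, and all parameter names here are assumptions, not the paper's actual SMART model.

import numpy as np

def topk_independent(scores, k):
    # Baseline: take the k highest-scoring frames, each judged on its own.
    return np.sort(np.argsort(scores)[-k:])

def joint_selection(scores, k, spread_weight=0.5):
    # Toy joint objective (assumed for illustration, not the paper's method):
    # total frame score plus a bonus for picking frames far from those already
    # chosen, so the selected set spreads over the whole video.
    n = len(scores)
    chosen = []
    for _ in range(k):
        best_idx, best_val = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            spread = min(abs(i - j) for j in chosen) / n if chosen else 1.0
            val = scores[i] + spread_weight * spread
            if val > best_val:
                best_idx, best_val = i, val
        chosen.append(best_idx)
    return np.sort(np.array(chosen))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_scores = rng.random(64)   # stand-in for per-frame relevance scores
    print("independent top-k:", topk_independent(frame_scores, 8))
    print("joint selection  :", joint_selection(frame_scores, 8))

With clustered high scores, the independent baseline tends to pick neighboring frames, while the joint variant trades a little per-frame score for frames spread across the clip, mirroring the abstract's "snapshots that tell a story" intuition.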
Pages: 1451-1459
Page count: 9
Related Papers
50 in total
  • [41] Investigating the impact of frame rate towards robust human action recognition
    Harjanto, Fredro
    Wang, Zhiyong
    Lu, Shiyang
    Tsoi, Ah Chung
    Feng, David Dagan
    SIGNAL PROCESSING, 2016, 124 : 220 - 232
  • [42] Hand Gesture Recognition Using Object Based Key Frame Selection
    Rokade, Ulka S.
    Doye, Dharmpal
    Kokare, Manesh
    ICDIP 2009: INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING, PROCEEDINGS, 2009, : 288 - +
  • [43] Exploiting inter-frame regional correlation for efficient action recognition
    Xu, Yuecong
    Yang, Jianfei
    Mao, Kezhi
    Yin, Jianxiong
    See, Simon
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 178
  • [44] GFNET: A LIGHTWEIGHT GROUP FRAME NETWORK FOR EFFICIENT HUMAN ACTION RECOGNITION
    Liu, Hong
    Zhang, Linlin
    Guan, Lisi
    Liu, Mengyuan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2583 - 2587
  • [45] FEASE: Feature Selection and Enhancement Networks for Action Recognition
    Zhou, Lu
    Lu, Yuanyao
    Jiang, Haiyang
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [46] The Model of Frame Selection. A General Theory of Action for the Social Sciences?
    Esser, Hartmut
    KOLNER ZEITSCHRIFT FUR SOZIOLOGIE UND SOZIALPSYCHOLOGIE, 2010, : 45 - +
  • [47] Selection of Characteristic Frames in Video for Efficient Action Recognition
    Lu, Guoliang
    Kudo, Mineichi
    Toyama, Jun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (10) : 2514 - 2521
  • [48] FEASE: Feature Selection and Enhancement Networks for Action Recognition
    Zhou, Lu
    Lu, Yuanyao
    Jiang, Haiyang
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [49] Probabilistic selection of frames for early action recognition in videos
    Saremi, Mehrin
    Yaghmaee, Farzin
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2019, 8 (04) : 253 - 257
  • [50] Dominant Codewords Selection with Topic Model for Action Recognition
    Kataoka, Hirokatsu
    Iwata, Kenji
    Satoh, Yutaka
    Hayashi, Masaki
    Aoki, Yoshimitsu
    Ilic, Slobodan
    PROCEEDINGS OF 29TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, (CVPRW 2016), 2016, : 770 - 777