STAR: Efficient SpatioTemporal Modeling for Action Recognition

Cited by: 2
Authors
Kumar, Abhijeet [1]
Abrams, Samuel [1]
Kumar, Abhishek [1]
Narayanan, Vijaykrishnan [1]
Affiliations
[1] Penn State Univ, EECS Dept, State Coll, PA 16802 USA
Keywords
Action recognition; Compressed domain; I-frames; Spatial-temporal 2D convolutional networks; DOMAIN
DOI
10.1007/s00034-022-02160-x
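Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract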
Action recognition in video has gained significant attention over the past several years. While conventional 2D CNNs have found great success in understanding images, they are less effective at capturing the temporal relationships present in video. By contrast, 3D CNNs capture spatiotemporal information well, but they incur a high computational cost, which makes deployment challenging. In video, key information is typically confined to a small number of frames, yet many current approaches decompress and process all frames, which wastes resources. Others work directly in the compressed domain but require multiple input streams to understand the data. In our work, we operate directly on compressed video and extract information solely from intracoded frames (I-frames). Because motion vectors and residuals are not used for motion information, the result is a single-stream network. This reduces processing time and, by extension, energy consumption, making the approach accessible to a wider range of machines and uses. We test the framework extensively on the UCF101 (Soomro et al. in UCF101: a dataset of 101 human actions classes from videos in the Wild, 2012) and HMDB51 (Kuehne et al., in: Jhuang, Garrote, Poggio, Serre (eds) Proceedings of the international conference on computer vision (ICCV), 2011) datasets and show that computational complexity is reduced significantly while accuracy remains competitive with existing compressed-domain efforts: 92.6% top-1 accuracy on UCF101 and 62.9% on HMDB51, with 24.3M parameters, 4 GFLOPs, and energy savings of over 11× on the two datasets versus CoViAR (Wu et al. in Compressed video action recognition, 2018).
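The I-frame-only input described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a fixed group-of-pictures (GOP) layout with one I-frame at the start of each group, and the helper names `frame_types` and `select_i_frames` as well as the GOP size of 12 are hypothetical choices made for illustration only.

```python
from typing import List


def frame_types(num_frames: int, gop_size: int) -> List[str]:
    """Label frames 'I' or 'P', assuming a fixed GOP led by one I-frame.

    This layout is an illustrative assumption, not taken from the paper.
    """
    if num_frames < 0 or gop_size < 1:
        raise ValueError("num_frames must be >= 0 and gop_size >= 1")
    return ["I" if i % gop_size == 0 else "P" for i in range(num_frames)]


def select_i_frames(types: List[str]) -> List[int]:
    """Return the indices of the frames a single-stream model would read."""
    return [i for i, t in enumerate(types) if t == "I"]


# Hypothetical clip: 36 frames with an assumed GOP size of 12.
types = frame_types(num_frames=36, gop_size=12)
i_frames = select_i_frames(types)
fraction = len(i_frames) / len(types)

print(i_frames)            # indices of the I-frames
print(f"{fraction:.3f}")   # share of frames that would be decoded
```

Under these assumptions only one frame in twelve is decoded and passed to the network, which is the kind of reduction the abstract attributes to skipping motion vectors and residuals.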
Pages: 705-723
Page count: 19