Multi-Temporal Convolutions for Human Action Recognition in Videos

Cited by: 3
Authors
Stergiou, Alexandros [1]
Poppe, Ronald [1]
Affiliations
[1] Univ Utrecht, Dept Informat & Comp Sci, Utrecht, Netherlands
Keywords
TIME
DOI
10.1109/IJCNN52387.2021.9533515
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Effective extraction of temporal patterns is crucial for the recognition of temporally varying actions in video. We argue that the fixed-sized spatio-temporal convolution kernels used in convolutional neural networks (CNNs) can be improved to extract informative motions that are executed at different time scales. To address this challenge, we present a novel convolution block that is capable of extracting spatio-temporal patterns at multiple temporal resolutions. Our proposed multi-temporal convolution (MTConv) blocks utilize two branches that focus on brief and prolonged spatio-temporal patterns, respectively. The extracted time-varying features are aligned in a third branch, with respect to global motion patterns through recurrent cells. The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture. This introduces a substantial reduction in computational costs. Extensive experiments on the Kinetics, Moments in Time, and HACS action recognition benchmark datasets demonstrate competitive performance of MTConvs compared to the state-of-the-art with a significantly lower computational footprint.
Pages: 9
Related papers
  • [1] Dilated Multi-Temporal Modeling for Action Recognition
    Zhang, Tao
    Wu, Yifan
    Li, Xiaoqiang
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (12):
  • [2] MULTI-TEMPORAL FOREGROUND DETECTION IN VIDEOS
    Tepper, Mariano
    Newson, Alasdair
    Sprechmann, Pablo
    Sapiro, Guillermo
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 4599 - 4603
  • [3] Statistical HOG on Multi-temporal Depth Motion Maps Approach for Human Action Recognition
    Ali, Heba Hamdy
    Youssif, Aliaa A. A.
    Moftah, Hossam M.
    [J]. PROCEEDINGS OF THE XX INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION (INTERACCION'2019), 2019,
  • [4] A temporal belief filter improving human action recognition in videos
    Ramasso, Emmanuel
    Rombaut, Michele
    Pellerin, Denis
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1389 - 1392
  • [5] DenseGCN: A multi-level and multi-temporal graph convolutional network for action recognition
    Yu, Chengzhang
    Bao, Wenxia
    [J]. IET IMAGE PROCESSING, 2023, 17 (12) : 3401 - 3410
  • [6] Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
    Duta, Ionut C.
    Ionescu, Bogdan
    Aizawa, Kiyoharu
    Sebe, Nicu
    [J]. MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 365 - 378
  • [7] Long-Term Temporal Convolutions for Action Recognition
    Varol, Gul
    Laptev, Ivan
    Schmid, Cordelia
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (06) : 1510 - 1517
  • [8] 3D ACTION RECOGNITION USING MULTI-TEMPORAL SKELETON VISUALIZATION
    Liu, Mengyuan
    Chen, Chen
    Meng, Fanyang
    Liu, Hong
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2017,
  • [9] Action Recognition Using Multi-Temporal DMMs Based on Adaptive Vague Division
    Jiang, Min
    Jin, Ke
    Kong, Jun
    [J]. PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON IMAGE AND GRAPHICS PROCESSING (ICIGP 2018), 2018, : 8 - 13
  • [10] Multi-view region-adaptive multi-temporal DMM and RGB action recognition
    Al-Faris, Mahmoud
    Chiverton, John P.
    Yang, Yanyan
    Ndzi, David
    [J]. Pattern Analysis and Applications, 2020, 23 : 1587 - 1602