Action Recognition Using Deep 3D CNNs with Sequential Feature Aggregation and Attention

Cited by: 8
Authors
Anvarov, Fazliddin [1]
Kim, Dae Ha [1]
Song, Byung Cheol [1]
Affiliations
[1] Inha Univ, Dept Elect Engn, Incheon 22212, South Korea
Keywords
action recognition; 3D CNN; deep feature attention
DOI
10.3390/electronics9010147
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Action recognition is an active research field that aims to recognize human actions and intentions from a series of observations of human behavior and the environment. Unlike image-based action recognition, which mainly uses a two-dimensional (2D) convolutional neural network (CNN), video-based action recognition must characterize both short-term small movements and long-term temporal appearance information. Previous methods analyze video action behavior using only a basic 3D CNN framework. However, these approaches are limited in analyzing fast action movements or abruptly appearing objects because of the limited coverage of the convolutional filter. In this paper, we propose aggregating squeeze-and-excitation (SE) and self-attention (SA) modules with a 3D CNN to analyze both short-term and long-term temporal action behavior efficiently. We implemented the SE and SA modules to present a novel approach to video action recognition that builds upon current state-of-the-art methods and demonstrates better performance on the UCF-101 and HMDB51 datasets. For example, with the ResNeXt-101 architecture in a 3D CNN, we obtain accuracies of 92.5% (16-frame clips) and 95.6% (64-frame clips) on UCF-101, and 68.1% (16-frame clips) and 74.1% (64-frame clips) on HMDB51.
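The squeeze-and-excitation idea named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes each channel of a 3D CNN feature map has been flattened from its (T, H, W) volume into a plain list, and all weights, sizes, and function names below are hypothetical values chosen for illustration.

```python
import math

def squeeze(features):
    """Global average pool: one scalar per channel over all (T, H, W) positions."""
    return [sum(ch) / len(ch) for ch in features]

def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def excite(z, w1, b1, w2, b2):
    """Bottleneck MLP (ReLU, then sigmoid) giving a gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]

def se_block(features, w1, b1, w2, b2):
    """Rescale every channel by its learned gate; the shape is unchanged."""
    gates = excite(squeeze(features), w1, b1, w2, b2)
    return [[g * x for x in ch] for g, ch in zip(gates, features)]

# Toy input (assumption): 4 channels, each flattened from a 2x2x2 (T, H, W) volume.
features = [[float(c + 1)] * 8 for c in range(4)]

# Hypothetical weights with reduction ratio 2 (4 -> 2 -> 4 channels).
w1 = [[0.1, 0.1, 0.1, 0.1], [0.2, -0.2, 0.2, -0.2]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
b2 = [0.0, 0.0, 0.0, 0.0]

out = se_block(features, w1, b1, w2, b2)
```

In a trained network the two dense layers would be learned jointly with the convolutional weights, so the gates emphasize informative channels and suppress others. The self-attention module mentioned in the abstract is a separate component and is not covered by this sketch.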
Pages: 15