Action Recognition Using Deep 3D CNNs with Sequential Feature Aggregation and Attention

Cited by: 8

Authors:

Anvarov, Fazliddin [1]

Kim, Dae Ha [1]

Song, Byung Cheol [1]

Affiliations:

[1] Inha Univ, Dept Elect Engn, Incheon 22212, South Korea

Source:

ELECTRONICS | 2020 / Volume 9 / Issue 01

Keywords:

action recognition; 3D CNN; deep feature attention;

DOI:

10.3390/electronics9010147

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812

Abstract:

Action recognition is an active research field that aims to recognize human actions and intentions from a series of observations of human behavior and the environment. Unlike image-based action recognition, which mainly uses a two-dimensional (2D) convolutional neural network (CNN), video-based action recognition must characterize both short-term small movements and long-term temporal appearance information. Previous methods analyze video action behavior using only a basic 3D CNN framework. However, these approaches are limited in analyzing fast action movements or abruptly appearing objects because of the limited coverage of the convolutional filter. In this paper, we propose the aggregation of squeeze-and-excitation (SE) and self-attention (SA) modules with a 3D CNN to analyze both short-term and long-term temporal action behavior efficiently. We successfully implemented SE and SA modules to present a novel approach to video action recognition that builds upon the current state-of-the-art methods and demonstrates better performance on the UCF-101 and HMDB51 datasets. For example, we get accuracies of 92.5% (16f-clip) and 95.6% (64f-clip) on the UCF-101 dataset, and 68.1% (16f-clip) and 74.1% (64f-clip) on HMDB51 for the ResNext-101 architecture in a 3D CNN.
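For readers unfamiliar with the squeeze-and-excitation idea named in the abstract, the following is a minimal sketch of a generic SE block applied to a 3D (video) feature map. It is not the authors' implementation: the function name `se_block_3d`, the tensor sizes, the reduction ratio of 2, the random weights, and the omission of bias terms are all assumptions made for illustration, and the abstract does not say where in the network such a block is placed.

```python
import math
import random


def se_block_3d(x, w1, w2):
    """Generic squeeze-and-excitation over a feature map shaped [C][T][H][W].

    w1 has shape [C // r][C] and w2 has shape [C][C // r], where r is a
    hypothetical reduction ratio. Biases are left out to keep the sketch short.
    """
    # Squeeze: average each channel over time, height and width.
    squeezed = []
    for channel in x:
        values = [v for frame in channel for row in frame for v in row]
        squeezed.append(sum(values) / len(values))

    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid,
    # giving one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[[v * g for v in row] for row in frame] for frame in channel]
        for channel, g in zip(x, gates)
    ]
    return scaled, gates


# Illustrative sizes only: 4 channels, 2 frames, 3x3 spatial grid, r = 2.
rng = random.Random(0)
C, T, H, W, R = 4, 2, 3, 3, 2
x = [[[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
      for _ in range(T)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]

scaled, gates = se_block_3d(x, w1, w2)
print("channel gates:", [round(g, 3) for g in gates])
```

The gates depend on statistics pooled over the whole clip, so each channel is reweighted using context that a single local convolutional filter does not see.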

Pages: 15
