Metric-Based Attention Feature Learning for Video Action Recognition

Cited by: 10
Authors
Kim, Dae Ha [1]
Anvarov, Fazliddin [1]
Lee, Jun Min [1]
Song, Byung Cheol [1]
Affiliations
[1] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Feature extraction; Measurement; Three-dimensional displays; Task analysis; Two dimensional displays; Licenses; Kernel; Body action recognition; 3D CNN; attention map learning; distance metric learning;
DOI
10.1109/ACCESS.2021.3064934
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Conventional approaches to video action recognition learn feature maps using 3D convolutional neural networks (CNNs). To improve recognition, they exploit the representation power of 3D CNNs by training on large-scale video datasets. However, action recognition remains a challenging task. Because previous methods rarely distinguish the human body from the environment, they often overfit to background scenes. Separating the human body from the background makes it possible to learn distinct representations of human action. This paper proposes a novel attention module that focuses only on the action part(s) while neglecting non-action part(s) such as the background. First, the attention module employs a triplet loss to differentiate active features from non-active or less active features. Second, two attention modules based on the spatial and channel domains are proposed to enhance the feature representation ability for action recognition. The spatial attention module learns the spatial correlation of features, and the channel attention module learns the channel correlation. Experimental results show that the proposed method achieves state-of-the-art performance of 41.41% and 55.21% on the Diving48 and Something-V1 datasets, respectively. In addition, the proposed method provides competitive performance on the UCF101 and HMDB-51 datasets, i.e., 95.83% on UCF101 and 74.33% on HMDB-51.
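The abstract names a triplet loss but does not give its exact form. As a rough illustration only, the sketch below uses the standard margin-based triplet loss, max(0, d(a, p) - d(a, n) + margin), with Euclidean distance. The function names, the margin value, and the toy feature vectors are all assumptions made for this example and are not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not the paper's).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it is positive.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D features (hypothetical values): an "active" feature close to the
# anchor and a "background" feature far from it.
anchor = [1.0, 0.0]
active = [1.0, 1.0]      # distance 1 from the anchor
background = [4.0, 4.0]  # distance 5 from the anchor

# Active feature as positive, background as negative: already well separated.
well_separated = triplet_loss(anchor, active, background)

# Roles swapped: the "positive" is far and the "negative" is near, so the loss is large.
violating = triplet_loss(anchor, background, active)

print(well_separated, violating)
```

In this toy setup the first triplet incurs no penalty because the background feature is already more than one margin farther from the anchor than the active feature, while the swapped triplet is penalized. How the paper actually selects anchors, positives, and negatives from attention features is not stated in the abstract.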
Pages: 39218-39228
Page count: 11