Metric-Based Attention Feature Learning for Video Action Recognition

Cited by: 9
Authors
Kim, Dae Ha [1 ]
Anvarov, Fazliddin [1 ]
Lee, Jun Min [1 ]
Song, Byung Cheol [1 ]
Affiliations
[1] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
Keywords
Feature extraction; Measurement; Three-dimensional displays; Task analysis; Two dimensional displays; Licenses; Kernel; Body action recognition; 3D CNN; attention map learning; distance metric learning;
DOI
10.1109/ACCESS.2021.3064934
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Conventional approaches to video action recognition learn feature maps using 3D convolutional neural networks (CNNs). For better action recognition, they train on large-scale video datasets to exploit the representational power of 3D CNNs. However, action recognition remains a challenging task: since previous methods rarely distinguish the human body from the environment, they often overfit to background scenes. Note that separating the human body from the background makes it possible to learn distinct representations of human actions. This paper proposes a novel attention module that focuses only on action part(s) while neglecting non-action part(s) such as the background. First, the attention module employs a triplet loss to differentiate active features from non-active or less active features. Second, two attention modules, based on the spatial and channel domains respectively, are proposed to enhance the feature representation ability for action recognition: the spatial attention module learns the spatial correlation of features, and the channel attention module learns their channel correlation. Experimental results show that the proposed method achieves state-of-the-art performance of 41.41% and 55.21% on the Diving48 and Something-V1 datasets, respectively. In addition, the proposed method provides competitive performance even on the UCF-101 and HMDB-51 datasets, i.e., 95.83% on UCF-101 and 74.33% on HMDB-51.
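The two mechanisms named in the abstract, a triplet loss separating active from non-active features and a channel attention module that reweights feature channels, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the feature shapes, the reduction ratio, and the squeeze-and-excitation-style gating below are illustrative assumptions, and the spatial attention module (analogous, but gating over spatial positions instead of channels) is omitted for brevity.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: pull the anchor toward the positive (active)
    feature and push it away from the negative (background) feature.
    Loss = max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def channel_attention(feat, w1, w2):
    """Channel attention over a (C, H, W) feature map.

    w1 of shape (C//r, C) and w2 of shape (C, C//r) stand in for
    hypothetical learned bottleneck weights (reduction ratio r)."""
    c = feat.shape[0]
    squeezed = feat.reshape(c, -1).mean(axis=1)      # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid gate in (0, 1) -> (C,)
    return feat * gate[:, None, None]                # rescale each channel
```

When the positive and negative features are equally distant from the anchor, the loss reduces to the margin itself, so minimizing it forces the negative to be pushed at least `margin` farther away. The channel gate lies strictly in (0, 1), so attention only attenuates channels here; a learned module would instead emphasize action-relevant channels relative to background ones.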
Pages: 39218-39228 (11 pages)
Related Papers (50 in total)
  • [31] Human Action Recognition Based on Motion Feature and Manifold Learning
    Wang, Jun
    Xia, Limin
    Ma, Wentao
    [J]. IEEE ACCESS, 2021, 9 : 89287 - 89299
  • [32] A deep learning method for video-based action recognition
    Zhang, Guanwen
    Rao, Yukun
    Wang, Changhao
    Zhou, Wei
    Ji, Xiangyang
    [J]. IET IMAGE PROCESSING, 2021, 15 (14) : 3498 - 3511
  • [33] Metric-based Topology Investigation
    Bohdanowicz, F.
    Dickel, H.
    Steigner, Ch.
    [J]. 2009 EIGHTH INTERNATIONAL CONFERENCE ON NETWORKS, 2009, : 176 - 184
  • [34] LPI Radar Signal Recognition Based on Feature Enhancement with Deep Metric Learning
    Ren, Feitao
    Quan, Daying
    Shen, Lai
    Wang, Xiaofeng
    Zhang, Dongping
    Liu, Hengliang
    [J]. ELECTRONICS, 2023, 12 (24)
  • [35] Two-Level Attention Model Based Video Action Recognition Network
    Sang, Haifeng
    Zhao, Ziyu
    He, Dakuo
    [J]. IEEE ACCESS, 2019, 7 : 118388 - 118401
  • [36] Video Action Recognition Based on Spatio-temporal Feature Pyramid Module
    Gong, Suming
    Chen, Ying
    [J]. 2020 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2020), 2020, : 338 - 341
  • [37] Embedding metric learning into set-based face recognition for video surveillance
    Wang, Guijin
    Zheng, Fei
    Shi, Chenbo
    Xue, Jing-Hao
    Liu, Chunxiao
    He, Li
    [J]. NEUROCOMPUTING, 2015, 151 : 1500 - 1506
  • [38] Projection Metric Learning on Grassmann Manifold with Application to Video based Face Recognition
    Huang, Zhiwu
    Wang, Ruiping
    Shan, Shiguang
    Chen, Xilin
    [J]. 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 140 - 149
  • [39] Residual attention fusion network for video action recognition
    Li, Ao
    Yi, Yang
    Liang, Daan
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [40] Shrinking Temporal Attention in Transformers for Video Action Recognition
    Li, Bonan
    Xiong, Pengfei
    Han, Congying
    Guo, Tiande
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1263 - 1271