Metric-Based Attention Feature Learning for Video Action Recognition

Cited by: 10
Authors
Kim, Dae Ha [1]
Anvarov, Fazliddin [1]
Lee, Jun Min [1]
Song, Byung Cheol [1]
Affiliations
[1] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Feature extraction; Measurement; Three-dimensional displays; Task analysis; Two dimensional displays; Licenses; Kernel; Body action recognition; 3D CNN; attention map learning; distance metric learning;
DOI
10.1109/ACCESS.2021.3064934
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Conventional approaches to video action recognition learn feature maps using 3D convolutional neural networks (CNNs). To improve recognition, they exploit the representation power of 3D CNNs by training on large-scale video datasets. However, action recognition remains a challenging task. Because previous methods rarely distinguish the human body from the environment, they often overfit to background scenes. Separating the human body from the background allows a model to learn distinct representations of human action. This paper proposes a novel attention module that focuses only on the action part(s) while neglecting non-action part(s) such as the background. First, the attention module employs a triplet loss to differentiate active features from non-active or less active features. Second, two attention modules based on the spatial and channel domains are proposed to enhance the feature representation ability for action recognition. The spatial attention module learns the spatial correlation of features, and the channel attention module learns the channel correlation. Experimental results show that the proposed method achieves state-of-the-art performance of 41.41% and 55.21% on the Diving48 and Something-V1 datasets, respectively. In addition, the proposed method provides competitive performance on the UCF-101 and HMDB-51 datasets: 95.83% on UCF-101 and 74.33% on HMDB-51.
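The abstract mentions a triplet loss that separates active features from non-active ones but does not give its formulation. As a rough illustration only, the sketch below implements the standard margin-based triplet loss with Euclidean distance. The function names, the margin value, and the use of plain feature vectors are assumptions made for this example and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (hypothetical helper, not the paper's code).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows linearly with the violation.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Illustrative vectors: an "active" anchor, a similar active feature,
# and a dissimilar non-active (e.g. background) feature.
anchor = [0.0, 0.0]
active = [0.0, 0.0]
background = [3.0, 4.0]

well_separated = triplet_loss(anchor, active, background)  # negative is far away
violated = triplet_loss(anchor, background, active)        # roles swapped
```

In this toy setup the first call incurs no penalty because the background feature already lies far from the anchor, while the swapped call is penalized. How the paper selects anchors, positives, and negatives from its attention maps is not stated in the abstract.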
Pages: 39218-39228
Page count: 11
Related papers (50 in total)
  • [21] Slow feature subspace: A video representation based on slow feature analysis for action recognition
    Beleza, Suzana Rita Alves
    Shimomoto, Erica K.
    Souza, Lincon S.
    Fukui, Kazuhiro
    MACHINE LEARNING WITH APPLICATIONS, 2023, 14
  • [22] Classification of endoscopic image and video frames using distance metric-based learning with interpolated latent features
    Fatemeh Sedighipour Chafjiri
    Mohammad Reza Mohebbian
    Khan A. Wahid
    Paul Babyn
    Multimedia Tools and Applications, 2023, 82 : 36577 - 36598
  • [23] Spatio-Temporal Feature Extraction and Distance Metric Learning for Unconstrained Action Recognition
    Yoon, Yongsang
    Yu, Jongmin
    Jeon, Moongu
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019
  • [24] An attention mechanism based convolutional LSTM network for video action recognition
    Ge, Hongwei
    Yan, Zehang
    Yu, Wenhao
    Sun, Liang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (14) : 20533 - 20556
  • [25] Video action recognition method based on attention residual network and LSTM
    Zhang, Yu
    Dong, Pengyue
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 3611 - 3616
  • [27] CANet: Comprehensive Attention Network for video-based action recognition
    Gao, Xiong
    Chang, Zhaobin
    Ran, Xingcheng
    Lu, Yonggang
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [28] Deep Local Video Feature for Action Recognition
    Lan, Zhenzhong
    Zhu, Yi
    Hauptmann, Alexander G.
    Newsam, Shawn
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 1219 - 1225
  • [29] Metric-based inductive learning using semantic height functions
    Markov, Z
    Marinchev, I
    MACHINE LEARNING: ECML 2000, 2000, 1810 : 254 - 262
  • [30] Navigating the face recognition: unleashing the power of few-shot learning through metric-based insights
    Jain, Sushant
    Pundir, Amit
    Singh, Sanjeev
    Saxena, Geetika Jain
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (33) : 79939 - 79961