Spatial–temporal injection network: exploiting auxiliary losses for action recognition with apparent difference and self-attention

Cited by: 0
Authors
Haiwen Cao
Chunlei Wu
Jing Lu
Jie Wu
Leiquan Wang
Institutions
[1] China University of Petroleum,College of Computer Science and Technology
Keywords
Action recognition; Apparent difference module; Self-attention mechanism; Spatiotemporal features
DOI: not available
Abstract
Two-stream convolutional networks have shown strong performance in action recognition. However, the spatial and temporal features in the two-stream framework are learned separately, and little consideration has been given to the different characteristics of the two streams, which are subjected to identical operations. In this paper, we build upon two-stream convolutional networks and propose a novel spatial–temporal injection network (STIN) with two different auxiliary losses. To build spatial–temporal features as the video representation, an apparent difference module is designed to impose auxiliary temporal constraints on the spatial features in the spatial injection network. In the temporal injection stream, a self-attention mechanism is used to attend to regions of interest, which reduces the influence of optical-flow noise from irrelevant regions. These auxiliary losses enable efficient training of two complementary streams that capture interactions between spatial and temporal information from different perspectives. Experiments on two well-known datasets, UCF101 and HMDB51, demonstrate the effectiveness of the proposed STIN.
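The two components named in the abstract can be sketched in minimal form. This is a numpy-only illustration under stated assumptions: the "apparent difference" is reduced to frame-to-frame differencing of spatial feature maps, and the self-attention is plain scaled dot-product attention with identity query/key/value projections; the paper's learned modules and auxiliary losses are not reproduced here.

```python
import numpy as np

def apparent_difference(feats):
    # feats: (T, C, H, W) spatial feature maps across T frames.
    # Sketch of the differencing idea behind an apparent difference
    # module: consecutive-frame differences act as a temporal
    # constraint on spatial features (assumption: the paper's module
    # is learned; this shows only the underlying operation).
    return feats[1:] - feats[:-1]          # (T-1, C, H, W)

def self_attention(x):
    # x: (N, D) -- N spatial locations, D channels.
    # Scaled dot-product self-attention with identity projections
    # (assumption, for brevity). Locations with high attention weight
    # correspond to regions of interest, down-weighting contributions
    # from noisy, irrelevant regions.
    scores = x @ x.T / np.sqrt(x.shape[1])        # (N, N) similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ x                            # attended features (N, D)
```

With identical rows the softmax weights are uniform, so the attended output equals the input; with distinct rows, each output is a similarity-weighted mixture over all locations.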
Pages: 1173–1180 (7 pages)
Related Papers (50 total)
  • [31] A visual self-attention network for facial expression recognition
    Yu, Naigong
    Bai, Deguo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [32] Spatio-Temporal 3D Action Recognition with Hierarchical Self-Attention Mechanism
    Araei, Soheil
    Nadian-Ghomsheh, Ali
    2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC), 2021,
  • [33] Hierarchical Self-Attention Network for Action Localization in Videos
    Pramono, Rizard Renanda Adhi
    Chen, Yie-Tarng
    Fang, Wen-Hsien
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 61 - 70
  • [34] Spatial-temporal channel-wise attention network for action recognition
    Chen, Lin
    Liu, Yungang
    Man, Yongchao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (14) : 21789 - 21808
  • [35] Recurrent attention network using spatial-temporal relations for action recognition
    Zhang, Mingxing
    Yang, Yang
    Ji, Yanli
    Xie, Ning
    Shen, Fumin
    SIGNAL PROCESSING, 2018, 145 : 137 - 145
  • [36] A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention
    Yang, Qi
    Lu, Tongwei
    Zhou, Huabing
    ENTROPY, 2022, 24 (03)
  • [37] Spatial-temporal channel-wise attention network for action recognition
    Chen, Lin
    Liu, Yungang
    Man, Yongchao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 : 21789 - 21808
  • [38] Transforming spatio-temporal self-attention using action embedding for skeleton-based action recognition
    Ahmad, Tasweer
    Rizvi, Syed Tahir Hussain
    Kanwal, Neel
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
  • [39] SELF-ATTENTION NETWORKS FOR CONNECTIONIST TEMPORAL CLASSIFICATION IN SPEECH RECOGNITION
    Salazar, Julian
    Kirchhoff, Katrin
    Huang, Zhiheng
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7115 - 7119
  • [40] Global Temporal Difference Network for Action Recognition
    Xie, Zhao
    Chen, Jiansong
    Wu, Kewei
    Guo, Dan
    Hong, Richang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7594 - 7606