Spatial–temporal injection network: exploiting auxiliary losses for action recognition with apparent difference and self-attention

Cited: 0
Authors
Haiwen Cao
Chunlei Wu
Jing Lu
Jie Wu
Leiquan Wang
Institutions
[1] China University of Petroleum, College of Computer Science and Technology
Source
Signal, Image and Video Processing
Keywords
Action recognition; Apparent difference module; Self-attention mechanism; Spatiotemporal features
DOI
Not available
Abstract
Two-stream convolutional networks have shown strong performance in action recognition. However, the spatial and temporal features of the two streams are learned separately, and the distinct characteristics of each stream are largely ignored: both are subjected to the same operations. In this paper, we build upon two-stream convolutional networks and propose a novel spatial–temporal injection network (STIN) with two different auxiliary losses. To build spatial–temporal features as the video representation, an apparent difference module is designed to impose auxiliary temporal constraints on the spatial features in the spatial injection network. A self-attention mechanism is used to attend to regions of interest in the temporal injection stream, reducing the influence of optical-flow noise from irrelevant regions. These auxiliary losses enable efficient training of two complementary streams that capture interactions between spatial and temporal information from different perspectives. Experiments conducted on two well-known datasets, UCF101 and HMDB51, demonstrate the effectiveness of the proposed STIN.
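The abstract describes two key components: an apparent difference module (temporal constraints derived from frame-to-frame changes) and a self-attention mechanism over the temporal stream. The following is a minimal NumPy sketch of both ideas in their generic form; the function names, feature shapes, and identity projections are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def apparent_difference(frames):
    """Sketch of an apparent-difference signal: frame-to-frame feature
    differences, usable as an auxiliary temporal constraint on the
    spatial stream (shapes are assumptions, not the paper's code)."""
    # frames: (T, D) -- one D-dim spatial feature vector per frame
    return frames[1:] - frames[:-1]               # (T-1, D) differences

def self_attention(x):
    """Minimal single-head self-attention over temporal features,
    illustrating how attention can down-weight noisy regions."""
    # x: (T, D); identity Q/K/V projections keep the sketch dependency-free
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # (T, T) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                            # (T, D) attended features

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))              # 8 frames, 16-dim features
diff = apparent_difference(feats)
attended = self_attention(feats)
print(diff.shape, attended.shape)                 # (7, 16) (8, 16)
```

In the paper, signals like these feed auxiliary losses so that each stream is trained with operations suited to its own characteristics, rather than sharing one objective.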
Pages: 1173–1180
Page count: 7
Related papers (50 records)
  • [1] Spatial-temporal injection network: exploiting auxiliary losses for action recognition with apparent difference and self-attention
    Cao, Haiwen
    Wu, Chunlei
    Lu, Jing
    Wu, Jie
    Wang, Leiquan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1173 - 1180
  • [2] SPATIO-TEMPORAL SLOWFAST SELF-ATTENTION NETWORK FOR ACTION RECOGNITION
    Kim, Myeongjun
    Kim, Taehun
    Kim, Daijin
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2206 - 2210
  • [3] Multimodal cooperative self-attention network for action recognition
    Zhong, Zhuokun
    Hou, Zhenjie
    Liang, Jiuzhen
    Lin, En
    Shi, Haiyong
    IET IMAGE PROCESSING, 2023, 17 (06) : 1775 - 1783
  • [4] Spatio-Temporal Self-Attention Weighted VLAD Neural Network for Action Recognition
    Cheng, Shilei
    Xie, Mei
    Ma, Zheng
    Li, Siqi
    Gu, Song
    Yang, Feng
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (01) : 220 - 224
  • [5] Spatial-Temporal Action Localization With Hierarchical Self-Attention
    Pramono, Rizard Renanda Adhi
    Chen, Yie-Tarng
    Fang, Wen-Hsien
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 625 - 639
  • [6] Self-Attention Pooling-Based Long-Term Temporal Network for Action Recognition
    Li, Huifang
    Huang, Jingwei
    Zhou, Mengchu
    Shi, Qisong
    Fei, Qing
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (01) : 65 - 77
  • [7] Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
    Xiang, Wangmeng
    Li, Chao
    Wang, Biao
    Wei, Xihan
    Hua, Xian-Sheng
    Zhang, Lei
    COMPUTER VISION - ECCV 2022, PT III, 2022, 13663 : 627 - 644
  • [8] Spatial-Temporal Self-Attention Enhanced Graph Convolutional Networks for Fitness Yoga Action Recognition
    Wei, Guixiang
    Zhou, Huijian
    Zhang, Liping
    Wang, Jianji
    SENSORS, 2023, 23 (10)
  • [9] A Spatial-Temporal Self-Attention Network (STSAN) for Location Prediction
    Wang, Shuang
    Li, AnLiang
    Xie, Shuai
    Li, WenZhu
    Wang, BoWei
    Yao, Shuai
    Asif, Muhammad
    COMPLEXITY, 2021, 2021
  • [10] An efficient self-attention network for skeleton-based action recognition
    Xiaofei Qin
    Rui Cai
    Jiabin Yu
    Changxiang He
    Xuedian Zhang
    SCIENTIFIC REPORTS, 12 (1)