Exploiting Attention-Consistency Loss For Spatial-Temporal Stream Action Recognition

Cited by: 12
Authors
Xu, Haotian [1 ]
Jin, Xiaobo [1 ]
Wang, Qiufeng [1 ]
Hussain, Amir [2 ]
Huang, Kaizhu [3 ]
Affiliations
[1] Xian Jiaotong Liverpool Univ, 111 Ren'ai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Edinburgh Napier Univ, Edinburgh EH11 4BN, Midlothian, Scotland
[3] Duke Kunshan Univ, 8 Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
Keywords
Action recognition; attention consistency; multi-level attention; two-stream structure
DOI
10.1145/3538749
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Most current action recognition methods consider only the information from the spatial stream. Inspired by the human visual system, we propose a new perspective that combines the spatial and temporal streams and measures their attention consistency. Specifically, we develop a branch-independent convolutional neural network (CNN) based algorithm with a novel attention-consistency loss, which encourages the temporal stream to concentrate on the same discriminative regions as the spatial stream over the same period. The consistency loss is further combined with the cross-entropy loss to enhance visual attention consistency. We evaluate the proposed method for action recognition on two benchmark datasets: Kinetics400 and UCF101. Despite its apparent simplicity, our framework with attention consistency achieves better performance than most two-stream networks, i.e., 75.7% top-1 accuracy on Kinetics400 and 95.7% on UCF101, while reducing computational cost by 7.1% compared with our baseline. In particular, our method attains remarkable improvements on complex action classes, showing that the proposed network can serve as a potential benchmark for handling complicated scenarios in Industry 4.0 applications.
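The abstract does not give the exact formulation of the attention-consistency term, so the following is a minimal sketch of the general idea only, assuming a PyTorch setting in which each stream exposes a spatial attention map, the consistency term penalizes the gap between the normalized maps, and the result is added to the cross-entropy loss with a weight lam. The function names, tensor shapes, and weight value below are illustrative assumptions, not the authors' released code.

import torch
import torch.nn.functional as F

def attention_consistency_loss(spatial_attn, temporal_attn):
    # spatial_attn, temporal_attn: (B, H, W) attention maps from the two streams.
    # Normalize each map into a spatial distribution, then penalize their gap.
    s = F.softmax(spatial_attn.flatten(1), dim=1)
    t = F.softmax(temporal_attn.flatten(1), dim=1)
    return F.mse_loss(t, s)

def total_loss(logits, labels, spatial_attn, temporal_attn, lam=0.5):
    # Cross-entropy on the fused class scores plus the weighted consistency term.
    ce = F.cross_entropy(logits, labels)
    ac = attention_consistency_loss(spatial_attn, temporal_attn)
    return ce + lam * ac

# Toy usage with random tensors standing in for network outputs.
logits = torch.randn(4, 400)              # e.g. Kinetics400 class scores
labels = torch.randint(0, 400, (4,))
spatial_attn = torch.randn(4, 7, 7)
temporal_attn = torch.randn(4, 7, 7)
print(total_loss(logits, labels, spatial_attn, temporal_attn))

An MSE penalty between softmax-normalized maps is only one plausible choice for the consistency measure; the loss actually used in the paper may differ.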
Pages: 15
Related Papers
50 items
  • [1] Spatial-Temporal Attention for Action Recognition
    Sun, Dengdi
    Wu, Hanqing
    Ding, Zhuanlian
    Luo, Bin
    Tang, Jin
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 854 - 864
  • [2] Joint spatial-temporal attention for action recognition
    Yu, Tingzhao
    Guo, Chaoxu
    Wang, Lingfeng
    Gu, Huxiang
    Xiang, Shiming
    Pan, Chunhong
    PATTERN RECOGNITION LETTERS, 2018, 112 : 226 - 233
  • [3] Spatial-Temporal Convolutional Attention Network for Action Recognition
    Luo, Huilan
    Chen, Han
    COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59 (09) : 150 - 158
  • [4] Spatial-Temporal Separable Attention for Video Action Recognition
    Guo, Xi
    Hu, Yikun
    Chen, Fang
    Jin, Yuhui
    Qiao, Jian
    Huang, Jian
    Yang, Qin
    2022 INTERNATIONAL CONFERENCE ON FRONTIERS OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, FAIML, 2022, : 224 - 228
  • [5] Select and Focus: Action Recognition with Spatial-Temporal Attention
    Chan, Wensong
    Tian, Zhiqiang
    Liu, Shuai
    Ren, Jing
    Lan, Xuguang
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PT III, 2019, 11742 : 461 - 471
  • [6] Weakly supervised spatial-temporal attention network driven by tracking and consistency loss for action detection
    Zhu, Jinlei
    Chen, Houjin
    Pan, Pan
    Sun, Jia
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2022, 2022 (01)
  • [7] Spatial-temporal saliency action mask attention network for action recognition
    Jiang, Min
    Pan, Na
    Kong, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 71
  • [8] Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos
    Du, Wenbin
    Wang, Yali
    Qiao, Yu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (03) : 1347 - 1360
  • [9] Spatial-temporal injection network: exploiting auxiliary losses for action recognition with apparent difference and self-attention
    Cao, Haiwen
    Wu, Chunlei
    Lu, Jing
    Wu, Jie
    Wang, Leiquan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1173 - 1180
  • [10] Spatial-temporal channel-wise attention network for action recognition
    Chen, Lin
    Liu, Yungang
    Man, Yongchao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (14) : 21789 - 21808