Exploiting Attention-Consistency Loss For Spatial-Temporal Stream Action Recognition

Cited: 12
Authors
Xu, Haotian [1]
Jin, Xiaobo [1]
Wang, Qiufeng [1]
Hussain, Amir [2]
Huang, Kaizhu [3]
Affiliations
[1] Xian Jiaotong Liverpool Univ, 111 Ren Ai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Edinburgh Napier Univ, Edinburgh EH11 4BN, Midlothian, Scotland
[3] Duke Kunshan Univ, 8 Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
Keywords
Action recognition; attention consistency; multi-level attention; two-stream structure; FORM;
DOI
10.1145/3538749
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Many current action recognition methods consider mainly the information from spatial streams. We propose a new perspective, inspired by the human visual system, that combines the spatial and temporal streams by measuring their attention consistency. Specifically, a branch-independent convolutional neural network (CNN)-based algorithm is developed with a novel attention-consistency loss, enabling the temporal stream to concentrate on discriminative regions consistent with those of the spatial stream in the same period. The consistency loss is further combined with the cross-entropy loss to enhance visual attention consistency. We evaluate the proposed method for action recognition on two benchmark datasets: Kinetics400 and UCF101. Despite its apparent simplicity, our framework with attention consistency achieves better performance than most two-stream networks, i.e., 75.7% top-1 accuracy on Kinetics400 and 95.7% on UCF101, while reducing computational cost by 7.1% compared with our baseline. In particular, our method attains remarkable improvements on complex action classes, showing that the proposed network can serve as a potential benchmark for handling complicated scenarios in Industry 4.0 applications.
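To make the idea in the abstract concrete, below is a minimal, hypothetical PyTorch-style sketch of how an attention-consistency term could be combined with cross-entropy for a two-stream network. The attention-map shapes, the L2 discrepancy, the stop-gradient on the spatial map, and the weighting factor lambda_ac are illustrative assumptions, not the paper's exact (multi-level) formulation.

# Hypothetical sketch: cross-entropy + attention-consistency objective
# for a two-stream (spatial/temporal) action recognition network.
# The L2 consistency term and the weighting factor are assumptions,
# not the formulation from the paper.
import torch
import torch.nn.functional as F


def attention_consistency_loss(spatial_attn: torch.Tensor,
                               temporal_attn: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between spatial and temporal attention maps.

    Both tensors are assumed to have shape (B, 1, H, W) for the same clip period.
    """
    # Normalize each map into a spatial probability distribution so the
    # comparison is invariant to overall activation magnitude.
    b = spatial_attn.size(0)
    s = F.softmax(spatial_attn.view(b, -1), dim=1)
    t = F.softmax(temporal_attn.view(b, -1), dim=1)
    # Assumption: the spatial stream guides the temporal one, hence detach().
    return F.mse_loss(t, s.detach())


def total_loss(spatial_logits, temporal_logits,
               spatial_attn, temporal_attn, labels, lambda_ac=0.5):
    """Cross-entropy on both streams plus the attention-consistency term."""
    ce = F.cross_entropy(spatial_logits, labels) + F.cross_entropy(temporal_logits, labels)
    ac = attention_consistency_loss(spatial_attn, temporal_attn)
    return ce + lambda_ac * ac

In this sketch, lambda_ac trades off classification accuracy against how strongly the temporal stream is pulled toward the spatial stream's attended regions; its value here is arbitrary.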
Pages: 15
Related Papers
(50 records in total)
  • [21] STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video
    Yang, Guoan
    Yang, Yong
    Lu, Zhengzhi
    Yang, Junjie
    Liu, Deyang
    Zhou, Chuanbo
    Fan, Zien
    PLOS ONE, 2022, 17 (03):
  • [22] Spatial-temporal interaction learning based two-stream network for action recognition
    Liu, Tianyu
    Ma, Yujun
    Yang, Wenhan
    Ji, Wanting
    Wang, Ruili
    Jiang, Ping
    INFORMATION SCIENCES, 2022, 606 : 864 - 876
  • [23] Spatial-Temporal Neural Networks for Action Recognition
    Jing, Chao
    Wei, Ping
    Sun, Hongbin
    Zheng, Nanning
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2018, 2018, 519 : 619 - 627
  • [24] Spatial-temporal pooling for action recognition in videos
    Wang, Jiaming
    Shao, Zhenfeng
    Huang, Xiao
    Lu, Tao
    Zhang, Ruiqian
    Lv, Xianwei
    NEUROCOMPUTING, 2021, 451 : 265 - 278
  • [25] Spatial-temporal interaction module for action recognition
    Luo, Hui-Lan
    Chen, Han
    Cheung, Yiu-Ming
    Yu, Yawei
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)
  • [26] Spatial-Temporal gated graph attention network for skeleton-based action recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 929 - 939
  • [27] Attention-based spatial-temporal hierarchical ConvLSTM network for action recognition in videos
    Xue, Fei
    Ji, Hongbing
    Zhang, Wenbo
    Cao, Yi
    IET COMPUTER VISION, 2019, 13 (08) : 708 - 718
  • [28] Extreme Low-Resolution Action Recognition with Confident Spatial-Temporal Attention Transfer
Bai, Yucai
Zou, Qin
Chen, Xieyuanli
Li, Lingxi
Ding, Zhengming
Chen, Long
    International Journal of Computer Vision, 2023, 131 : 1550 - 1565
  • [29] An Attention Enhanced Spatial-Temporal Graph Convolutional LSTM Network for Action Recognition in Karate
    Guo, Jianping
    Liu, Hong
    Li, Xi
    Xu, Dahong
    Zhang, Yihan
    APPLIED SCIENCES-BASEL, 2021, 11 (18):
  • [30] Extreme Low-Resolution Action Recognition with Confident Spatial-Temporal Attention Transfer
    Bai, Yucai
    Zou, Qin
    Chen, Xieyuanli
    Li, Lingxi
    Ding, Zhengming
    Chen, Long
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (06) : 1550 - 1565