Exploiting Attention-Consistency Loss For Spatial-Temporal Stream Action Recognition

Cited by: 12
Authors
Xu, Haotian [1 ]
Jin, Xiaobo [1 ]
Wang, Qiufeng [1 ]
Hussain, Amir [2 ]
Huang, Kaizhu [3 ]
Affiliations
[1] Xian Jiaotong Liverpool Univ, 111 Ren'ai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Edinburgh Napier Univ, Edinburgh EH11 4BN, Midlothian, Scotland
[3] Duke Kunshan Univ, 8 Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
Keywords
Action recognition; attention consistency; multi-level attention; two-stream structure
DOI
10.1145/3538749
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Many current action recognition methods rely mainly on information from the spatial stream. Inspired by the human visual system, we propose a new perspective that combines the spatial and temporal streams and measures their attention consistency. Specifically, we develop a branch-independent convolutional neural network (CNN) based algorithm with a novel attention-consistency loss, which encourages the temporal stream to concentrate on the same discriminative regions as the spatial stream over the same period. The consistency loss is further combined with the cross-entropy loss to enhance visual attention consistency. We evaluate the proposed method for action recognition on two benchmark datasets, Kinetics400 and UCF101. Despite its apparent simplicity, our framework with attention consistency outperforms most two-stream networks, achieving 75.7% top-1 accuracy on Kinetics400 and 95.7% on UCF101, while reducing computational cost by 7.1% relative to our baseline. In particular, our method attains remarkable improvements on complex action classes, suggesting that the proposed network can serve as a potential benchmark for handling complicated scenarios in Industry 4.0 applications.
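To make the abstract's objective concrete, the following PyTorch sketch shows one plausible way to combine per-stream cross-entropy with an attention-consistency term across the two streams. It is a rough illustration only: the attention_map definition (channel-wise squared aggregation of feature activations, a common choice in attention-transfer work), the squared-distance form of consistency_loss, and the weighting hyperparameter lam are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a two-stream objective with an attention-consistency term.
# Assumes both streams produce feature maps of matching spatial size.
import torch
import torch.nn.functional as F

def attention_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse a feature map (N, C, H, W) into a unit-norm spatial
    attention map (N, H*W) via channel-wise squared aggregation."""
    attn = features.pow(2).mean(dim=1)    # (N, H, W)
    attn = attn.flatten(1)                # (N, H*W)
    return F.normalize(attn, p=2, dim=1)  # unit L2 norm per sample

def consistency_loss(spatial_feat: torch.Tensor,
                     temporal_feat: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between the two streams' attention maps."""
    a_s = attention_map(spatial_feat)
    a_t = attention_map(temporal_feat)
    return (a_s - a_t).pow(2).sum(dim=1).mean()

def total_loss(logits_s, logits_t, feat_s, feat_t, labels, lam=0.5):
    """Cross-entropy on both streams plus the attention-consistency term.
    `lam` is a hypothetical trade-off weight, not a value from the paper."""
    ce = F.cross_entropy(logits_s, labels) + F.cross_entropy(logits_t, labels)
    return ce + lam * consistency_loss(feat_s, feat_t)
```

Under these assumptions, driving the consistency term to zero forces the temporal (optical-flow) branch to distribute its activation energy over the same spatial regions as the RGB branch, which matches the behavior the abstract describes.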
Pages: 15
Related Papers (50 in total)
  • [31] Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    Saba, Tanzila
    Rehman, Amjad
    Bahaj, Saeed Ali
    IEEE ACCESS, 2023, 11 : 21546 - 21553
  • [32] Streamer action recognition in live video with spatial-temporal attention and deep dictionary learning
    Li, Chenhao
    Zhang, Jing
    Yao, Jiacheng
    NEUROCOMPUTING, 2021, 453 : 383 - 392
  • [33] Activity Recognition Based on Spatial-Temporal Attention LSTM
    Xie, Zhao
    Zhou, Yi
    Wu, Ke-Wei
    Zhang, Shun-Ran
JISUANJI XUEBAO/CHINESE JOURNAL OF COMPUTERS, 2021, 44 (02): 261 - 274
  • [34] Two-stream spatial-temporal neural networks for pose-based action recognition
    Wang, Zixuan
    Zhu, Aichun
    Hu, Fangqiang
    Wu, Qianyu
    Li, Yifeng
    JOURNAL OF ELECTRONIC IMAGING, 2020, 29 (04)
  • [35] Action Recognition by Joint Spatial-Temporal Motion Feature
    Zhang, Weihua
    Zhang, Yi
    Gao, Chaobang
    Zhou, Jiliu
JOURNAL OF APPLIED MATHEMATICS, 2013
  • [36] Spatial-Temporal Pyramid Graph Reasoning for Action Recognition
    Geng, Tiantian
    Zheng, Feng
    Hou, Xiaorong
    Lu, Ke
    Qi, Guo-Jun
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5484 - 5497
  • [37] Action recognition with spatial-temporal discriminative filter banks
    Martinez, Brais
    Modolo, Davide
    Xiong, Yuanjun
    Tighe, Joseph
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5481 - 5490
  • [38] Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
    Luo, Chenxu
    Yuille, Alan
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5511 - 5520
  • [39] Spatial-Temporal Interleaved Network for Efficient Action Recognition
    Jiang, Shengqin
    Zhang, Haokui
    Qi, Yuankai
    Liu, Qingshan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2025, 21 (01) : 178 - 187
  • [40] Weakly supervised spatial-temporal attention network driven by tracking and consistency loss for action detection
    Zhu, Jinlei
    Chen, Houjin
    Pan, Pan
    Sun, Jia
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2022