Exploiting Attention-Consistency Loss For Spatial-Temporal Stream Action Recognition

Cited by: 12
Authors
Xu, Haotian [1 ]
Jin, Xiaobo [1 ]
Wang, Qiufeng [1 ]
Hussain, Amir [2 ]
Huang, Kaizhu [3 ]
Affiliations
[1] Xian Jiaotong Liverpool Univ, 111 Ren'ai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Edinburgh Napier Univ, Edinburgh EH11 4BN, Midlothian, Scotland
[3] Duke Kunshan Univ, 8 Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
Keywords
Action recognition; attention consistency; multi-level attention; two-stream structure
DOI
10.1145/3538749
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Many current action recognition methods rely mainly on information from the spatial stream. Inspired by the human visual system, we propose a new perspective that combines the spatial and temporal streams and measures their attention consistency. Specifically, we develop a branch-independent convolutional neural network (CNN) based algorithm with a novel attention-consistency loss, which encourages the temporal stream to concentrate on the same discriminative regions as the spatial stream over the same period. The consistency loss is further combined with the cross-entropy loss to enhance visual attention consistency. We evaluate the proposed method for action recognition on two benchmark datasets, Kinetics400 and UCF101. Despite its apparent simplicity, our framework with attention consistency outperforms most two-stream networks, achieving 75.7% top-1 accuracy on Kinetics400 and 95.7% on UCF101, while reducing computational cost by 7.1% relative to our baseline. In particular, our method attains remarkable improvements on complex action classes, suggesting that the proposed network can serve as a potential benchmark for handling complicated scenarios in Industry 4.0 applications.
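To make the abstract's objective concrete, the following PyTorch sketch shows one plausible way to combine per-stream cross-entropy with an attention-consistency term across the two streams. It is a rough illustration only: the attention_map definition (channel-wise squared aggregation of feature activations, a common choice in attention-transfer work), the squared-distance form of consistency_loss, and the weighting hyperparameter lam are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a two-stream objective with an attention-consistency term.
# Assumes both streams produce feature maps of matching spatial size.
import torch
import torch.nn.functional as F

def attention_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse a feature map (N, C, H, W) into a unit-norm spatial
    attention map (N, H*W) via channel-wise squared aggregation."""
    attn = features.pow(2).mean(dim=1)    # (N, H, W)
    attn = attn.flatten(1)                # (N, H*W)
    return F.normalize(attn, p=2, dim=1)  # unit L2 norm per sample

def consistency_loss(spatial_feat: torch.Tensor,
                     temporal_feat: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between the two streams' attention maps."""
    a_s = attention_map(spatial_feat)
    a_t = attention_map(temporal_feat)
    return (a_s - a_t).pow(2).sum(dim=1).mean()

def total_loss(logits_s, logits_t, feat_s, feat_t, labels, lam=0.5):
    """Cross-entropy on both streams plus the attention-consistency term.
    `lam` is a hypothetical trade-off weight, not a value from the paper."""
    ce = F.cross_entropy(logits_s, labels) + F.cross_entropy(logits_t, labels)
    return ce + lam * consistency_loss(feat_s, feat_t)
```

Under these assumptions, driving the consistency term to zero forces the temporal (optical-flow) branch to distribute its activation energy over the same spatial regions as the RGB branch, which matches the behavior the abstract describes.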
Pages: 15
Related Papers (50 in total)
  • [31] Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    Saba, Tanzila
    Rehman, Amjad
    Bahaj, Saeed Ali
    IEEE ACCESS, 2023, 11 : 21546 - 21553
  • [32] Streamer action recognition in live video with spatial-temporal attention and deep dictionary learning
    Li, Chenhao
    Zhang, Jing
    Yao, Jiacheng
    NEUROCOMPUTING, 2021, 453 : 383 - 392
  • [33] Activity Recognition Based on Spatial-Temporal Attention LSTM
    Xie, Zhao
    Zhou, Yi
    Wu, Ke-Wei
    Zhang, Shun-Ran
JISUANJI XUEBAO/CHINESE JOURNAL OF COMPUTERS, 2021, 44 (02): 261 - 274
  • [34] Two-stream spatial-temporal neural networks for pose-based action recognition
    Wang, Zixuan
    Zhu, Aichun
    Hu, Fangqiang
    Wu, Qianyu
    Li, Yifeng
    JOURNAL OF ELECTRONIC IMAGING, 2020, 29 (04)
  • [35] Action Recognition by Joint Spatial-Temporal Motion Feature
    Zhang, Weihua
    Zhang, Yi
    Gao, Chaobang
    Zhou, Jiliu
JOURNAL OF APPLIED MATHEMATICS, 2013
  • [36] Spatial-Temporal Pyramid Graph Reasoning for Action Recognition
    Geng, Tiantian
    Zheng, Feng
    Hou, Xiaorong
    Lu, Ke
    Qi, Guo-Jun
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5484 - 5497
  • [37] Action recognition with spatial-temporal discriminative filter banks
    Martinez, Brais
    Modolo, Davide
    Xiong, Yuanjun
    Tighe, Joseph
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5481 - 5490
  • [38] Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
    Luo, Chenxu
    Yuille, Alan
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5511 - 5520
  • [39] Spatial-Temporal Interleaved Network for Efficient Action Recognition
    Jiang, Shengqin
    Zhang, Haokui
    Qi, Yuankai
    Liu, Qingshan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2025, 21 (01) : 178 - 187
  • [40] Weakly supervised spatial-temporal attention network driven by tracking and consistency loss for action detection
    Zhu, Jinlei
    Chen, Houjin
    Pan, Pan
    Sun, Jia
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2022