SSAN: Separable Self-Attention Network for Video Representation Learning

Cited by: 20
Authors
Guo, Xudong [1 ,3 ]
Guo, Xun [2 ]
Lu, Yan [2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MSRA, Beijing, Peoples R China
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
DOI
10.1109/CVPR46437.2021.01243
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-attention has been successfully applied to video representation learning due to its effectiveness in modeling long-range dependencies. Existing approaches build the dependencies merely by computing the pairwise correlations along spatial and temporal dimensions simultaneously. However, spatial correlations and temporal correlations represent different contextual information of scenes and temporal reasoning. Intuitively, learning spatial contextual information first will benefit temporal modeling. In this paper, we propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially so that spatial contexts can be efficiently used in temporal modeling. By adding the SSA module into a 2D CNN, we build an SSA network (SSAN) for video representation learning. On the task of video action recognition, our approach outperforms state-of-the-art methods on the Something-Something and Kinetics-400 datasets. Our models often outperform counterparts with a shallower network and fewer modalities. We further verify the semantic learning ability of our method on the visual-language task of video retrieval, which showcases the homogeneity of video representations and text embeddings. On the MSR-VTT and YouCook2 datasets, video representations learnt by SSA significantly improve the state-of-the-art performance.
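The sequential spatial-then-temporal factorization described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration only, not the paper's exact formulation: the tensor layout `(T, N, C)`, the omission of learned query/key/value projections, and the single-head attention are all simplifying assumptions made here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product self-attention over the second-to-last axis.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def separable_self_attention(x):
    """x: (T, N, C) -- T frames, N spatial positions, C channels.

    Spatial attention within each frame first, then temporal attention
    across frames at each spatial position (projections omitted for brevity).
    """
    # Spatial step: attend over the N positions of each frame independently.
    x = attention(x, x, x)            # still (T, N, C)
    # Temporal step: attend over the T frames at each position independently.
    xt = np.swapaxes(x, 0, 1)         # (N, T, C)
    xt = attention(xt, xt, xt)
    return np.swapaxes(xt, 0, 1)      # back to (T, N, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 8))   # 4 frames, 16 positions, 8 channels
y = separable_self_attention(x)
print(y.shape)                        # (4, 16, 8)
```

Because each attention step softmaxes over only one axis (space, then time), the spatial context computed in the first step is already mixed into the features that the temporal step attends over, which is the ordering the abstract argues for.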
Pages: 12613 - 12622
Page count: 10