SSAN: Separable Self-Attention Network for Video Representation Learning

Cited by: 20
Authors
Guo, Xudong [1 ,3 ]
Guo, Xun [2 ]
Lu, Yan [2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MSRA, Beijing, Peoples R China
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
DOI
10.1109/CVPR46437.2021.01243
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-attention has been successfully applied to video representation learning due to its effectiveness in modeling long-range dependencies. Existing approaches build the dependencies merely by computing the pairwise correlations along spatial and temporal dimensions simultaneously. However, spatial correlations and temporal correlations represent different contextual information of scenes and temporal reasoning. Intuitively, learning spatial contextual information first will benefit temporal modeling. In this paper, we propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially so that spatial contexts can be efficiently used in temporal modeling. By adding the SSA module into a 2D CNN, we build an SSA network (SSAN) for video representation learning. On the task of video action recognition, our approach outperforms state-of-the-art methods on the Something-Something and Kinetics-400 datasets. Our models often outperform counterparts with a shallower network and fewer modalities. We further verify the semantic learning ability of our method on the visual-language task of video retrieval, which showcases the homogeneity of video representations and text embeddings. On the MSR-VTT and YouCook2 datasets, video representations learnt by SSA significantly improve the state-of-the-art performance.
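The sequential spatial-then-temporal factorization described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration only, not the paper's exact formulation: the tensor layout `(T, N, C)`, the omission of learned query/key/value projections, and the single-head attention are all simplifying assumptions made here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product self-attention over the second-to-last axis.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def separable_self_attention(x):
    """x: (T, N, C) -- T frames, N spatial positions, C channels.

    Spatial attention within each frame first, then temporal attention
    across frames at each spatial position (projections omitted for brevity).
    """
    # Spatial step: attend over the N positions of each frame independently.
    x = attention(x, x, x)            # still (T, N, C)
    # Temporal step: attend over the T frames at each position independently.
    xt = np.swapaxes(x, 0, 1)         # (N, T, C)
    xt = attention(xt, xt, xt)
    return np.swapaxes(xt, 0, 1)      # back to (T, N, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 8))   # 4 frames, 16 positions, 8 channels
y = separable_self_attention(x)
print(y.shape)                        # (4, 16, 8)
```

Because each attention step softmaxes over only one axis (space, then time), the spatial context computed in the first step is already mixed into the features that the temporal step attends over, which is the ordering the abstract argues for.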
Pages: 12613 - 12622
Page count: 10