Static and Dynamic Concepts for Self-supervised Video Representation Learning

Times cited: 10
Authors
Qian, Rui [1]
Ding, Shuangrui [2]
Liu, Xian [1]
Lin, Dahua [1,3]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Source
Keywords
Video representation; Visual concepts; Local contrast
DOI
10.1007/978-3-031-19809-0_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel learning scheme for self-supervised video representation learning. Motivated by how humans understand videos, we propose to first learn general visual concepts and then attend to discriminative local areas for video understanding. Specifically, we use static frames and frame differences to help decouple static and dynamic concepts, and align the static and dynamic concept distributions in latent space, respectively. We add diversity and fidelity regularizations to ensure that we learn a compact set of meaningful concepts. We then employ a cross-attention mechanism to aggregate detailed local features of different concepts, and filter out redundant concepts with low activations before performing local concept contrast. Extensive experiments demonstrate that our method distills meaningful static and dynamic concepts to guide video understanding, and obtains state-of-the-art results on UCF-101, HMDB-51, and Diving-48.
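The abstract describes the pipeline only at a high level. What follows is a minimal, hypothetical Python sketch of three of its ingredients: building a static view (one frame) and a dynamic view (frame differences) from a clip; cross-attention in which concept vectors act as queries over local features, with low-activation concepts filtered out; and an InfoNCE-style contrast between the same concept in two views. All function names, the choice of the first frame as the static view, the use of the peak attention logit as a concept's "activation", the threshold value, and the InfoNCE form with temperature `tau` are illustrative assumptions, not the authors' implementation. The diversity and fidelity regularizers are omitted because the abstract does not specify them, and encoders are replaced by hand-written toy feature vectors.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Frame = List[List[float]]  # toy grayscale frame: H rows of W pixel values


def static_and_dynamic_inputs(frames: List[Frame]) -> Tuple[Frame, List[Frame]]:
    """Static view = one frame (the first, by assumption);
    dynamic view = differences between consecutive frames."""
    static = frames[0]
    dynamic = [
        [[b - a for a, b in zip(row_t, row_next)] for row_t, row_next in zip(f_t, f_next)]
        for f_t, f_next in zip(frames, frames[1:])
    ]
    return static, dynamic


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def concept_cross_attention(
    concepts: List[Vector], local_feats: List[Vector], threshold: float
) -> Dict[int, Vector]:
    """Each concept vector is a query over the local features (used as both keys
    and values). Concepts whose peak attention logit is below `threshold` are
    dropped as low-activation (this activation score is an assumption)."""
    d = len(local_feats[0])
    kept: Dict[int, Vector] = {}
    for idx, query in enumerate(concepts):
        logits = [dot(query, key) / math.sqrt(d) for key in local_feats]
        weights = softmax(logits)
        # Attention-weighted sum of local features for this concept.
        agg = [sum(w * feat[j] for w, feat in zip(weights, local_feats)) for j in range(d)]
        if max(logits) >= threshold:
            kept[idx] = agg
    return kept


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + eps)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.1) -> float:
    """Standard InfoNCE loss: small when the positive outscores every negative."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    return -math.log(softmax(logits)[0])


def local_concept_contrast(view_a: Dict[int, Vector], view_b: Dict[int, Vector]) -> float:
    """Positive pair: the same surviving concept in two views of one clip.
    Negatives: the other surviving concepts in the second view."""
    shared = [c for c in view_a if c in view_b]
    if not shared:
        return 0.0
    losses = [
        info_nce(view_a[c], view_b[c], [view_b[o] for o in view_b if o != c])
        for c in shared
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 3-frame clip of 2x2 grayscale frames.
    clip = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [0.0, 0.0]]]
    static_view, dynamic_view = static_and_dynamic_inputs(clip)
    print("frame differences:", dynamic_view)
    # Toy concept vectors and local features standing in for encoder outputs.
    concepts = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    view_a = concept_cross_attention(concepts, [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]], threshold=0.5)
    view_b = concept_cross_attention(concepts, [[1.8, 0.1], [0.1, 1.9], [1.0, 0.9]], threshold=0.5)
    print("kept concepts:", sorted(view_a), sorted(view_b))
    print("local concept contrast loss:", local_concept_contrast(view_a, view_b))
```

In this toy setup the third concept's peak logit stays below the threshold, so it is filtered out of both views. Because the two views yield similar features for each surviving concept, pairing the same concept across views gives a lower loss than deliberately swapping the concepts.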
Pages: 145-164
Number of pages: 20
Related papers (50 in total)
  • [1] Dynamic-boosting attention for self-supervised video representation learning
    Wang, Zhipeng
    Hou, Chunping
    Yue, Guanghui
    Yang, Qingyuan
    [J]. APPLIED INTELLIGENCE, 2022, 52 : 3143 - 3155
  • [2] SELF-SUPERVISED REPRESENTATION LEARNING FOR ULTRASOUND VIDEO
    Jiao, Jianbo
    Droste, Richard
    Drukker, Lior
    Papageorghiou, Aris T.
    Noble, J. Alison
    [J]. 2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 1847 - 1850
  • [3] Dynamic-boosting attention for self-supervised video representation learning
    Wang, Zhipeng
    Hou, Chunping
    Yue, Guanghui
    Yang, Qingyuan
    [J]. APPLIED INTELLIGENCE, 2022, 52 (03) : 3143 - 3155
  • [4] Self-supervised dynamic and static feature representation learning method for flotation monitoring
    Ai, Mingxi
    Xie, Yongfang
    Tang, Zhaohui
    Wu, Jiande
    Li, Peng
    Zhang, Jin
    [J]. POWDER TECHNOLOGY, 2024, 442
  • [5] Self-Supervised Video Representation Learning by Video Incoherence Detection
    Cao, Haozhi
    Xu, Yuecong
    Mao, Kezhi
    Xie, Lihua
    Yin, Jianxiong
    See, Simon
    Xu, Qianwen
    Yang, Jianfei
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (06) : 3810 - 3822
  • [6] Self-supervised Representation Learning on Dynamic Graphs
    Tian, Sheng
    Wu, Ruofan
    Shi, Leilei
    Zhu, Liang
    Xiong, Tao
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 1814 - 1823
  • [7] Video Face Clustering with Self-Supervised Representation Learning
    Sharma, Vivek
    Tapaswi, Makarand
    Saquib Sarfraz, M.
    Stiefelhagen, Rainer
    [J]. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2 (02) : 145 - 157
  • [8] Self-Supervised Representation Learning for Video Quality Assessment
    Jiang, Shaojie
    Sang, Qingbing
    Hu, Zongyao
    Liu, Lixiong
    [J]. IEEE TRANSACTIONS ON BROADCASTING, 2023, 69 (01) : 118 - 129
  • [9] Video Motion Perception for Self-supervised Representation Learning
    Li, Wei
    Luo, Dezhao
    Fang, Bo
    Li, Xiaoni
    Zhou, Yu
    Wang, Weiping
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 508 - 520
  • [10] Self-supervised learning of Dynamic Representations for Static Images
    Song, Siyang
    Sanchez, Enrique
    Shen, Linlin
    Valstar, Michel
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 1619 - 1626