Static and Dynamic Concepts for Self-supervised Video Representation Learning

被引:10
|
作者
Qian, Rui [1 ]
Ding, Shuangrui [2 ]
Liu, Xian [1 ]
Lin, Dahua [1 ,3 ]
机构
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
来源
关键词
Video representation; Visual concepts; Local contrast;
D O I
10.1007/978-3-031-19809-0_9
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we propose a novel learning scheme for self-supervised video representation learning. Motivated by how humans understand videos, we propose to first learn general visual concepts then attend to discriminative local areas for video understanding. Specifically, we utilize static frame and frame difference to help decouple static and dynamic concepts, and respectively align the concept distributions in latent space. We add diversity and fidelity regularizations to guarantee that we learn a compact set of meaningful concepts. Then we employ a cross-attention mechanism to aggregate detailed local features of different concepts, and filter out redundant concepts with low activations to perform local concept contrast. Extensive experiments demonstrate that our method distills meaningful static and dynamic concepts to guide video understanding, and obtains state-of-the-art results on UCF-101, HMDB-51, and Diving-48.
引用
收藏
页码:145 / 164
页数:20
相关论文
共 50 条
  • [41] Cut-in maneuver detection with self-supervised contrastive video representation learning
    Nalcakan, Yagiz
    Bastanlar, Yalin
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (06) : 2915 - 2923
  • [42] Cross-View Temporal Contrastive Learning for Self-Supervised Video Representation
    Wang, Lulu
    Xu, Zengmin
    Zhang, Xuelian
    Meng, Ruxing
    Lu, Tao
    [J]. Computer Engineering and Applications, 60 (18): : 158 - 166
  • [43] Self-Supervised Multi-Label Transformation Prediction for Video Representation Learning
    Assefa, Maregu
    Jiang, Wei
    Yilma, Getinet
    Kumeda, Bulbula
    Ayalew, Melese
    Seid, Mohammed
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (09)
  • [44] Attentive spatial-temporal contrastive learning for self-supervised video representation
    Yang, Xingming
    Xiong, Sixuan
    Wu, Kewei
    Shan, Dongfeng
    Xie, Zhao
    [J]. IMAGE AND VISION COMPUTING, 2023, 137
  • [45] GOCA: Guided Online Cluster Assignment for Self-supervised Video Representation Learning
    Coskun, Huseyin
    Zareian, Alireza
    Moore, Joshua L.
    Tombari, Federico
    Wang, Chen
    [J]. COMPUTER VISION, ECCV 2022, PT XXXI, 2022, 13691 : 1 - 22
  • [46] Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
    Toering, Martine
    Gatopoulos, Ioannis
    Stol, Maarten
    Hu, Vincent Tao
    [J]. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 846 - 856
  • [47] Cut-in maneuver detection with self-supervised contrastive video representation learning
    Yagiz Nalcakan
    Yalin Bastanlar
    [J]. Signal, Image and Video Processing, 2023, 17 : 2915 - 2923
  • [48] Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
    Zhang, Yujia
    Po, Lai-Man
    Xu, Xuyuan
    Liu, Mengyang
    Wang, Yexin
    Ou, Weifeng
    Zhao, Yuzhi
    Yu, Wing-Yin
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3380 - 3389
  • [49] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
    Wang, Rui
    Chen, Dongdong
    Wu, Zuxuan
    Chen, Yinpeng
    Dai, Xiyang
    Liu, Mengchen
    Yuan, Lu
    Jiang, Yu-Gang
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6312 - 6322
  • [50] Self-Distilled Self-supervised Representation Learning
    Jang, Jiho
    Kim, Seonhoon
    Yoo, Kiyoon
    Kong, Chaerin
    Kim, Jangho
    Kwak, Nojun
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2828 - 2838