Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

Cited by: 8
Authors
Yuan, Liangzhe [1]
Qian, Rui [1, 2, 3]
Cui, Yin [1]
Gong, Boqing [1]
Schroff, Florian [1]
Yang, Ming-Hsuan [1]
Adam, Hartwig [1]
Liu, Ting [1]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Cornell Univ, Ithaca, NY 14853 USA
[3] Google, Mountain View, CA 94043 USA
DOI
10.1109/CVPR52688.2022.01359
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While being very effective on learning holistic image and video representations, such an objective becomes suboptimal for learning spatio-temporally fine-grained features in videos, where scenes and instances evolve through space and time. In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. We first design a region-based pretext task which requires the model to transform instance representations from one view to another, guided by context features. Further, we introduce a simple network design that successfully reconciles the simultaneous learning process of both holistic and local representations. We evaluate our learned representations on a variety of downstream tasks and show that ConST-CL achieves competitive results on 6 datasets, including Kinetics, UCF, HMDB, AVA-Kinetics, AVA and OTB. Our code and models will be available at https://github.com/tensorflow/models/tree/master/official/projects/const_cl.
Pages: 13957-13966
Page count: 10
Related papers
50 in total
  • [1] Forecasting Fine-Grained Urban Flows Via Spatio-Temporal Contrastive Self-Supervision
    Qu, Hao
    Gong, Yongshun
    Chen, Meng
    Zhang, Junbo
    Zheng, Yu
    Yin, Yilong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (08) : 8008 - 8023
  • [2] Spatio-Temporal Self-supervision for Few-Shot Action Recognition
    Yu, Wanchuan
    Guo, Hanyu
    Yan, Yan
    Li, Jie
    Wang, Hanzi
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 84 - 96
  • [3] Link Prediction with Contextualized Self-Supervision
    Zhang, Daokun
    Yin, Jie
    Yu, Philip S. S.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (07) : 7138 - 7151
  • [4] CoLES: Contrastive Learning for Event Sequences with Self-Supervision
    Babaev, Dmitrii
    Ovsov, Nikita
    Kireev, Ivan
    Ivanova, Maria
    Gusev, Gleb
    Nazarov, Ivan
    Tuzhilin, Alexander
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1190 - 1199
  • [5] Spatio-Temporal Meta Contrastive Learning
    Tang, Jiabin
    Xia, Lianghao
    Hu, Jie
    Huang, Chao
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 2412 - 2421
  • [6] Video-based spatio-temporal scene graph generation with efficient self-supervision tasks
    Lianggangxu Chen
    Yiqing Cai
    Changhong Lu
    Changbo Wang
    Gaoqi He
    [J]. Multimedia Tools and Applications, 2023, 82 : 38947 - 38966
  • [7] Video-based spatio-temporal scene graph generation with efficient self-supervision tasks
    Chen, Lianggangxu
    Cai, Yiqing
    Lu, Changhong
    Wang, Changbo
    He, Gaoqi
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (25) : 38947 - 38966
  • [8] Dual Contrastive Learning for Spatio-temporal Representation
    Ding, Shuangrui
    Qian, Rui
    Xiong, Hongkai
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5649 - 5658
  • [9] Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
    Zhang, Yujia
    Po, Lai-Man
    Xu, Xuyuan
    Liu, Mengyang
    Wang, Yexin
    Ou, Weifeng
    Zhao, Yuzhi
    Yu, Wing-Yin
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3380 - 3389
  • [10] CONTRASTIVE SELF-SUPERVISED LEARNING FOR SPATIO-TEMPORAL ANALYSIS OF LUNG ULTRASOUND VIDEOS
    Chen, Li
    Rubin, Jonathan
    Ouyang, Jiahong
    Balaraju, Naveen
    Patil, Shubham
    Mehanian, Courosh
    Kulhare, Sourabh
    Millin, Rachel
    Gregory, Kenton W.
    Gregory, Cynthia R.
    Zhu, Meihua
    Kessler, David O.
    Malia, Laurie
    Dessie, Almaz
    Rabiner, Joni
    Coneybeare, Di
    Shopsin, Bo
    Hersh, Andrew
    Madar, Cristian
    Shupp, Jeffrey
    Johnson, Laura S.
    Avila, Jacob
    Dwyer, Kristin
    Weimersheimer, Peter
    Raju, Balasundar
    Kruecker, Jochen
    Chen, Alvin
    [J]. 2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,