Self-supervised Video Representation Learning with Cascade Positive Retrieval

Cited by: 1
Authors
Wu, Cheng-En [1]
Lai, Farley [2]
Hu, Yu Hen [1]
Kadav, Asim [2]
Affiliations
[1] Univ Wisconsin Madison, Dept Elect & Comp Engn, Madison, WI 53706 USA
[2] NEC Labs Amer Inc, San Jose, CA USA
DOI
10.1109/CVPRW56347.2022.00452
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Self-supervised video representation learning has been shown to effectively improve downstream tasks such as video retrieval and action recognition. In this paper, we present the Cascade Positive Retrieval (CPR) that successively mines positive examples w.r.t. the query for contrastive learning in a cascade of stages. Specifically, CPR exploits multiple views of a query example in different modalities, where an alternative view may help find another positive example dissimilar in the query view. We explore the effects of possible CPR configurations in ablations including the number of mining stages, the top similar example selection ratio in each stage, and progressive training with an incremental number of the final Top-k selection. The overall mining quality is measured to reflect the recall across training set classes. CPR reaches a median class mining recall of 83.3%, outperforming previous work by 5.5%. Implementation-wise, CPR is complementary to pretext tasks and can be easily applied to previous work. In the evaluation of pretraining on UCF101, CPR consistently improves existing work and even achieves state-of-the-art R@1 of 56.7% and 24.4% in video retrieval as well as 83.8% and 54.8% in action recognition on UCF101 and HMDB51.
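The cascade described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of cosine similarity, the rounding of the per-stage selection ratio, and the toy two-view embeddings are all hypothetical. Each stage ranks the surviving candidates in one view of the query and keeps only the top fraction, so the final Top-k examples must be similar to the query in every view.

```python
import numpy as np


def cosine_sim(bank, query):
    """Cosine similarity between each row of `bank` and the vector `query`."""
    bank_n = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    return bank_n @ query_n


def cascade_positive_retrieval(query_views, bank_views, ratios, top_k):
    """Mine positives for one query through a cascade of stages.

    query_views: one 1-D embedding of the query per stage (view/modality).
    bank_views:  one (N, D) array of candidate embeddings per stage.
    ratios:      fraction of surviving candidates kept at each stage.
    top_k:       number of positives returned after the last stage.
    All of these names are assumptions made for this sketch.
    """
    candidates = np.arange(bank_views[0].shape[0])
    for query, bank, ratio in zip(query_views, bank_views, ratios):
        sims = cosine_sim(bank[candidates], query)
        # Keep the top `ratio` share, but never fewer than top_k candidates.
        keep = int(np.ceil(ratio * len(candidates)))
        keep = min(max(keep, top_k), len(candidates))
        order = np.argsort(-sims, kind="stable")[:keep]
        candidates = candidates[order]
    return candidates[:top_k]


# Toy memory bank of 4 examples seen in two hypothetical views.
view_a_bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]])
view_b_bank = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.9, 0.1]])
query_a = np.array([1.0, 0.0])
query_b = np.array([1.0, 0.0])

positives = cascade_positive_retrieval(
    [query_a, query_b], [view_a_bank, view_b_bank], ratios=[0.75, 0.5], top_k=2
)
print(positives.tolist())
```

In this toy setup, example 2 is the closest match in the second view but is discarded in the first stage, while example 0 survives the first stage and is dropped in the second; only examples that rank well in both views remain.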
Pages: 4079-4088
Number of pages: 10
Related papers
50 in total
  • [1] SELF-SUPERVISED REPRESENTATION LEARNING FOR ULTRASOUND VIDEO
    Jiao, Jianbo
    Droste, Richard
    Drukker, Lior
    Papageorghiou, Aris T.
    Noble, J. Alison
    [J]. 2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 1847 - 1850
  • [2] Self-Supervised Video Representation Learning by Video Incoherence Detection
    Cao, Haozhi
    Xu, Yuecong
    Mao, Kezhi
    Xie, Lihua
    Yin, Jianxiong
    See, Simon
    Xu, Qianwen
    Yang, Jianfei
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (06) : 3810 - 3822
  • [3] Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval
    Chen, Qingzhong
    Cai, Shilun
    Cai, Crystal
    Yu, Zefang
    Qian, Dahong
    Xiang, Suncheng
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1056 - 1061
  • [4] Video Face Clustering with Self-Supervised Representation Learning
    Sharma, Vivek
    Tapaswi, Makarand
    Saquib Sarfraz, M.
    Stiefelhagen, Rainer
    [J]. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2 (02) : 145 - 157
  • [5] Self-Supervised Representation Learning for Video Quality Assessment
    Jiang, Shaojie
    Sang, Qingbing
    Hu, Zongyao
    Liu, Lixiong
    [J]. IEEE TRANSACTIONS ON BROADCASTING, 2023, 69 (01) : 118 - 129
  • [6] Video Motion Perception for Self-supervised Representation Learning
    Li, Wei
    Luo, Dezhao
    Fang, Bo
    Li, Xiaoni
    Zhou, Yu
    Wang, Weiping
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 508 - 520
  • [7] Static and Dynamic Concepts for Self-supervised Video Representation Learning
    Qian, Rui
    Ding, Shuangrui
    Liu, Xian
    Lin, Dahua
    [J]. COMPUTER VISION, ECCV 2022, PT XXVI, 2022, 13686 : 145 - 164
  • [8] Motion Sensitive Contrastive Learning for Self-supervised Video Representation
    Ni, Jingcheng
    Zhou, Nan
    Qin, Jie
    Wu, Qian
    Liu, Junqi
    Li, Boxun
    Huang, Di
    [J]. COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 457 - 474
  • [9] Self-Supervised Spatiotemporal Representation Learning by Exploiting Video Continuity
    Liang, Hanwen
    Quader, Niamul
    Chi, Zhixiang
    Chen, Lizhe
    Dai, Peng
    Lu, Juwei
    Wang, Yang
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1564 - 1573
  • [10] Self-supervised video representation learning by maximizing mutual information
    Xue, Fei
    Ji, Hongbing
    Zhang, Wenbo
    Cao, Yi
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 88