Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning

Cited by: 1
Authors
Das, Srijan [1]
Ryoo, Michael [2]
Affiliations
[1] UNC Charlotte, Charlotte, NC 28223 USA
[2] SUNY Stony Brook, Stony Brook, NY USA
DOI
10.23919/MVA57639.2023.10216260
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
In this paper, we address the challenge of obtaining large-scale unlabelled video datasets for contrastive representation learning in real-world applications. We present a novel video augmentation technique for self-supervised learning, called Cross-Modal Manifold Cutmix (CMMC), which generates augmented samples by combining different modalities in videos. By embedding a video tesseract into another across two modalities in the feature space, our method enhances the quality of learned video representations. We perform extensive experiments on two small-scale video datasets, UCF101 and HMDB51, for action recognition and video retrieval tasks. Our approach is also shown to be effective on the NTU dataset with limited domain knowledge. Our CMMC achieves comparable performance to other self-supervised methods while using less training data for both downstream tasks.
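The abstract describes embedding a video tesseract from one modality into another in feature space. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that both modalities (here called RGB and optical flow for illustration) yield feature maps of identical shape (C, T, H, W), that the tesseract is an axis-aligned 4D box over those four axes, and that the mixing weight follows the usual CutMix convention of one minus the pasted volume fraction. The function names, the choice of axes, and the use of a fixed `lam` are all hypothetical.

```python
import numpy as np

def sample_tesseract(shape, lam, rng):
    """Sample a 4D box over (C, T, H, W) covering roughly (1 - lam) of the volume."""
    # Assumption: each side is scaled by the 4th root so the box volume is about (1 - lam).
    ratio = (1.0 - lam) ** 0.25
    bounds = []
    for dim in shape:
        side = max(1, int(round(dim * ratio)))
        start = int(rng.integers(0, dim - side + 1))  # upper bound is exclusive
        bounds.append((start, start + side))
    return bounds

def cross_modal_manifold_cutmix(feat_a, feat_b, lam, rng):
    """Paste a tesseract of feat_b (modality B) into a copy of feat_a (modality A)."""
    assert feat_a.shape == feat_b.shape, "sketch assumes aligned feature shapes"
    (c0, c1), (t0, t1), (h0, h1), (w0, w1) = sample_tesseract(feat_a.shape, lam, rng)
    mixed = feat_a.copy()
    mixed[c0:c1, t0:t1, h0:h1, w0:w1] = feat_b[c0:c1, t0:t1, h0:h1, w0:w1]
    # Recompute the mixing weight from the actual box volume after rounding.
    box_volume = (c1 - c0) * (t1 - t0) * (h1 - h0) * (w1 - w0)
    lam_actual = 1.0 - box_volume / feat_a.size
    return mixed, lam_actual

# Toy features: zeros for modality A, ones for modality B, so the pasted
# region is easy to identify in the mixed tensor.
rng = np.random.default_rng(0)
rgb_feat = np.zeros((8, 4, 6, 6))
flow_feat = np.ones((8, 4, 6, 6))
mixed, lam_actual = cross_modal_manifold_cutmix(rgb_feat, flow_feat, lam=0.5, rng=rng)
```

In a contrastive setup, `lam_actual` would typically weight the loss terms for the two source samples; how the paper does this is not stated in the abstract.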
Pages: 6
Related papers
50 in total
  • [41] Static and Dynamic Concepts for Self-supervised Video Representation Learning
    Qian, Rui
    Ding, Shuangrui
    Liu, Xian
    Lin, Dahua
    [J]. COMPUTER VISION, ECCV 2022, PT XXVI, 2022, 13686 : 145 - 164
  • [42] Self-supervised video representation learning by maximizing mutual information
    Xue, Fei
    Ji, Hongbing
    Zhang, Wenbo
    Cao, Yi
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 88
  • [43] Self-supervised Video Representation Learning with Cascade Positive Retrieval
    Wu, Cheng-En
    Lai, Farley
    Hu, Yu Hen
    Kadav, Asim
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 4079 - 4088
  • [44] Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw
    Huo, Yuqi
    Ding, Mingyu
    Lu, Haoyu
    Huang, Ziyuan
    Tang, Mingqian
    Lu, Zhiwu
    Xiang, Tao
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 751 - 757
  • [45] Temporally Coherent Embeddings for Self-Supervised Video Representation Learning
    Knights, Joshua
    Harwood, Ben
    Ward, Daniel
    Vanderkop, Anthony
    Mackenzie-Ross, Olivia
    Moghadam, Peyman
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8914 - 8921
  • [46] Learning disentangled representation for self-supervised video object segmentation
    Hou, Wenjie
    Qin, Zheyun
    Xi, Xiaoming
    Lu, Xiankai
    Yin, Yilong
    [J]. Neurocomputing, 2022, 481 : 270 - 280
  • [48] Self-supervised Video Representation Learning by Context and Motion Decoupling
    Huang, Lianghua
    Liu, Yu
    Wang, Bin
    Pan, Pan
    Xu, Yinghui
    Jin, Rong
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13881 - 13890
  • [49] Self-supervised Co-training for Video Representation Learning
    Han, Tengda
    Xie, Weidi
    Zisserman, Andrew
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [50] Masked Motion Encoding for Self-Supervised Video Representation Learning
    Sun, Xinyu
    Chen, Peihao
    Chen, Liangwei
    Li, Changhao
    Li, Thomas H.
    Tan, Mingkui
    Gan, Chuang
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2235 - 2245