Contrastive Masked Autoencoders for Self-Supervised Video Hashing

Authors
Wang, Yuting [1 ,3 ]
Wang, Jinpeng [1 ,3 ]
Chen, Bin [2 ]
Zeng, Ziyun [1 ,3 ]
Xia, Shu-Tao [1 ,3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Peoples R China
[3] Peng Cheng Lab, Res Ctr Artificial Intelligence, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-Supervised Video Hashing (SSVH) models learn to generate short binary representations for videos without ground-truth supervision, improving the efficiency of large-scale video retrieval and attracting increasing research attention. The success of SSVH depends on understanding video content and on capturing the semantic relations among unlabeled videos. State-of-the-art SSVH methods typically address these two points in a two-stage training pipeline: they first train an auxiliary network with instance-wise mask-and-predict tasks, and then train a hashing model to preserve the pseudo-neighborhood structure transferred from the auxiliary network. This sequential training strategy is inflexible and also unnecessary. In this paper, we propose a simple yet effective one-stage SSVH method called ConMH, which learns both video semantics and inter-video similarity relationships in a single stage. To capture video semantic information, we adopt an encoder-decoder structure that reconstructs the video from its temporally masked frames. In particular, we find that a higher masking ratio helps video understanding. In addition, we fully exploit the similarity relationships between videos by maximizing the agreement between two augmented views of each video, which contributes to more discriminative and robust hash codes. Extensive experiments on three large-scale video datasets (FCVID, ActivityNet, and YFCC) show that ConMH achieves state-of-the-art results. Code is available at https://github.com/huangmozhi9527/ConMH.
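The abstract names three ingredients: reconstructing a video from temporally masked frames, maximizing agreement between two augmented views of the same video, and turning the learned representation into short binary codes for retrieval. The following is a minimal stdlib-Python sketch of those ingredients under stated assumptions; it is not ConMH's implementation (the linked repository is the authoritative source). Assumptions: a video is a list of per-frame feature vectors, the encoder and decoder networks are omitted, and all function names, the NT-Xent form of the contrastive loss, the temperature of 0.5, the 0.8 masking ratio, and sign-based binarization are illustrative choices rather than details taken from the paper.

```python
import math
import random

# Hypothetical helpers illustrating the ideas in the abstract; NOT ConMH's code.
# A "video" is assumed to be a list of per-frame feature vectors (lists of floats).


def temporal_mask(num_frames, mask_ratio, rng):
    """Randomly hide a fraction of frame indices; returns (visible, masked)."""
    num_masked = int(round(num_frames * mask_ratio))
    masked = sorted(rng.sample(range(num_frames), num_masked))
    hidden = set(masked)
    visible = [i for i in range(num_frames) if i not in hidden]
    return visible, masked


def masked_reconstruction_loss(pred, target, masked):
    """MSE between decoder output and original frames, on masked frames only."""
    if not masked:
        return 0.0
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i in masked
    ]
    return sum(errors) / len(errors)


def cosine(a, b):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, 1e-12)


def nt_xent(z1, z2, temperature=0.5):
    """Assumed NT-Xent contrastive loss: z1[i] and z2[i] are two views of video i.

    Each view's positive is the other view of the same video; every other
    embedding among the 2n views in the batch acts as a negative.
    """
    n = len(z1)
    views = z1 + z2
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        scores = {
            j: math.exp(cosine(views[i], views[j]) / temperature)
            for j in range(2 * n)
            if j != i
        }
        total += -math.log(scores[positive] / sum(scores.values()))
    return total / (2 * n)


def to_hash(embedding):
    """Binarize a real-valued embedding by sign (non-negative -> 1, else 0)."""
    return [1 if x >= 0 else 0 for x in embedding]


def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code, database_codes, k):
    """Indices of the k database codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    visible, masked = temporal_mask(num_frames=25, mask_ratio=0.8, rng=rng)
    print("visible frames:", visible)
    print("masked frames:", masked)
```

In a one-stage setup of the kind the abstract describes, the reconstruction loss and the contrastive loss would be optimized together rather than in separate training phases (how they are weighted is not stated here), and only the compact binary codes are needed at retrieval time, where Hamming distance can be computed cheaply.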
Pages: 2733-2741
Page count: 9