CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing

Cited by: 1
Authors
Wei, Rukai [1]
Liu, Yu [1]
Song, Jingkuan [2]
Cui, Heng [1]
Xie, Yanzhao [1]
Zhou, Ke [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-supervised video hashing; Spatio-temporal contrastive learning; Frame order verification; Scene change regularization;
DOI
10.1145/3581783.3613440
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Compressing videos into binary codes can improve retrieval speed and reduce storage overhead. However, learning accurate hash codes for video retrieval can be challenging due to high local redundancy and complex global dependencies between video frames, especially in the absence of labels. Existing self-supervised video hashing methods have been effective in designing expressive temporal encoders, but have not fully utilized the temporal dynamics and spatial appearance of videos due to less challenging and unreliable learning tasks. To address these challenges, we begin by utilizing the contrastive learning task to capture global spatio-temporal information of videos for hashing. With the aid of our designed augmentation strategies, which focus on spatial and temporal variations to create positive pairs, the learning framework can generate hash codes that are invariant to motion, scale, and viewpoint. Furthermore, we incorporate two collaborative learning tasks, i.e., frame order verification and scene change regularization, to capture local spatio-temporal details within video frames, thereby enhancing the perception of temporal structure and the modeling of spatio-temporal relationships. Our proposed Contrastive Hashing with Global-Local Spatio-temporal Information (CHAIN) outperforms state-of-the-art self-supervised video hashing methods on four video benchmark datasets. Our codes will be released.
Pages: 1677-1688
Page count: 12
Related papers
46 in total
  • [1] Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning
    Luo, Dezhao
    Liu, Chang
    Zhou, Yu
    Yang, Dongbao
    Ma, Can
    Ye, Qixiang
    Wang, Weiping
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11701 - 11708
  • [2] Self-Supervised Video Representation Learning by Uncovering Spatio-Temporal Statistics
    Wang, Jiangliu
    Jiao, Jianbo
    Bao, Linchao
    He, Shengfeng
    Liu, Wei
    Liu, Yun-hui
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (07) : 3791 - 3806
  • [3] Global-local spatio-temporal graph convolutional networks for video summarization
    Wu, Guangli
    Song, Shanshan
    Zhang, Jing
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2024, 118
  • [4] Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
    Zhang, Yujia
    Po, Lai-Man
    Xu, Xuyuan
    Liu, Mengyang
    Wang, Yexin
    Ou, Weifeng
    Zhao, Yuzhi
    Yu, Wing-Yin
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3380 - 3389
  • [5] Spatio-Temporal Catcher: a Self-Supervised Transformer for Deepfake Video Detection
    Li, Maosen
    Li, Xurong
    Yu, Kun
    Deng, Cheng
    Huang, Heng
    Mao, Feng
    Xue, Hui
    Li, Minghao
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 8707 - 8718
  • [6] Self-Supervised Temporal Sensitive Hashing for Video Retrieval
    Li, Qihua
    Tian, Xing
    Ng, Wing W. Y.
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9021 - 9035
  • [7] Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning
    Yao, Yuan
    Liu, Chang
    Luo, Dezhao
    Zhou, Yu
    Ye, Qixiang
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 6547 - 6556
  • [8] Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction
    Ji, Jiahao
    Wang, Jingyuan
    Huang, Chao
    Wu, Junjie
    Xu, Boren
    Wu, Zhenhe
    Zhang, Junbo
    Zheng, Yu
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 4356 - 4364
  • [9] Exploring Hierarchical Information in Hyperbolic Space for Self-Supervised Image Hashing
    Wei, Rukai
    Liu, Yu
    Song, Jingkuan
    Xie, Yanzhao
    Zhou, Ke
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1768 - 1781
  • [10] Self-Supervised Global Spatio-Temporal Interaction Pre-Training for Group Activity Recognition
    Du, Zexing
    Wang, Xue
    Wang, Qing
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5076 - 5088