CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing

Cited by: 1
Authors
Wei, Rukai [1]
Liu, Yu [1]
Song, Jingkuan [2]
Cui, Heng [1]
Xie, Yanzhao [1]
Zhou, Ke [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised video hashing; Spatio-temporal contrastive learning; Frame order verification; Scene change regularization;
DOI
10.1145/3581783.3613440
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Compressing videos into binary codes can improve retrieval speed and reduce storage overhead. However, learning accurate hash codes for video retrieval can be challenging due to high local redundancy and complex global dependencies between video frames, especially in the absence of labels. Existing self-supervised video hashing methods have been effective in designing expressive temporal encoders, but have not fully utilized the temporal dynamics and spatial appearance of videos because their learning tasks are insufficiently challenging and unreliable. To address these challenges, we begin by utilizing the contrastive learning task to capture global spatio-temporal information of videos for hashing. With the aid of our designed augmentation strategies, which focus on spatial and temporal variations to create positive pairs, the learning framework can generate hash codes that are invariant to motion, scale, and viewpoint. Furthermore, we incorporate two collaborative learning tasks, i.e., frame order verification and scene change regularization, to capture local spatio-temporal details within video frames, thereby enhancing the perception of temporal structure and the modeling of spatio-temporal relationships. Our proposed Contrastive Hashing with Global-Local Spatio-temporal Information (CHAIN) outperforms state-of-the-art self-supervised video hashing methods on four video benchmark datasets. Our code will be released.
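To make the first claim of the abstract concrete, the following is a minimal sketch of why binary codes speed up retrieval: real-valued features are reduced to sign bits, and ranking uses Hamming distance (XOR plus a bit count). It is not the CHAIN model; the function names, the 4-bit code length, and the feature values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign bits into an integer: bit i is 1 when features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two packed codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance."""
    return sorted(range(len(database)), key=lambda idx: hamming(query, database[idx]))


# Hypothetical encoder outputs for three videos (assumed values, 4 bits each).
database_features = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.8, -0.1, -0.3, -0.6],
]
database_codes = [binarize(f) for f in database_features]

# Hypothetical query feature vector, binarized the same way.
query_code = binarize([0.7, -0.4, 0.5, -0.9])
ranking = rank_by_hamming(query_code, database_codes)
print(ranking)
```

Each stored video costs only as many bits as the code length, and each comparison is a single XOR followed by a bit count, which is the source of the storage and speed benefits the abstract mentions.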
Pages: 1677-1688
Page count: 12