Discriminative Spatiotemporal Alignment for Self-Supervised Video Correspondence Learning

Authors
Wei, Qiaoqiao [1]
Zhang, Hui [1]
Yong, Jun-Hai [1]
Affiliations
[1] Tsinghua Univ, BNRist, Sch Software, Beijing, Peoples R China
Keywords
self-supervised learning; video correspondence; spatiotemporal alignment;
DOI
10.1109/ICME55011.2023.00316
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on self-supervised video correspondence learning, which learns effective representations from raw videos without manual annotations and uses them for visual tracking tasks in video. Previous methods extract temporal correspondence between two frames within fixed geometric structures, which easily leads to pixel mismatches and overlooks intra-frame semantic correspondence. To address these issues, we propose a Discriminative Spatio-temporal Alignment (DSA) framework that improves tracking accuracy at the inference stage. DSA first discriminates the representations of different instances in each reference frame through an Instance-Guided Spatial Alignment (IGSA) module. It then employs a Focused Temporal Alignment (FTA) module, which samples discriminative pixels from the reference frames and propagates the labels of the sampled reference pixels to a target pixel. Experimental results show that DSA is flexible and generalizable and improves previous approaches on three tracking tasks: video object segmentation, human part segmentation, and pose keypoint tracking.
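The abstract describes propagating labels from sampled reference pixels to a target pixel. The sketch below illustrates that general idea only; it is not the paper's FTA module. It assumes cosine similarity as the affinity, a plain top-k selection as the sampling rule, and a softmax with a hypothetical `temperature` parameter as the weighting. The function name `propagate_label` and all of its parameters are illustrative assumptions.

```python
import math


def propagate_label(target_feat, ref_feats, ref_labels, k=2, temperature=0.1):
    """Propagate soft labels from the k most similar reference pixels
    to a single target pixel (generic top-k affinity propagation;
    an assumption for illustration, not the DSA/FTA implementation)."""

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero
        return [x / norm for x in v]

    # Cosine similarity between the target pixel and every reference pixel.
    t = normalize(target_feat)
    sims = [dot(t, normalize(r)) for r in ref_feats]

    # Keep only the k most similar reference pixels.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

    # Softmax over the selected similarities (shifted for numerical stability).
    peak = max(sims[i] for i in top)
    weights = [math.exp((sims[i] - peak) / temperature) for i in top]
    total = sum(weights)

    # Weighted sum of the selected reference labels.
    num_classes = len(ref_labels[0])
    out = [0.0] * num_classes
    for w, i in zip(weights, top):
        for c in range(num_classes):
            out[c] += (w / total) * ref_labels[i][c]
    return out


# Toy data: three reference pixels with 2-D features and one-hot labels.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
soft_label = propagate_label([1.0, 0.0], ref_feats, ref_labels, k=2)
```

Restricting the weighted sum to the top-k references is one simple way to keep dissimilar pixels from contributing to the target label; the selection criterion actually used by the paper is not stated in the abstract.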
Pages: 1841-1846
Page count: 6