Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

Citations: 0
Authors
Javed, Syed Ashar [1 ]
Saxena, Shreyas
Gandhi, Vineet [2 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] IIIT Hyderabad, CVIT, Kohli Ctr Intelligent Syst KCIS, Hyderabad, India
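Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes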
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, the lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The intuition behind this idea is to encourage the model to localize to regions which can explain some semantic property in the data, in our case, the property being the presence of a concept in a set of images. We present thorough quantitative and qualitative experiments to demonstrate the efficacy of our approach and show a 5.6% improvement over the current state of the art on the Visual Genome dataset, a 5.8% improvement on the ReferItGame dataset, and performance comparable to the state of the art on the Flickr30k dataset.
Pages: 796 - 802
Number of pages: 7
Related papers
50 in total
  • [21] Fine-Grained Self-Supervision for Generalizable Semantic Segmentation
    Zhang, Yuhang
    Tian, Shishun
    Liao, Muxin
    Zhang, Zhengyu
    Zou, Wenbin
    Xu, Chen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 371 - 383
  • [22] THE FEASIBILITY OF SELF-SUPERVISION
    Hudelson, Earl
    JOURNAL OF EDUCATIONAL RESEARCH, 1952, 45 (05) : 335 - 347
  • [23] End-to-end novel visual categories learning via auxiliary self-supervision
    Qing, Yuanyuan
    Zeng, Yijie
    Cao, Qi
    Huang, Guang-Bin
    NEURAL NETWORKS, 2021, 139 : 24 - 32
  • [24] Unsupervised 3D Pose Estimation with Geometric Self-Supervision
    Chen, Ching-Hang
    Tyagi, Ambrish
    Agrawal, Amit
    Drover, Dylan
    Rohith, M., V
    Stojanov, Stefan
    Rehg, James M.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 5707 - 5717
  • [25] Self-distillation and self-supervision for partial label learning
    Yu, Xiaotong
    Sun, Shiding
    Tian, Yingjie
    PATTERN RECOGNITION, 2024, 146
  • [26] Self-supervision & meta-learning for one-shot unsupervised cross-domain detection
    Borlino, Francesco Cappio
    Polizzotto, Salvatore
    Caputo, Barbara
    Tommasi, Tatiana
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 223
  • [27] Improving Spatiotemporal Self-supervision by Deep Reinforcement Learning
    Buechler, Uta
    Brattoli, Biagio
    Ommer, Bjoern
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 797 - 814
  • [28] CoLES: Contrastive Learning for Event Sequences with Self-Supervision
    Babaev, Dmitrii
    Ovsov, Nikita
    Kireev, Ivan
    Ivanova, Maria
    Gusev, Gleb
    Nazarov, Ivan
    Tuzhilin, Alexander
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1190 - 1199
  • [29] Co-learning: Learning from Noisy Labels with Self-supervision
    Tan, Cheng
    Xia, Jun
    Wu, Lirong
    Li, Stan Z.
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1405 - 1413
  • [30] Soft prompt-tuning for unsupervised domain adaptation via self-supervision
    Zhu, Yi
    Wang, Shuqin
    Li, Yun
    Yuan, Yunhao
    Qiang, Jipeng
    Neurocomputing, 2025, 617