Deconfounded Visual Grounding

Cited: 0
Authors
Huang, Jianqiang [1 ,2 ]
Qin, Yu [2 ]
Qi, Jiaxin [1 ]
Sun, Qianru [3 ]
Zhang, Hanwang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Alibaba Grp, Damo Acad, Hangzhou, Peoples R China
[3] Singapore Management Univ, Singapore, Singapore
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that this bias is the major bottleneck for visual reasoning. For example, grounding often degenerates into a trivial language-location association without visual reasoning: any query containing "sheep" is grounded to the near-central region, because most queries about sheep have ground-truth locations at the image center. First, we frame the visual grounding pipeline as a causal graph, which shows the causalities among the image, the query, the target location, and an underlying confounder. The causal graph tells us how to break the grounding bottleneck: deconfounded visual grounding. Second, to tackle the challenge that the confounder is unobserved in general, we propose a confounder-agnostic approach, Referring Expression Deconfounder (RED), to remove the confounding bias. Third, we implement RED as a simple language attention module that can be applied in any grounding method. On popular benchmarks, RED improves various state-of-the-art grounding methods by a significant margin. Code is available at: https://github.com/JianqiangH/Deconfounded_VG.
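The abstract states that RED is implemented as "a simple language attention" that can be plugged into any grounding method. As a rough illustration of what such a module looks like in practice, here is a minimal PyTorch sketch: it pools query token embeddings with learned attention weights, the kind of drop-in replacement for average or LSTM pooling that a grounding head could adopt. The class name, layer size, and single-head design are our assumptions for illustration only; the authors' actual implementation is in the linked repository.

    import torch
    import torch.nn as nn

    class LanguageAttention(nn.Module):
        # Hypothetical sketch of a learned language-attention pooling layer.
        # It scores each query token, masks out padding, and returns an
        # attention-weighted query embedding for the grounding head.
        def __init__(self, dim: int = 512):
            super().__init__()
            self.score = nn.Linear(dim, 1)  # per-token attention logits

        def forward(self, tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
            # tokens: (B, T, dim) token embeddings; mask: (B, T), 1 = real token
            logits = self.score(tokens).squeeze(-1)              # (B, T)
            logits = logits.masked_fill(mask == 0, float("-inf"))
            weights = torch.softmax(logits, dim=-1)              # (B, T)
            return torch.einsum("bt,btd->bd", weights, tokens)   # (B, dim)

    # Usage: pool a batch of 4 queries (up to 12 tokens each) into 512-d vectors.
    attn = LanguageAttention(dim=512)
    tokens, mask = torch.randn(4, 12, 512), torch.ones(4, 12)
    query_emb = attn(tokens, mask)  # (4, 512), fed to the grounding head

Because the attention weights depend only on the query, a module of this kind can be attached to one-stage or two-stage grounders alike, consistent with the paper's claim that RED can be applied in any grounding method.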
Pages: 998-1006
Page count: 9
Related Papers
50 items in total
  • [21] Countering Language Drift via Visual Grounding
    Lee, Jason
    Cho, Kyunghyun
    Kiela, Douwe
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4385 - 4395
  • [22] Deconfounded Recommendation for Alleviating Bias Amplification
    Wang, Wenjie
    Feng, Fuli
    He, Xiangnan
    Wang, Xiang
    Chua, Tat-Seng
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 1717 - 1725
  • [23] Grounding Visual Representations with Texts for Domain Generalization
    Min, Seonwoo
    Park, Nokyung
    Kim, Siwon
    Park, Seunghyun
    Kim, Jinkyu
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 37 - 53
  • [24] Grounding visual sociology research in shooting scripts
    Suchar, C. S.
    [J]. Qualitative Sociology, 1997, 20 (1) : 33 - 55
  • [25] Visual Grounding With Joint Multimodal Representation and Interaction
    Zhu, Hong
    Lu, Qingyang
    Xue, Lei
    Xue, Mogen
    Yuan, Guanglin
    Zhong, Bineng
    [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72 : 1 - 11
  • [26] Parallel Vertex Diffusion for Unified Visual Grounding
    Cheng, Zesen
    Li, Kehan
    Jin, Peng
    Li, Siheng
    Ji, Xiangyang
    Yuan, Li
    Liu, Chang
    Chen, Jie
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1326 - 1334
  • [27] A Better Loss for Visual-Textual Grounding
    Rigoni, Davide
    Serafini, Luciano
    Sperduti, Alessandro
    [J]. 37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022, : 49 - 57
  • [28] Visual Grounding Annotation of Recipe Flow Graph
    Nishimura, Taichi
    Tomori, Suzushi
    Hashimoto, Hayato
    Hashimoto, Atsushi
    Yamakata, Yoko
    Harashima, Jun
    Ushiku, Yoshitaka
    Mori, Shinsuke
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4275 - 4284
  • [29] Learning to Follow Verbal Instructions with Visual Grounding
    Unal, Emre
    Can, Ozan Arkan
    Yemez, Yucel
    [J]. 2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [30] INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
    Zhang, Hanbo
    Lu, Yunfan
    Yu, Cunjun
    Hsu, David
    Lan, Xuguang
    Zheng, Nanning
    [J]. ROBOTICS: SCIENCE AND SYSTEMS XVII, 2021,