Deconfounded Visual Grounding

Cited by: 0
Authors
Huang, Jianqiang [1 ,2 ]
Qin, Yu [2 ]
Qi, Jiaxin [1 ]
Sun, Qianru [3 ]
Zhang, Hanwang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Alibaba Grp, Damo Acad, Hangzhou, Peoples R China
[3] Singapore Management Univ, Singapore, Singapore
Keywords
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that this bias is the major bottleneck for visual reasoning. For example, the grounding process often degenerates into a trivial language-location association without visual reasoning, e.g., grounding any language query containing "sheep" to near-central regions, because most training queries about sheep have ground-truth locations at the image center. First, we frame the visual grounding pipeline as a causal graph, which shows the causalities among the image, the query, the target location, and an underlying confounder. Through the causal graph, we know how to break the grounding bottleneck: deconfounded visual grounding. Second, to tackle the challenge that the confounder is unobserved in general, we propose a confounder-agnostic approach called Referring Expression Deconfounder (RED) to remove the confounding bias. Third, we implement RED as a simple language attention module, which can be applied in any grounding method. On popular benchmarks, RED improves various state-of-the-art grounding methods by a significant margin. Code is available at: https://github.com/JianqiangH/Deconfounded_VG.
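The abstract describes a backdoor-style deconfounding over a causal graph with image X, query Q, target location L, and an unobserved confounder g. As general causal-inference background (not necessarily the paper's exact formulation), the standard backdoor adjustment replaces the biased conditional with an intervened one:

P(L | do(X, Q)) = \sum_g P(L | X, Q, g) \, P(g)

Since the abstract states that RED is implemented as a simple language attention that plugs into any grounding method, the following is a minimal PyTorch sketch of such a module. The class name, tensor shapes, and pooling scheme are illustrative assumptions, not the authors' released implementation; see the linked repository for that.

```python
# Minimal sketch of a RED-style language-attention module (assumed design,
# not the released code; see the GitHub link in the abstract for that).
import torch
import torch.nn as nn

class LanguageAttention(nn.Module):
    """Re-weights query token embeddings before vision-language fusion,
    so the pooled query is not dominated by confounder-correlated tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token attention logit

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        # q: (batch, num_tokens, dim) query token embeddings
        attn = torch.softmax(self.score(q), dim=1)  # (batch, num_tokens, 1)
        return (attn * q).sum(dim=1)                # (batch, dim) pooled query

# Usage: replace naive mean-pooling of query tokens in a grounding head.
pool = LanguageAttention(dim=256)
tokens = torch.randn(2, 12, 256)  # dummy batch of query token embeddings
pooled = pool(tokens)             # (2, 256), fed to the fusion/localization head
```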
Pages: 998-1006 (9 pages)