Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering

Cited by: 0
Authors
Reichman, Benjamin [1 ]
Heck, Larry [1 ]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
DOI
10.1109/ICCVW60793.2023.00304
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In many language processing tasks, most notably large language modeling (LLM), retrieval augmentation improves model performance by supplying information at inference time that may not be stored in the model's weights. This technique has proven particularly useful in multimodal settings. For some tasks, such as Outside Knowledge Visual Question Answering (OK-VQA), retrieval augmentation is required given the open-ended nature of the knowledge involved. In much prior work on the OK-VQA task, the retriever is either a unimodal language retriever or an untrained cross-modal retriever. In this work, we present a weakly supervised training approach for cross-modal retrievers. Our method takes inspiration from information retrieval methods in natural language processing and extends them to cross-modal retrieval. Since the OK-VQA task does not typically provide consistent ground-truth retrieval labels, we evaluate our model using the lexical overlap between the ground-truth answer and the retrieved passage. Our approach yields an average recall improvement of 28% across a wide range of retrieval sizes compared to the baseline backbone network.
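To make the abstract's two ingredients concrete, below is a minimal sketch, assuming a PyTorch dual-encoder setup: a DPR-style in-batch-negative contrastive loss for weakly supervised training of a multimodal (image + question) query encoder against a text passage encoder, plus a token-level lexical-overlap heuristic for recall evaluation. This is not the authors' released code; the random-tensor encoder stand-ins, temperature, and overlap threshold are illustrative assumptions, not details taken from the paper.

    # Sketch of weakly supervised cross-modal dense retrieval training
    # and lexical-overlap evaluation (illustrative, not the paper's code).
    import torch
    import torch.nn.functional as F

    def in_batch_contrastive_loss(q_emb: torch.Tensor, p_emb: torch.Tensor,
                                  temperature: float = 0.05) -> torch.Tensor:
        """DPR-style loss: the i-th (image, question) query's positive is
        the i-th passage; every other passage in the batch is a negative."""
        q = F.normalize(q_emb, dim=-1)
        p = F.normalize(p_emb, dim=-1)
        scores = (q @ p.T) / temperature              # (B, B) similarity matrix
        targets = torch.arange(scores.size(0), device=scores.device)
        return F.cross_entropy(scores, targets)

    def lexical_overlap(answer: str, passage: str) -> float:
        """Fraction of answer tokens appearing in the passage; a weak
        relevance signal when no gold retrieval labels exist."""
        ans, psg = set(answer.lower().split()), set(passage.lower().split())
        return len(ans & psg) / max(len(ans), 1)

    def recall_at_k(ranked: list[str], answer: str, k: int,
                    threshold: float = 0.5) -> bool:
        """Count a query as a hit if any top-k passage clears the overlap
        threshold (the threshold value is an assumption)."""
        return any(lexical_overlap(answer, p) >= threshold for p in ranked[:k])

    if __name__ == "__main__":
        # Random tensors stand in for the multimodal query encoder
        # (image + question) and the text passage encoder.
        B, d = 8, 256
        q_emb = torch.randn(B, d, requires_grad=True)
        p_emb = torch.randn(B, d, requires_grad=True)
        loss = in_batch_contrastive_loss(q_emb, p_emb)
        loss.backward()                               # gradients reach both encoders
        print(f"contrastive loss: {loss.item():.3f}")

        ranked = ["the eiffel tower is in paris", "dogs are mammals"]
        print("hit@2:", recall_at_k(ranked, "Paris", k=2))

In this setup every passage doubles as a negative for each other query in the batch, which is what lets weak supervision (e.g., pairing a question with any passage that mentions its answer) train the retriever without gold retrieval labels.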
Pages: 2829-2834
Page count: 6
Related papers
50 items in total; the first 10 are listed below
  • [1] Cross-Modal Retrieval for Knowledge-Based Visual Question Answering
    Lerner, Paul
    Ferret, Olivier
    Guinaudeau, Camille
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, 2024, 14608 : 421 - 438
  • [2] Passage Retrieval for Outside-Knowledge Visual Question Answering
    Qu, Chen
    Zamani, Hamed
    Yang, Liu
    Croft, W. Bruce
    Learned-Miller, Erik
    [J]. SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1753 - 1757
  • [3] Cross-modal knowledge reasoning for knowledge-based visual question answering
    Yu, Jing
    Zhu, Zihao
    Wang, Yujing
    Zhang, Weifeng
    Hu, Yue
    Tan, Jianlong
    [J]. PATTERN RECOGNITION, 2020, 108
  • [4] Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross-Modal Retrieval
    Yu, Jing
    Zhang, Weifeng
    Lu, Yuhang
    Qin, Zengchang
    Hu, Yue
    Tan, Jianlong
    Wu, Qi
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3196 - 3209
  • [5] Cross-Modal Visual Question Answering for Remote Sensing Data
    Felix, Rafael
    Repasky, Boris
    Hodge, Samuel
    Zolfaghari, Reza
    Abbasnejad, Ehsan
    Sherrah, Jamie
    [J]. 2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021), 2021, : 57 - 65
  • [6] Cross-modal Relational Reasoning Network for Visual Question Answering
    Chen, Hongyu
    Liu, Ruifang
    Peng, Bo
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3939 - 3948
  • [7] Visual question answering with attention transfer and a cross-modal gating mechanism
    Li, Wei
    Sun, Jianhui
    Liu, Ge
    Zhao, Linglan
    Fang, Xiangzhong
    [J]. PATTERN RECOGNITION LETTERS, 2020, 133 : 334 - 340
  • [8] Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering
    Salemi, Alireza
    Rafiee, Mahta
    Zamani, Hamed
    [J]. PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023, 2023, : 169 - 176
  • [9] Medical visual question answering with symmetric interaction attention and cross-modal gating
    Chen, Zhi
    Zou, Beiji
    Dai, Yulan
    Zhu, Chengzhang
    Kong, Guilan
    Zhang, Wensheng
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85
  • [10] Jointly Learning Attentions with Semantic Cross-Modal Correlation for Visual Question Answering
    Cao, Liangfu
    Gao, Lianli
    Song, Jingkuan
    Xu, Xing
    Shen, Heng Tao
    [J]. DATABASES THEORY AND APPLICATIONS, ADC 2017, 2017, 10538 : 248 - 260