An Attention-based Regression Model for Grounding Textual Phrases in Images

Cited by: 0
Authors
Endo, Ko [1]
Aono, Masaki [1]
Nichols, Eric [2]
Funakoshi, Kotaro [2]
Affiliations
[1] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
[2] Honda Res Inst Japan, Wako, Saitama, Japan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Grounding, or localizing, a textual phrase in an image is a challenging problem that is integral to visual language understanding. Previous approaches to this task typically rely on candidate region proposals, so end performance depends on the quality of the region proposal method and additional computational cost is incurred. In this paper, we treat grounding as a regression problem and propose a method that directly identifies the region referred to by a textual phrase, eliminating the need for external candidate region prediction. Our approach uses deep neural networks to combine image and text representations and refines the target region with attention models over both image subregions and the words in the textual phrase. Despite the challenging nature of this task and the sparsity of available data, our proposed method achieves a new state-of-the-art accuracy of 37.26% on the ReferIt dataset, surpassing the previously reported best by over 5 percentage points. We find that combining image and text attention models and using an image attention area-sensitive loss function both contribute to substantial improvements.
Pages: 3995-4001
Number of pages: 7