An Attention-based Regression Model for Grounding Textual Phrases in Images

Cited by: 0
Authors
Endo, Ko [1]
Aono, Masaki [1]
Nichols, Eric [2]
Funakoshi, Kotaro [2]
Affiliations
[1] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
[2] Honda Res Inst Japan, Wako, Saitama, Japan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Grounding, or localizing, a textual phrase in an image is a challenging problem that is integral to visual language understanding. Previous approaches to this task typically rely on candidate region proposals, so end performance depends on the region proposal method and additional computational costs are incurred. In this paper, we treat grounding as a regression problem and propose a method that directly identifies the region referred to by a textual phrase, eliminating the need for external candidate region prediction. Our approach uses deep neural networks to combine image and text representations and refines the target region with attention models over both image subregions and words in the textual phrase. Despite the challenging nature of this task and the sparsity of available data, in evaluation on the ReferIt dataset our proposed method achieves a new state of the art of 37.26% accuracy, surpassing the previously reported best by over 5 percentage points. We find that combining image and text attention models and using an image attention area-sensitive loss function both contribute to substantial improvements.
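The abstract reports accuracy but does not state how a predicted region is judged correct. The sketch below assumes the criterion commonly used for phrase grounding: a predicted box counts as a hit when its intersection over union (IoU) with the ground-truth box is at least 0.5. The threshold, the `(x_min, y_min, x_max, y_max)` box layout, and the function names `iou` and `grounding_accuracy` are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    predicted: Sequence[Box],
    ground_truth: Sequence[Box],
    threshold: float = 0.5,  # assumed hit threshold, not confirmed by the abstract
) -> float:
    """Fraction of phrases whose regressed box reaches the IoU threshold."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)
```

Because a regression model outputs a single box per phrase rather than ranking proposals, a metric of this kind can be computed directly on the model output without any candidate region step.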
Pages: 3995-4001
Number of pages: 7