Neural Sequential Phrase Grounding (SeqGROUND)

Cited by: 21
Authors
Dogan, Pelin [1]
Sigal, Leonid [2, 3]
Gross, Markus [1, 4]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Univ British Columbia, Vancouver, BC, Canada
[3] Vector Inst, Toronto, ON, Canada
[4] Disney Res, Zurich, Switzerland
DOI
10.1109/CVPR.2019.00430
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an end-to-end approach for phrase grounding in images. Unlike prior methods, which typically ground each phrase independently by building an image-text embedding, our architecture formulates the grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two stacks of LSTM cells, along with the phrase-region pairs grounded so far. These LSTM stacks collectively capture context for grounding the next phrase. The resulting architecture, which we call SeqGROUND, supports many-to-many matching by allowing an image region to be matched to multiple phrases and vice versa. We show competitive performance on the Flickr30K benchmark dataset and, through ablation studies, validate the efficacy of sequential grounding as well as individual design choices in our model architecture.
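The sequential, context-conditioned idea in the abstract can be illustrated with a toy sketch. This is not the SeqGROUND architecture: the LSTM stacks are replaced here by a running mean of the regions grounded so far, and every name (`ground_sequentially`, `threshold`, `context_weight`) and the feature vectors are assumptions made up for illustration. The sketch only shows two properties the abstract describes: each phrase is grounded in the context of earlier decisions, and each phrase-region pair gets its own binary decision, so matching can be many-to-many.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ground_sequentially(phrases, regions, threshold=0.5, context_weight=0.25):
    """Toy sequential grounding loop (illustrative assumption, not the paper's model).

    phrases, regions: lists of feature vectors of the same dimension.
    Returns a list of (phrase_index, region_index) pairs.
    """
    dim = len(regions[0])
    context = [0.0] * dim  # running mean of regions grounded so far
    n_grounded = 0
    pairs = []
    for p_idx, phrase in enumerate(phrases):
        # The query mixes the current phrase with context from earlier groundings.
        query = [p + context_weight * c for p, c in zip(phrase, context)]
        for r_idx, region in enumerate(regions):
            # Independent binary decision per pair, so one phrase may match
            # several regions and one region may match several phrases.
            if sigmoid(dot(query, region)) > threshold:
                pairs.append((p_idx, r_idx))
                n_grounded += 1
                # Incremental update of the running mean; it affects later phrases.
                context = [c + (r - c) / n_grounded for c, r in zip(context, region)]
    return pairs


# Hypothetical 2-D features: two phrases, three region proposals.
phrases = [[1.0, 0.0], [0.0, 1.0]]
regions = [[2.0, -2.0], [-2.0, 2.0], [2.0, 2.0]]
pairs = ground_sequentially(phrases, regions)
```

With these made-up inputs, the third region ends up paired with both phrases and the first phrase is paired with two regions, which is the many-to-many behaviour the abstract refers to. A faithful implementation would replace the running mean with learned recurrent encoders and train the decision function end to end.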
Pages: 4170-4179
Page count: 10