MSRC: multimodal spatial regression with semantic context for phrase grounding

Authors
Kan Chen
Rama Kovvuri
Jiyang Gao
Ram Nevatia
Affiliation
[1] University of Southern California, Institute for Robotics and Intelligent Systems
Keywords
Phrase grounding; Spatial regression; Multimodal; Context
DOI
Not available
Abstract
Given a textual description of an image, phrase grounding localizes the objects in the image that are referred to by query phrases in the description. State-of-the-art methods treat phrase grounding as a ranking problem and address it by retrieving a set of proposals according to the query’s semantics; these methods are limited by the performance of independent proposal generation systems and ignore useful cues from context in the description. In this paper, we propose a novel multimodal spatial regression with semantic context (MSRC) system, which not only predicts the location of the ground truth based on proposal bounding boxes, but also refines the prediction results by penalizing similarities between different queries from the same sentence. MSRC has two advantages. First, it sidesteps the performance upper bound imposed by independent proposal generation systems by adopting a regression mechanism. Second, it not only encodes the semantics of a query phrase, but also considers the phrase's relation to its context (i.e., other queries from the same sentence) via a context refinement network. Experiments show that the MSRC system achieves a significant improvement in accuracy on two popular datasets, Flickr30K Entities and Refer-it Game, with increases of 6.64% and 5.28% over the state of the art, respectively.
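To make the regression idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the common R-CNN-style offset parameterization (dx, dy, dw, dh) for refining a proposal box, which the abstract does not specify, and uses intersection-over-union (IoU) to compare boxes. The function names `apply_offsets` and `iou` and all numeric values are hypothetical.

```python
import math


def apply_offsets(box, offsets):
    """Refine a proposal box (x1, y1, x2, y2) with offsets (dx, dy, dw, dh).

    Assumption: R-CNN-style parameterization (center shift scaled by box
    size, log-scale width/height change). The paper's may differ.
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offsets
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Shift the center and rescale the size.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: the best available proposal misses the target.
proposal = (10.0, 10.0, 50.0, 50.0)
ground_truth = (20.0, 10.0, 60.0, 50.0)

# Offsets a regression head might predict (illustrative values only).
refined = apply_offsets(proposal, (0.25, 0.0, 0.0, 0.0))

print(iou(proposal, ground_truth))  # overlap of the raw proposal
print(iou(refined, ground_truth))   # overlap after regression
```

In this toy case a ranking-only system can at best return the raw proposal, whose overlap with the target is fixed by the proposal generator, whereas a regression step can move the box toward the target. This is the sense in which the abstract says regression sidesteps the upper bound set by the proposal generation system.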
Pages: 17 - 28
Number of pages: 11
Source
[J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2018, 7 (01) : 17 - 28