Query-guided Regression Network with Context Policy for Phrase Grounding

Cited by: 64
Authors
Chen, Kan [1]
Kovvuri, Rama [1]
Nevatia, Ram [1]
Affiliation
[1] Univ Southern Calif, Inst Robot & Intelligent Syst, Los Angeles, CA 90089 USA
DOI
10.1109/ICCV.2017.95
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Given a textual description of an image, phrase grounding localizes the objects in the image that are referred to by query phrases in the description. State-of-the-art methods address the problem by ranking a set of proposals according to their relevance to each query. These methods are limited by the performance of independent proposal generation systems, and they ignore useful cues from the context in the description. In this paper, we adopt a spatial regression method to break this performance limit, and we introduce reinforcement learning techniques to further leverage semantic context information. We propose a novel Query-guided Regression network with Context policy (QRC Net), which jointly learns a Proposal Generation Network (PGN), a Query-guided Regression Network (QRN) and a Context Policy Network (CPN). Experiments show that QRC Net significantly improves accuracy on two popular datasets, Flickr30K Entities and Referit Game, with increases of 14.25% and 17.14% over the state of the art, respectively.
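To illustrate why ranking proposals caps accuracy while spatial regression does not, here is a minimal sketch. It is not the QRC Net implementation. It assumes the common R-CNN-style box offset parametrization (dx, dy, dw, dh), and the helper names `apply_deltas`, `ground_phrase` and `iou` are hypothetical. The abstract does not specify the paper's exact regression targets.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh), assumed R-CNN style


def apply_deltas(box: Box, deltas: Deltas) -> Box:
    """Shift a proposal's center and rescale its size by regression offsets."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    # Center offsets are relative to box size; size offsets are in log space.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def ground_phrase(proposals: List[Box], scores: List[float],
                  deltas: List[Deltas]) -> Box:
    """Pick the proposal the query scores highest, then refine it by regression."""
    best = max(range(len(proposals)), key=lambda i: scores[i])
    return apply_deltas(proposals[best], deltas[best])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


if __name__ == "__main__":
    proposals = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
    scores = [0.2, 0.9]
    deltas = [(0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0)]
    print(ground_phrase(proposals, scores, deltas))
```

A pure ranking method returns a proposal unchanged, so its overlap with the ground-truth box can never exceed that of the best available proposal. Predicting offsets lets the output box move beyond the proposal set, which is the limit the abstract refers to.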
Pages: 824-832
Page count: 9