PIRC Net: Using Proposal Indexing, Relationships and Context for Phrase Grounding

Cited by: 2
Authors
Kovvuri, Rama [1]
Nevatia, Ram [1]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
Phrase grounding; Phrase localization; Object proposals
DOI
10.1007/978-3-030-20870-7_28
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Phrase grounding aims to detect and localize objects in images that are referred to and queried by natural language phrases. Phrase grounding finds applications in tasks such as Visual Dialog, Visual Search and Image-text co-reference resolution. In this paper, we present a framework that leverages information such as phrase category, relationships among neighboring phrases in a sentence, and context to improve the performance of phrase grounding systems. We propose three modules: Proposal Indexing Network (PIN), Inter-phrase Regression Network (IRN) and Proposal Ranking Network (PRN), each of which analyzes the region proposals of an image at increasing levels of detail by incorporating the above information. Also, in the absence of ground-truth spatial locations of the phrases (the weakly-supervised setting), we propose knowledge transfer mechanisms that leverage the framework of the PIN module. We demonstrate the effectiveness of our approach on the Flickr 30k Entities and ReferItGame datasets, for which we achieve improvements over state-of-the-art approaches in both supervised and weakly-supervised variants.
Pages: 451-467
Page count: 17
Related papers
50 in total
  • [1] Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment
    Chen, Zhihao
    Zhou, Yang
    Tran, Anh
    Zhao, Junting
    Wan, Liang
    Ooi, Gideon Su Kai
    Cheng, Lionel Tim-Ee
    Thng, Choon Hua
    Xu, Xinxing
    Liu, Yong
    Fu, Huazhu
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VII, 2023, 14226 : 371 - 381
  • [2] MSRC: multimodal spatial regression with semantic context for phrase grounding
    Chen, Kan
    Kovvuri, Rama
    Gao, Jiyang
    Nevatia, Ram
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2018, 7 (01) : 17 - 28
  • [3] MSRC: multimodal spatial regression with semantic context for phrase grounding
    Kan Chen
    Rama Kovvuri
    Jiyang Gao
    Ram Nevatia
    [J]. International Journal of Multimedia Information Retrieval, 2018, 7 : 17 - 28
  • [4] MSRC: Multimodal Spatial Regression with Semantic Context for Phrase Grounding
    Chen, Kan
    Kovvuri, Rama
    Gao, Jiyang
    Nevatia, Ram
    [J]. PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017 : 23 - 31
  • [5] Query-guided Regression Network with Context Policy for Phrase Grounding
    Chen, Kan
    Kovvuri, Rama
    Nevatia, Ram
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017 : 824 - 832
  • [6] Document Similarity Using a Phrase Indexing Graph Model
    Hammouda, Khaled M.
    Kamel, Mohamed S.
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2004, 6 (06) : 710 - 727
  • [7] Document Similarity Using a Phrase Indexing Graph Model
    Khaled M. Hammouda
    Mohamed S. Kamel
    [J]. Knowledge and Information Systems, 2004, 6 : 710 - 727
  • [8] Semantic indexing and searching using a Hopfield net
    Chen, HC
    Zhang, Y
    Houston, AL
    [J]. JOURNAL OF INFORMATION SCIENCE, 1998, 24 (01) : 3 - 18
  • [9] Speech Indexing Using Semantic Context Inference
    Huang, Chien-Lin
    Ma, Bin
    Li, Haizhou
    Wu, Chung-Hsien
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 724 - +
  • [10] Determining the context of text using augmented latent semantic indexing
    Rishel, Tom
    Perkins, Louise A.
    Yenduri, Sumanth
    Zand, Farnaz
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2007, 58 (14) : 2197 - 2204