A Dual Reinforcement Learning Framework for Weakly Supervised Phrase Grounding

Times cited: 0
Authors
Wang, Zhiyu [1]
Yang, Chao [1]
Jiang, Bin [1]
Yuan, Junsong [2]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[2] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Funding
National Natural Science Foundation of China
Keywords
Grounding; Task analysis; Training; Reinforcement learning; Optimization; Image reconstruction; Proposals; Weakly supervised phrase grounding; visual grounding; dual learning; Network; Language
DOI
10.1109/TMM.2023.3265816
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Weakly supervised phrase grounding aims to localize the region in an image that corresponds to a given textual phrase, where the mapping between noun phrases and image regions is not available during training. Because region-level annotations are lacking in the weakly supervised setting, previous methods typically exploit an additional proxy task (e.g., phrase reconstruction or image-phrase alignment) to provide training supervision. However, the optimization objectives of these proxy tasks differ significantly from that of the target grounding task, which may lead to inefficient optimization of the target model. Therefore, in this paper, we propose a novel dual reinforcement learning framework that directly optimizes the phrase grounding model. Specifically, we exploit the duality between the phrase grounding and phrase generation tasks. The two tasks form a closed loop in which each provides feedback signals that measure the quality of the other's output. In this way, we can measure the correctness of the localized regions and thus optimize the grounding model directly. We design two reward functions to quantify these feedback signals and train the models via reinforcement learning. In addition, to ease the training of our framework, we present a heuristic algorithm that generates pseudo region-phrase pairs to warm-start the models. We perform experiments on two popular phrase grounding datasets, ReferItGame and Flickr30K Entities, and the results demonstrate that our method outperforms previous methods by a large margin.
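The abstract states that the grounding model is trained by reinforcement learning with rewards derived from the dual phrase-generation task, but it does not specify the reward functions or the model architectures. The following is a minimal sketch, not the authors' implementation, of the general mechanism such a framework relies on: a grounding policy samples one region proposal for a phrase, receives a scalar reward, and is updated with REINFORCE using a moving-average baseline. All names (`policy`, `reward`, `train`, `features`) and the toy vectors are hypothetical. In particular, the cosine-similarity `reward` is an assumed stand-in for the feedback a phrase-generation model would provide, and the diagonal bilinear scorer is a placeholder for a real grounding network.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def features(regions, phrase):
    # Elementwise region-phrase interaction features: f_kj = region_k[j] * phrase[j].
    return [[r * p for r, p in zip(region, phrase)] for region in regions]


def policy(weights, feats):
    # Grounding policy pi(k | phrase): softmax over (hypothetical) bilinear proposal scores.
    scores = [sum(w * f for w, f in zip(weights, fk)) for fk in feats]
    return softmax(scores)


def reward(region, phrase):
    # ASSUMED stand-in for the dual phrase-generation feedback:
    # how well the chosen region "explains" the phrase.
    return cosine(region, phrase)


def train(regions, phrase, steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    feats = features(regions, phrase)
    weights = [0.0] * len(phrase)
    baseline = 0.0
    for _ in range(steps):
        probs = policy(weights, feats)
        # Sample one proposal: the discrete, non-differentiable grounding decision.
        k = rng.choices(range(len(regions)), weights=probs)[0]
        r = reward(regions[k], phrase)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline reduces variance
        # Gradient of log pi(k) w.r.t. weights: f_k - E_pi[f].
        expected = [sum(p * fk[j] for p, fk in zip(probs, feats))
                    for j in range(len(phrase))]
        for j in range(len(weights)):
            weights[j] += lr * advantage * (feats[k][j] - expected[j])
    return weights


if __name__ == "__main__":
    # Three toy region proposals and one toy phrase embedding.
    regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    phrase = [0.1, 0.9, 0.2]
    w = train(regions, phrase)
    print(policy(w, features(regions, phrase)))
```

Because the policy selects a discrete region, the reward is not differentiable with respect to that choice; the score-function gradient `f_k - E_pi[f]` sidesteps this, which illustrates why reinforcement learning suits the goal of optimizing the grounding model directly from task feedback.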
Pages: 394–405
Page count: 12