Local-to-Global Semantic Supervised Learning for Image Captioning

Cited by: 0
Authors
Wang, Juan [1]
Duan, Yiping [1]
Tao, Xiaoming [1]
Lu, Jianhua [1]
Affiliations
[1] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Dept Elect Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
image caption; semantic supervised learning; attention mechanism; ATTENTION;
DOI
10.1109/icc40277.2020.9149264
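Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract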
Image captioning is a challenging problem owing to the complexity of image content and the diverse ways of describing that content in natural language. Although current methods have made substantial progress in terms of objective metrics (such as BLEU, METEOR, ROUGE-L and CIDEr), some problems remain. Specifically, most of these methods are trained to maximize the log-likelihood or the objective metrics themselves. As a result, they often generate rigid and semantically incomplete captions. In this paper, we develop a new model that aims to generate captions conforming to human evaluation. The core idea is local-to-global semantic supervised learning, which introduces objective functions at two levels. At the word level, we match each word to image regions using a local attention objective function; at the sentence level, we align the entire sentence with the image using a global semantic objective function. Experimentally, we compare the proposed model with current methods on the MSCOCO dataset. Ablation studies show that local attention supervision and global semantic supervision are each necessary for the success of our model. Furthermore, combining the two supervision objectives achieves state-of-the-art performance in terms of both standard evaluation metrics and human judgment.
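The abstract does not give the form of either objective, so the following is only an illustrative sketch of how a word-level and a sentence-level term could be combined with a standard captioning loss. Every name here (`caption_nll`, `local_attention_loss`, `global_semantic_loss`, `lam_local`, `lam_global`) is hypothetical, and the specific choices (cross-entropy against target region weights, one minus cosine similarity) are assumptions, not the authors' definitions.

```python
import math

def caption_nll(word_probs):
    """Negative log-likelihood of the ground-truth words (standard captioning loss)."""
    return -sum(math.log(p) for p in word_probs)

def local_attention_loss(attn, target, eps=1e-12):
    """Assumed word-level term: cross-entropy between a target region
    distribution and the predicted attention, averaged over words.
    attn[t][k] and target[t][k] are weights of region k for word t."""
    total = 0.0
    for a_t, g_t in zip(attn, target):
        total += -sum(g * math.log(a + eps) for a, g in zip(a_t, g_t))
    return total / len(attn)

def global_semantic_loss(sent_vec, img_vec):
    """Assumed sentence-level term: 1 - cosine similarity between a
    sentence embedding and an image embedding."""
    dot = sum(s * v for s, v in zip(sent_vec, img_vec))
    norm = math.sqrt(sum(s * s for s in sent_vec)) * math.sqrt(sum(v * v for v in img_vec))
    return 1.0 - dot / norm

def total_loss(word_probs, attn, target, sent_vec, img_vec,
               lam_local=1.0, lam_global=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (caption_nll(word_probs)
            + lam_local * local_attention_loss(attn, target)
            + lam_global * global_semantic_loss(sent_vec, img_vec))
```

Setting `lam_local` or `lam_global` to zero in such a sketch corresponds to the kind of ablation the abstract mentions, where one supervision signal is removed at a time.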
Pages: 6