Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs

Cited by: 16
Authors
Fu, Kun [1, 2, 3]
Li, Jin [1, 2, 3]
Jin, Junqi [1, 2, 3]
Zhang, Changshui [1, 2, 3]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[3] Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Image captioning; novel concept; pseudodata; visual attention;
DOI
10.1109/TNNLS.2018.2813306
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning aims to generate natural language sentences to describe the salient parts of a given image. Although neural networks have recently achieved promising results, a key problem is that they can only describe concepts seen in the training image-sentence pairs. Efficient learning of novel concepts has thus been a topic of recent interest to alleviate the expensive manpower of labeling data. In this paper, we propose a novel method, Image-Text Surgery, to synthesize pseudoimage-sentence pairs. The pseudopairs are generated under the guidance of a knowledge base, with syntax from a seed data set (i.e., MSCOCO) and visual information from an existing large-scale image base (i.e., ImageNet). Via pseudodata, the captioning model learns novel concepts without any corresponding human-labeled pairs. We further introduce adaptive visual replacement, which adaptively filters unnecessary visual features in pseudodata with an attention mechanism. We evaluate our approach on a held-out subset of the MSCOCO data set. The experimental results demonstrate that the proposed approach provides significant performance improvements over state-of-the-art methods in terms of F1 score and sentence quality. An ablation study and the qualitative results further validate the effectiveness of our approach.
Pages: 5910 - 5921
Page count: 12
Related papers
50 in total
  • [1] Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
    Yang, Cong
    Li, Zuchao
    Zhang, Lefei
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [2] Relational Distant Supervision for Image Captioning without Image-Text Pairs
    Qi, Yayun
    Zhao, Wentian
    Wu, Xinxiao
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4524 - 4532
  • [3] More Grounded Image Captioning by Distilling Image-Text Matching Model
    Zhou, Yuanen
    Wang, Meng
    Liu, Daqing
    Hu, Zhenzhen
    Zhang, Hanwang
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4776 - 4785
  • [4] Learning Image-Text Associations
    Jiang, Tao
    Tan, Ah-Hwee
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (02) : 161 - 177
  • [5] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
    Kang, Wooyoung
    Mun, Jonghwan
    Lee, Sungjun
    Roh, Byungseok
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2930 - 2940
  • [6] Compositional Learning of Image-Text Query for Image Retrieval
    Anwaar, Muhammad Umer
    Labintcev, Egor
    Kleinsteuber, Martin
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1139 - 1148
  • [7] BIT: Improving Image-text Sentiment Analysis via Learning Bidirectional Image-text Interaction
    Xiao, Xingwang
    Pu, Yuanyuan
    Zhao, Zhengpeng
    Gu, Jinjing
    Xu, Dan
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [8] Image-Text Interaction
    Strothotte, Thomas
    [J]. 2007 INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES, 2007, : 3 - 3
  • [9] Text-image communication, image-text communication
    Münkner, J
    [J]. ZEITSCHRIFT FUR GERMANISTIK, 2004, 14 (02) : 454 - 455
  • [10] Image-text interaction graph neural network for image-text sentiment analysis
    Wenxiong Liao
    Bi Zeng
    Jianqi Liu
    Pengfei Wei
    Jiongkun Fang
    [J]. Applied Intelligence, 2022, 52 : 11184 - 11198