Relational Distant Supervision for Image Captioning without Image-Text Pairs

Cited: 0
Authors
Qi, Yayun [1 ]
Zhao, Wentian [1 ]
Wu, Xinxiao [1 ,2 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Key Lab Intelligent Informat Technol, Beijing, Peoples R China
[2] Shenzhen MSU BIT Univ, Guangdong Lab Machine Percept & Intelligent Comp, Shenzhen, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Unsupervised image captioning aims to generate descriptions of images without relying on any image-sentence pairs for training. Most existing works use detected visual objects or concepts as a bridge to connect images and texts. Considering that the relationships between objects carry more information, we use object relationships as a more accurate connection between images and texts. In this paper, we adapt the idea of distant supervision: knowledge about object relationships is extracted from an external corpus and imparted to images to facilitate inferring visual object relationships, without introducing any extra pre-trained relationship detectors. Based on these learned informative relationships, we construct pseudo image-sentence pairs for training a captioning model. Specifically, our method consists of three modules: (i) a relationship learning module that learns to infer relationships from images under distant supervision; (ii) a relationship-to-sentence module that transforms the inferred relationships into sentences to generate pseudo image-sentence pairs; (iii) an image captioning module that is trained on the generated image-sentence pairs. Promising results on three datasets show that our method outperforms the state-of-the-art methods of unsupervised image captioning.
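The following is a minimal Python sketch of the three-module pipeline summarized in the abstract, intended only to illustrate the data flow (corpus triplets → image relationships → pseudo captions → training pairs). All function names, the naive triplet extraction, and the template-based verbalisation are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A (subject, predicate, object) triplet, e.g. ("dog", "sits on", "grass").
Relationship = Tuple[str, str, str]

@dataclass
class PseudoPair:
    image_id: str
    caption: str

def mine_corpus_relationships(corpus: List[str]) -> List[Relationship]:
    """Distant supervision source: extract <subject, predicate, object> triplets
    from an external text corpus. A trivial split is used here purely as a
    placeholder for a real triplet extractor."""
    triplets = []
    for sentence in corpus:
        words = sentence.lower().rstrip(".").split()
        if len(words) >= 3:
            triplets.append((words[0], " ".join(words[1:-1]), words[-1]))
    return triplets

def infer_image_relationships(image_objects: List[str],
                              corpus_triplets: List[Relationship]) -> List[Relationship]:
    """Module (i), relationship learning: transfer a corpus relationship to an
    image whose detected objects match the triplet's subject and object,
    without a pre-trained visual relationship detector."""
    objects = set(image_objects)
    return [t for t in corpus_triplets if t[0] in objects and t[2] in objects]

def relationships_to_sentence(triplets: List[Relationship]) -> str:
    """Module (ii), relationship-to-sentence: verbalise the inferred triplets
    into a pseudo caption (here by naive template concatenation)."""
    return ". ".join(f"A {s} {p} a {o}" for s, p, o in triplets) + "."

def build_pseudo_pairs(images: Dict[str, List[str]],
                       corpus: List[str]) -> List[PseudoPair]:
    """Module (iii) input: pair each image with its pseudo caption, ready for
    standard supervised training of a captioning model."""
    corpus_triplets = mine_corpus_relationships(corpus)
    pairs = []
    for image_id, detected_objects in images.items():
        triplets = infer_image_relationships(detected_objects, corpus_triplets)
        if triplets:
            pairs.append(PseudoPair(image_id, relationships_to_sentence(triplets)))
    return pairs

if __name__ == "__main__":
    corpus = ["dog sits on grass.", "man rides bicycle."]
    images = {"img_001": ["dog", "grass", "tree"], "img_002": ["man", "bicycle"]}
    for pair in build_pseudo_pairs(images, corpus):
        print(pair.image_id, "->", pair.caption)
```

In the paper itself, each stage is a learned module rather than the rule-based placeholders above; the sketch only mirrors how pseudo image-sentence pairs are assembled before captioning-model training.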
Pages: 4524-4532
Number of pages: 9