Relational Distant Supervision for Image Captioning without Image-Text Pairs

Authors
Qi, Yayun [1]
Zhao, Wentian [1]
Wu, Xinxiao [1,2]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Key Lab Intelligent Informat Technol, Beijing, Peoples R China
[2] Shenzhen MSU BIT Univ, Guangdong Lab Machine Percept & Intelligent Comp, Shenzhen, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised image captioning aims to generate descriptions of images without relying on any image-sentence pairs for training. Most existing works use detected visual objects or concepts as a bridge to connect images and texts. Because the relationships between objects carry more information than the objects alone, we use object relationships as a more accurate connection between images and texts. In this paper, we adapt the idea of distant supervision: we extract knowledge about object relationships from an external corpus and impart it to images to facilitate inferring visual object relationships, without introducing any extra pre-trained relationship detectors. Based on these learned relationships, we construct pseudo image-sentence pairs for training the captioning model. Specifically, our method consists of three modules: (i) a relationship learning module that learns to infer relationships from images under distant supervision; (ii) a relationship-to-sentence module that transforms the inferred relationships into sentences to generate pseudo image-sentence pairs; and (iii) an image captioning module that is trained on the generated image-sentence pairs. Results on three datasets show that our method outperforms state-of-the-art methods for unsupervised image captioning.
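The three-module pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper learns to infer relationships from image features, whereas the sketch below simply takes the most frequent corpus predicate for each detected object pair and renders it with a fixed template. The corpus triples, the function names, and the majority-vote and template rules are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy corpus of (subject, predicate, object) triples, standing in
# for relationships parsed from an external text corpus.
CORPUS_TRIPLES = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "feeding", "horse"),
    ("dog", "chasing", "ball"),
]


def build_relation_table(triples):
    """Count how often each predicate links an ordered (subject, object) pair."""
    table = defaultdict(Counter)
    for subj, pred, obj in triples:
        table[(subj, obj)][pred] += 1
    return table


def distant_labels(detected_objects, table):
    """Give each ordered object pair its most frequent corpus predicate.

    Assumption: a majority vote replaces the learned relationship module.
    """
    labels = []
    for subj, obj in permutations(sorted(set(detected_objects)), 2):
        if (subj, obj) in table:
            pred, _count = table[(subj, obj)].most_common(1)[0]
            labels.append((subj, pred, obj))
    return labels


def triples_to_sentence(triples):
    """Render triples with a fixed template (stand-in for a learned module)."""
    return " and ".join(f"a {s} {p} a {o}" for s, p, o in triples) + "."


table = build_relation_table(CORPUS_TRIPLES)
# Object names as they might come from an off-the-shelf object detector.
pseudo_labels = distant_labels(["horse", "man", "tree"], table)
pseudo_caption = triples_to_sentence(pseudo_labels)
print(pseudo_labels, pseudo_caption)
```

In this sketch the pseudo caption would be paired with the image the objects were detected in, giving one pseudo image-sentence pair for training a captioning model.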
Pages: 4524-4532
Number of pages: 9