Cross-Domain Image Captioning with Discriminative Finetuning

Cited by: 2
Authors
Dessi, Roberto [1]
Bevilacqua, Michele [2]
Gualdoni, Eleonora [3]
Carraz Rakotonirina, Nathanael [3]
Franzon, Francesca [3]
Baroni, Marco [4]
Affiliations
[1] UPF, Meta AI, Barcelona, Spain
[2] Samaya AI, Mountain View, CA USA
[3] UPF, Barcelona, Spain
[4] UPF, ICREA, Barcelona, Spain
Funding
European Research Council
DOI
10.1109/CVPR52729.2023.00670
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural captioners are typically trained to mimic human-generated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that finetuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify that image among a set of candidates. We experiment with the popular ClipCap captioner, also replicating the main results with BLIP. In terms of similarity to ground-truth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, our discriminatively finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner without finetuning. We further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task.
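The discriminative objective described in the abstract can be illustrated with a minimal sketch. It assumes that captions and images are embedded in a shared vector space (as in CLIP-style retrievers) and that the retriever scores candidates with a softmax over cosine similarities. The function names, the `temperature` parameter, and the toy vectors are hypothetical illustrations, not the authors' code, and the abstract does not state the exact reward or how the captioner is optimized.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieval_log_probs(caption_emb, candidate_embs, temperature=1.0):
    """Log-probability the (assumed) retriever assigns to each candidate image."""
    logits = [cosine(caption_emb, img) / temperature for img in candidate_embs]
    # Numerically stable log-softmax over the candidate set.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_z for x in logits]


def discriminative_reward(caption_emb, candidate_embs, target_idx, temperature=1.0):
    """Score a caption by how well it singles out the target image (hypothetical reward)."""
    return retrieval_log_probs(caption_emb, candidate_embs, temperature)[target_idx]


# Toy embeddings (made up): image 0 is the target, images 1 and 2 are distractors.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
specific_caption = [1.0, 0.1]  # close to the target only
vague_caption = [1.0, 1.0]     # closest to a distractor

reward_specific = discriminative_reward(specific_caption, candidates, target_idx=0)
reward_vague = discriminative_reward(vague_caption, candidates, target_idx=0)
```

In this toy setting the specific caption earns a higher reward than the vague one, which is the pressure toward informative descriptions that the abstract attributes to discriminative finetuning. A score of this kind could serve as the training signal for the captioner, with the retriever kept fixed.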
Pages: 6935-6944
Page count: 10
Related papers
  • [1] Discriminative Style Learning for Cross-Domain Image Captioning
    Yuan, Jin
    Zhu, Shuai
    Huang, Shuyin
    Zhang, Hanwang
    Xiao, Yaoqiang
    Li, Zhiyong
    Wang, Meng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1723 - 1736
  • [2] Cross-domain personalized image captioning
    Cuirong Long
    Xiaoshan Yang
    Changsheng Xu
    Multimedia Tools and Applications, 2020, 79 : 33333 - 33348
  • [3] Cross-domain personalized image captioning
    Long, Cuirong
    Yang, Xiaoshan
    Xu, Changsheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (45-46) : 33333 - 33348
  • [4] Multitask Learning for Cross-Domain Image Captioning
    Yang, Min
    Zhao, Wei
    Xu, Wei
    Feng, Yabing
    Zhao, Zhou
    Chen, Xiaojun
    Lei, Kai
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (04) : 1047 - 1061
  • [5] Dual Learning for Cross-domain Image Captioning
    Zhao, Wei
    Xu, Wei
    Yang, Min
    Ye, Jianbo
    Zhao, Zhou
    Feng, Yabing
    Qiao, Yu
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 29 - 38
  • [6] A DISCRIMINATIVE DOMAIN ADAPTATION MODEL FOR CROSS-DOMAIN IMAGE CLASSIFICATION
    Chou, Yen-Cheng
    Wei, Chia-Po
    Wang, Yu-Chiang Frank
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 3083 - 3087
  • [7] Cross-domain multi-style merge for image captioning
    Duan, Yiqun
    Wang, Zhen
    Li, Yi
    Wang, Jingya
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 228
  • [8] Learning Scene Graph for Better Cross-Domain Image Captioning
    Jia, Junhua
    Xin, Xiaowei
    Gao, Xiaoyan
    Ding, Xiangqian
    Pang, Shunpeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427 : 121 - 137
  • [9] Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation
    Zhao, Wentian
    Wu, Xinxiao
    Luo, Jiebo
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1180 - 1192
  • [10] Discriminative Transfer Feature and Label Consistency for Cross-Domain Image Classification
    Li, Shuang
    Liu, Chi Harold
    Su, Limin
    Xie, Binhui
    Ding, Zhengming
    Chen, C. L. Philip
    Wu, Dapeng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) : 4842 - 4856