Cross-Domain Image Captioning with Discriminative Finetuning

Cited by: 2
Authors
Dessi, Roberto [1]
Bevilacqua, Michele [2]
Gualdoni, Eleonora [3]
Carraz Rakotonirina, Nathanael [3]
Franzon, Francesca [3]
Baroni, Marco [4]
Affiliations
[1] UPF, Meta AI, Barcelona, Spain
[2] Samaya AI, Mountain View, CA USA
[3] UPF, Barcelona, Spain
[4] UPF, ICREA, Barcelona, Spain
Funding
European Research Council
DOI
10.1109/CVPR52729.2023.00670
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural captioners are typically trained to mimic human-generated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify such image among a set of candidates. We experiment with the popular ClipCap captioner, also replicating the main results with BLIP. In terms of similarity to ground-truth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, our discriminatively-finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner without finetuning. We further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task.
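
The discriminative objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the retriever scores a caption against each candidate image by cosine similarity of embedding vectors and that the reward is the log-probability of the target under a softmax over those scores. The function names, the `temperature` parameter, and the toy 3-dimensional vectors are all hypothetical, chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieval_log_prob(caption_vec, image_vecs, target_idx, temperature=1.0):
    """Log-probability that a softmax retriever picks the target image.

    Assumption: the retriever ranks candidates by cosine similarity
    between the caption embedding and each image embedding.
    """
    scores = [cosine(caption_vec, img) / temperature for img in image_vecs]
    # Numerically stable log-sum-exp over all candidates.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[target_idx] - log_z

def retrieved_index(caption_vec, image_vecs):
    """Index of the candidate image the retriever would select."""
    scores = [cosine(caption_vec, img) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy embeddings (hypothetical): the caption is closest to candidate 0.
caption = [0.9, 0.1, 0.0]
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reward = retrieval_log_prob(caption, candidates, target_idx=0)
```

Under these assumptions, a caption that singles out the target among the candidates earns a reward closer to zero, while a vague caption that matches several candidates equally well earns a lower one. That reward would then serve as the training signal for the captioner.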
Pages: 6935-6944
Page count: 10