Image Difference Captioning with Pre-training and Contrastive Learning

Cited by: 0
Authors
Yao, Linli [1]
Wang, Weiying [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The task poses two major challenges: 1) the visual differences are fine-grained, requiring stronger vision-language association learning, and 2) manual annotation is costly, which limits the available supervised data. To address these challenges, we propose a new modeling framework following the pre-training and fine-tuning paradigm. Specifically, we design three self-supervised tasks with contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that exploits extra cross-task supervision, such as fine-grained image classification data, to alleviate the shortage of supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed framework. The code and models will be released at https://github.com/yaolinli/IDC.
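To make the fine-grained alignment idea concrete, the sketch below shows a symmetric InfoNCE-style contrastive loss between embeddings of an image pair's visual difference and embeddings of its difference caption. This is an illustrative PyTorch formulation, not the authors' released implementation; the encoder outputs, in-batch negative construction, and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(diff_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning visual-difference embeddings with
    caption embeddings (illustrative sketch, not the paper's exact loss).

    diff_emb: (B, D) embeddings of the visual difference of each image pair
    text_emb: (B, D) embeddings of the corresponding difference captions
    """
    # L2-normalize so dot products become cosine similarities
    diff_emb = F.normalize(diff_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the matched pairs
    logits = diff_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: difference -> caption and caption -> difference
    loss_d2t = F.cross_entropy(logits, targets)
    loss_t2d = F.cross_entropy(logits.t(), targets)
    return (loss_d2t + loss_t2d) / 2
```

Under such an objective, each difference embedding is pulled toward its own caption and pushed away from the other captions in the batch, which is one plausible way to realize the fine-grained alignment the abstract describes.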
Pages: 3108-3116
Page count: 9