Image Difference Captioning with Pre-training and Contrastive Learning

Cited by: 0
Authors
Yao, Linli [1]
Wang, Weiying [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
摘要
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences, which require learning stronger vision-language associations, and 2) the high cost of manual annotation, which leads to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy to utilize extra cross-task supervision, such as data for fine-grained image classification, to alleviate the limitation of available supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed modeling framework. The code and models will be released at https://github.com/yaolinli/IDC.
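The abstract mentions contrastive learning to align visual-difference features with text descriptions at a fine-grained level. The paper's exact objective is not given in this record, but a symmetric InfoNCE-style loss is a common formulation of such alignment; the following is an illustrative sketch (the function name, temperature value, and toy similarity matrix are assumptions, not the authors' implementation):

```python
import math

def info_nce(sim, temperature=0.1):
    """Symmetric InfoNCE loss over a batch similarity matrix.

    sim[i][j] is the similarity between visual-difference feature i
    and text-description feature j; matching pairs lie on the diagonal.
    The loss pulls matched pairs together and pushes mismatches apart.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        # difference-to-text direction: row i, target column i
        row = [s / temperature for s in sim[i]]
        log_z = math.log(sum(math.exp(v) for v in row))
        loss += -(row[i] - log_z)
        # text-to-difference direction: column i, target row i
        col = [sim[j][i] / temperature for j in range(n)]
        log_z = math.log(sum(math.exp(v) for v in col))
        loss += -(col[i] - log_z)
    return loss / (2 * n)

# Toy batch: high diagonal similarity (well-aligned pairs) yields a low loss.
sim = [[0.9, 0.1, 0.0],
       [0.2, 0.8, 0.1],
       [0.0, 0.1, 0.95]]
print(info_nce(sim))
```

In practice the similarity matrix would come from dot products of encoded image-difference and caption embeddings over a batch, with the temperature tuned as a hyperparameter.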
Pages: 3108-3116 (9 pages)