Image Difference Captioning with Pre-training and Contrastive Learning

Cited by: 0
Authors
Yao, Linli [1 ]
Wang, Weiying [1 ]
Jin, Qin [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences, which require learning a stronger vision-language association, and 2) the high cost of manual annotation, which limits the available supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that exploits extra cross-task supervision, such as data for fine-grained image classification, to alleviate the shortage of supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed framework. The code and models will be released at https://github.com/yaolinli/IDC.
Pages: 3108-3116
Number of pages: 9
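
This record does not specify the exact form of the paper's contrastive objective; for the authors' actual method, see the repository linked above. As a rough illustration of what "aligning visual differences and text descriptions" can look like, the following is a minimal PyTorch sketch of a symmetric InfoNCE-style loss over batch pairs. The function name contrastive_alignment_loss, the batch-wise choice of positives and negatives, and the temperature value are assumptions made for illustration, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(diff_feats, text_feats, temperature=0.07):
        """Symmetric InfoNCE-style loss aligning visual-difference features
        with text-description features. Within a batch, each matched
        (image-pair, caption) example is a positive; all other pairings
        serve as negatives."""
        # L2-normalize so dot products become cosine similarities.
        diff_feats = F.normalize(diff_feats, dim=-1)   # shape (B, D)
        text_feats = F.normalize(text_feats, dim=-1)   # shape (B, D)

        # (B, B) similarity matrix; the diagonal holds the positive pairs.
        logits = diff_feats @ text_feats.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)

        # Contrast in both directions: difference-to-text and text-to-difference.
        loss_d2t = F.cross_entropy(logits, targets)
        loss_t2d = F.cross_entropy(logits.t(), targets)
        return (loss_d2t + loss_t2d) / 2

Under an objective of this kind, captions that describe the wrong difference are pushed away in embedding space, which is one plausible way to realize the fine-grained alignment the abstract describes.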