Learning Scene Graph for Better Cross-Domain Image Captioning

Times cited: 0

Authors:

Jia, Junhua [1]

Xin, Xiaowei [1]

Gao, Xiaoyan [1]

Ding, Xiangqian [1]

Pang, Shunpeng [2]

Affiliations:

[1] Ocean Univ China, Fac Informat Sci & Engn, Shandong 266000, Peoples R China

[2] Weifang Univ, Sch Comp Engn, Shandong 261061, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III | 2024 / Vol. 14427

Keywords:

Image Captioning; Scene Graph; Text-to-Image Synthesis; Dual Learning

DOI:

10.1007/978-981-99-8435-0_10

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Current image captioning (IC) methods perform well within a single domain, largely because they are trained on large amounts of annotated data. However, their performance degrades when they are applied to new domains. To address this, we propose SGCDIC, a cross-domain image captioning framework that generalizes image captioning models across domains by jointly optimizing two coupled tasks: image captioning and text-to-image synthesis (TIS). Specifically, we propose SGAT, a scene-graph-based approach to image captioning. For the image synthesis task, a GAN variant (DFGAN) synthesizes plausible images from the text descriptions generated by SGAT, and these synthesized images are compared with the real images to improve captioning performance in the new domains. We evaluate SGCDIC in extensive experiments that use MSCOCO as the source domain and Flickr30k and Oxford-102 as the new domains. Comparative experiments and ablation studies show that SGCDIC substantially outperforms strong competing methods on the cross-domain image captioning task.
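The abstract describes a closed loop: a captioner describes a real image, a text-to-image model synthesizes an image from that description, and the synthesized image is compared with the real one. The Python sketch below illustrates only the shape of that comparison signal, under loudly stated assumptions: images are reduced to short feature vectors, the captioners and the synthesizer are hypothetical toy stand-ins (not SGAT or DFGAN), and the comparison is a negative Euclidean distance chosen for illustration. This record does not say how SGCDIC actually measures the difference between synthesized and real images.

```python
import math
from typing import Callable, List

# Hypothetical stand-ins: an "image" is a short feature vector and a
# "caption" is a list of words. Neither is SGAT's or DFGAN's real format.
Image = List[float]
Caption = List[str]


def cycle_score(
    image: Image,
    captioner: Callable[[Image], Caption],
    synthesizer: Callable[[Caption], Image],
) -> float:
    """Caption a real image, synthesize an image from that caption, and
    return the negative Euclidean distance between the two (higher = closer)."""
    caption = captioner(image)
    synthesized = synthesizer(caption)
    if len(synthesized) != len(image):
        raise ValueError("synthesized and real images must have the same size")
    distance = math.sqrt(sum((r - s) ** 2 for r, s in zip(image, synthesized)))
    return -distance


# Toy vocabulary: one word per feature dimension (illustrative assumption).
VOCAB = ["bird", "red", "flower"]


def toy_captioner(image: Image) -> Caption:
    # Mention every concept whose feature is strongly active.
    return [word for word, value in zip(VOCAB, image) if value > 0.5]


def lazy_captioner(image: Image) -> Caption:
    # A deliberately worse captioner that keeps only the first word.
    return toy_captioner(image)[:1]


def toy_synthesizer(caption: Caption) -> Image:
    # "Draw" each mentioned concept at full strength, everything else at zero.
    return [1.0 if word in caption else 0.0 for word in VOCAB]


if __name__ == "__main__":
    real = [0.9, 0.8, 0.1]
    print(cycle_score(real, toy_captioner, toy_synthesizer))
    print(cycle_score(real, lazy_captioner, toy_synthesizer))
```

With this toy input, the full caption ("bird", "red") scores higher (closer to zero) than the lazy caption ("bird"), because it preserves more of what the synthesizer needs to redraw the image. Because captions are discrete word sequences, a real training loop cannot backpropagate through such a score directly; a common workaround is to use it as a reward in a policy-gradient update, but whether SGCDIC does this is not stated in this record.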

Pages: 121-137

Number of pages: 17