Learning Scene Graph for Better Cross-Domain Image Captioning

Authors
Jia, Junhua [1]
Xin, Xiaowei [1]
Gao, Xiaoyan [1]
Ding, Xiangqian [1]
Pang, Shunpeng [2]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Shandong 266000, Peoples R China
[2] Weifang Univ, Sch Comp Engn, Shandong 261061, Peoples R China
Keywords
Image Captioning; Scene Graph; Text-to-Image Synthesis; Dual Learning
DOI
10.1007/978-981-99-8435-0_10
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current image captioning (IC) methods achieve good results within a single domain, primarily because they are trained on large amounts of annotated data. However, the performance of single-domain methods degrades when they are applied to new domains. To address this, we propose a cross-domain image captioning framework, called SGCDIC, which achieves cross-domain generalization of image captioning models by simultaneously optimizing two coupled tasks: image captioning and text-to-image synthesis (TIS). Specifically, we propose a scene-graph-based approach, SGAT, for the image captioning task. The image synthesis task employs a GAN variant (DFGAN) to synthesize plausible images from the text descriptions generated by SGAT. We compare the generated images with the real images to enhance image captioning performance in new domains. We conduct extensive experiments to evaluate SGCDIC, using MSCOCO as the source domain and Flickr30k and Oxford-102 as the new domains. Comparative experiments and ablation studies demonstrate that SGCDIC achieves substantially better performance than strong competitors on the cross-domain image captioning task.
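The coupling between captioning and text-to-image synthesis described in the abstract can be illustrated with a toy sketch. This is not the SGCDIC implementation: `toy_captioner` and `toy_synthesizer` are hypothetical stand-ins for SGAT and DFGAN, images are reduced to small feature vectors, and cosine similarity is an assumed way to "compare the generated images with the real images" (the abstract does not say which measure is used).

```python
import math
from typing import Callable, List, Sequence

# Hypothetical three-word vocabulary; each feature dimension maps to one word.
VOCAB = ["bird", "flower", "dog"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_captioner(image: Sequence[float]) -> str:
    """Stand-in for a captioning model: name every strongly active feature."""
    return " ".join(word for word, value in zip(VOCAB, image) if value > 0.5)


def toy_synthesizer(caption: str) -> List[float]:
    """Stand-in for a text-to-image model: activate features named in the caption."""
    words = set(caption.split())
    return [1.0 if word in words else 0.0 for word in VOCAB]


def dual_reward(
    real_image: Sequence[float],
    captioner: Callable[[Sequence[float]], str],
    synthesizer: Callable[[str], List[float]],
) -> float:
    """Caption the real image, re-synthesize an image from that caption,
    and score the caption by how close the synthesized image is to the real one."""
    caption = captioner(real_image)
    generated_image = synthesizer(caption)
    return cosine_similarity(real_image, generated_image)


real = [1.0, 0.0, 1.0]  # toy "image" that contains a bird and a dog
good = dual_reward(real, toy_captioner, toy_synthesizer)
bad = dual_reward(real, lambda image: "flower", toy_synthesizer)
print(f"faithful caption reward: {good:.3f}, wrong caption reward: {bad:.3f}")
```

In this toy setting a caption that preserves the image content receives a higher reward than one that does not. A signal of this kind needs no captions in the new domain, which is what makes the coupled setup attractive for cross-domain adaptation.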
Pages: 121-137
Page count: 17