Controllable Image Captioning via Prompting

Citations: 0
Authors
Wang, Ning [1]
Xie, Jiahao [1]
Wu, Jihao [1]
Jia, Mingbo [1]
Li, Linlin [1]
Affiliation
[1] Huawei Inc, Shenzhen, People's Republic of China
Keywords
TRANSFORMER; LANGUAGE
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the remarkable progress in image captioning, existing captioners typically cannot be controlled to generate the desired kind of caption, e.g., describing the image roughly or in detail, or from a factual or an emotional perspective. In this paper, we show that a unified model can perform well in diverse domains and switch freely among multiple styles. This controllability is achieved by embedding prompt learning into the image captioning framework. Specifically, we design a set of prompts to fine-tune the pre-trained image captioner. These prompts allow the model to absorb stylized data from different domains for joint training, without performance degradation in any domain. Furthermore, we optimize the prompts as learnable vectors in the continuous word embedding space, which avoids heuristic prompt engineering while yielding superior performance. At inference, the model generates captions in the desired style by choosing the corresponding prompt. Extensive experiments verify the controllability of the proposed method. Notably, a single unified model achieves outstanding performance on two diverse image captioning benchmarks, the COCO Karpathy split and TextCaps.
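The prompt mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the style names, `EMBED_DIM`, `PROMPT_LEN`, and both helper functions are hypothetical, and the vectors are only initialized here, whereas in the paper they would be optimized during fine-tuning. The sketch shows only the selection step: one block of continuous prompt vectors per style, prepended to the caption token embeddings before decoding.

```python
import random

EMBED_DIM = 4    # hypothetical word-embedding width
PROMPT_LEN = 3   # hypothetical number of learnable vectors per style


def init_prompts(styles, prompt_len=PROMPT_LEN, dim=EMBED_DIM, seed=0):
    """Create one block of prompt vectors per style.

    In a real model these would be trainable parameters updated by
    gradient descent; here they are just small random values.
    """
    rng = random.Random(seed)
    return {
        style: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                for _ in range(prompt_len)]
        for style in styles
    }


def build_decoder_input(prompts, style, token_embeddings):
    """Prepend the chosen style's prompt vectors to the token embeddings."""
    if style not in prompts:
        raise KeyError(f"unknown style: {style}")
    return prompts[style] + token_embeddings


# Hypothetical style keys, named after the two benchmarks in the abstract.
prompts = init_prompts(["coco", "textcaps"])
tokens = [[0.1] * EMBED_DIM, [0.2] * EMBED_DIM]  # stand-in caption embeddings
seq = build_decoder_input(prompts, "textcaps", tokens)
print(len(seq))  # PROMPT_LEN prompt vectors followed by 2 token embeddings
```

Switching the caption style at inference then amounts to passing a different key to `build_decoder_input`; the captioner weights themselves are shared across styles.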
Pages: 2617-2625
Page count: 9