Reinforcement Learning Transformer for Image Captioning Generation Model

Authors
Yan, Zhaojie [1]
Affiliation
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Keywords
Image captioning; transformer; reinforcement learning; reward dynamics backpropagation
DOI
10.1117/12.2680670
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image caption generation combines computer vision and natural language processing, and the transformer framework has become the mainstream approach. This paper combines reinforcement learning with the transformer, applying reward dynamics backpropagation and normalization in the testing phase. As the number of reinforcement learning steps increases, the agent model gains more knowledge of the full information, which reduces the computational cost of the system. Experimental results show that the reinforcement transformer structure achieves some improvement in speed.
Pages: 7
Related papers
50 in total
  • [21] Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
    Dong, Xinzhi
    Long, Chengjiang
    Xu, Wenju
    Xiao, Chunxia
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2615 - 2624
  • [22] Double-Stream Position Learning Transformer Network for Image Captioning
    Jiang, Weitao
    Zhou, Wei
    Hu, Haifeng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (11) : 7706 - 7718
  • [23] Deep Learning Approaches Based on Transformer Architectures for Image Captioning Tasks
    Castro, Roberto
    Pineda, Israel
    Lim, Wansu
    Morocho-Cayamcela, Manuel Eugenio
    [J]. IEEE ACCESS, 2022, 10 : 33679 - 33694
  • [24] Complementary Shifted Transformer for Image Captioning
    Liu, Yanbo
    Yang, You
    Xiang, Ruoyu
    Ma, Jixin
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (06) : 8339 - 8363
  • [25] Reinforced Transformer for Medical Image Captioning
    Xiong, Yuxuan
    Du, Bo
    Yan, Pingkun
    [J]. MACHINE LEARNING IN MEDICAL IMAGING (MLMI 2019), 2019, 11861 : 673 - 680
  • [26] ReFormer: The Relational Transformer for Image Captioning
    Yang, Xuewen
    Liu, Yingru
    Wang, Xin
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5398 - 5406
  • [27] Transformer with a Parallel Decoder for Image Captioning
    Wei, Peilang
    Liu, Xu
    Luo, Jun
    Pu, Huayan
    Huang, Xiaoxu
    Wang, Shilong
    Cao, Huajun
    Yang, Shouhong
    Zhuang, Xu
    Wang, Jason
    Yue, Hong
    Ji, Cheng
    Zhou, Mingliang
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [28] Image captioning with transformer and knowledge graph
    Zhang, Yu
    Shi, Xinyu
    Mi, Siya
    Yang, Xu
    [J]. PATTERN RECOGNITION LETTERS, 2021, 143 : 43 - 49
  • [29] Complementary Shifted Transformer for Image Captioning
    Yanbo Liu
    You Yang
    Ruoyu Xiang
    Jixin Ma
    [J]. Neural Processing Letters, 2023, 55 : 8339 - 8363
  • [30] ETransCap: efficient transformer for image captioning
    Mundu, Albert
    Singh, Satish Kumar
    Dubey, Shiv Ram
    [J]. APPLIED INTELLIGENCE, 2024, 54 (21) : 10748 - 10762