Hierarchical Attention-Based Fusion for Image Caption With Multi-Grained Rewards

Cited by: 7
Authors
Wu, Chunlei [1]
Yuan, Shaozu [1]
Cao, Haiwen [1]
Wei, Yiwei [2]
Wang, Leiquan [1]
Affiliations
[1] China Univ Petr, Coll Comp Sci & Technol, Qingdao 266580, Peoples R China
[2] China Univ Petr Beijing Karamay, Sch Petr Engn, Karamay 834000, Peoples R China
Source
IEEE ACCESS | 2020 / Volume 8 / Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Image caption; reinforcement learning; attention mechanism;
DOI
10.1109/ACCESS.2020.2981513
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image captioning based on reinforcement learning (RL) has recently achieved significant success. Most of these methods use the CIDEr score as the reward when computing gradients to refine a baseline captioning model. However, the CIDEr score is not the sole criterion for judging the quality of a generated caption. In this paper, a Hierarchical Attention Fusion (HAF) model is presented as a baseline for RL-based image captioning, in which multi-level feature maps of ResNet are integrated with hierarchical attention. A Revaluation Network (REN) is used to revaluate the CIDEr score by assigning a different weight to each word according to its importance in the generated caption. The weighted reward can be regarded as a word-level reward. Moreover, a Scoring Network (SN) is implemented to score the generated sentence against its corresponding ground truth from a batch of captions. This reward benefits from additional unmatched ground truth and acts as a sentence-level reward. Experimental results on the COCO dataset show that the proposed methods achieve competitive performance compared with related image captioning methods.
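The word-level reward idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the softmax normalization, the scaling by caption length, the subtraction of a baseline reward, and the names `word_level_rewards`, `policy_gradient_loss`, and `importance_scores` are all assumptions made for illustration. The abstract only states that a sentence-level CIDEr reward is reweighted per word by importance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_level_rewards(sentence_reward, baseline, importance_scores):
    """Spread one sentence-level advantage over the words of a caption.

    importance_scores stands in for the output of a revaluation network
    (hypothetical here); higher means the word matters more.
    """
    weights = softmax(importance_scores)
    n = len(weights)
    advantage = sentence_reward - baseline
    # Assumption: scale by n so the weights average to 1. With uniform
    # scores every word then receives the plain sentence-level advantage.
    return [n * w * advantage for w in weights]


def policy_gradient_loss(log_probs, rewards):
    """REINFORCE-style loss: negative reward-weighted log-likelihood."""
    return -sum(r * lp for r, lp in zip(rewards, log_probs))


# Toy caption of four words; the second word is scored as most important.
rewards = word_level_rewards(1.2, 0.9, [0.0, 2.0, 0.0, 1.0])
loss = policy_gradient_loss([-0.5, -1.0, -0.3, -0.8], rewards)
```

In this sketch the per-word rewards always sum to the caption length times the sentence-level advantage, so the reweighting redistributes credit among words without changing the overall reward scale.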
Pages: 57943 - 57951
Number of pages: 9
Related papers
50 in total
  • [21] Attention-based Fusion Network for Image Forgery Localization
    Gong, Wenhui
    Chen, Yan
    Alam, Mohammad S.
    Sang, Jun
    [J]. PATTERN RECOGNITION AND PREDICTION XXXV, 2024, 13040
  • [22] Attention-based for Multiscale Fusion Underwater Image Enhancement
    Huang, Zhixiong
    Li, Jinjiang
    Hua, Zhen
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2022, 16 (02) : 544 - 564
  • [23] Attention-Based Medical Caption Generation with Image Modality Classification and Clinical Concept Mapping
    Hasan, Sadid A.
    Ling, Yuan
    Liu, Joey
    Sreenivasan, Rithesh
    Anand, Shreya
    Arora, Tilak Raj
    Datla, Vivek
    Lee, Kathy
    Qadir, Ashequl
    Swisher, Christine
    Farri, Oladimeji
    [J]. EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION (CLEF 2018), 2018, 11018 : 224 - 230
  • [24] Multi-grained Aspect Fusion for Review Response Generation
    Yuan, Yun
    Gong, Chen
    Kong, Dexin
    Yu, Nan
    Fu, Guohong
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT IX, 2023, 14262 : 25 - 37
  • [25] Transformer Based Multi-Grained Attention Network for Aspect-Based Sentiment Analysis
    Sun, Jiahui
    Han, Ping
    Cheng, Zheng
    Wu, Enming
    Wang, Wenqing
    [J]. IEEE ACCESS, 2020, 8 : 211152 - 211163
  • [26] Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
    Hono, Yukiya
    Tsuboi, Kazuna
    Sawada, Kei
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. INTERSPEECH 2020, 2020 : 3441 - 3445
  • [27] A Hierarchical Multimodal Attention-based Neural Network for Image Captioning
    Cheng, Yong
    Huang, Fei
    Zhou, Lian
    Jin, Cheng
    Zhang, Yuejie
    Zhang, Tao
    [J]. SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017 : 889 - 892
  • [28] Multi-Grained Temporal Segmentation Attention Modeling for Skeleton-Based Action Recognition
    Lv, Jinrong
    Gong, Xun
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 927 - 931
  • [29] Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition
    Liu, Xiaodong
    Li, Songyang
    Wang, Miao
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [30] Fine-Grained Image Quality Caption With Hierarchical Semantics Degradation
    Yang, Wen
    Wu, Jinjian
    Tian, Shiwei
    Li, Leida
    Dong, Weisheng
    Shi, Guangming
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3578 - 3590