Emotion Aware Reinforcement Network for Visual Storytelling

Cited by: 0
Authors
Li, Xin [1]
Cai, Hanqing [1]
Jiang, Tianling [1]
Liu, Chunping [1]
Ji, Yi [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual storytelling; Attention mechanism; Reinforcement learning
DOI
10.1007/978-3-031-15931-2_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual storytelling is the task of generating a sequence of human-like sentences (i.e., a story) for an ordered stream of images. Unlike traditional image captioning, the story contains not only factual descriptions but also concepts and objects that do not explicitly appear in the input images. Recent works use either end-to-end or multi-stage frameworks to produce more relevant and coherent stories, but they usually ignore latent emotional information. In this work, to generate an affective story, we propose an Emotion Aware Reinforcement Network for VIsual StoryTelling (EARN-VIST). Specifically, our network uses lexicon-based attention to encourage the model to pay more attention to emotional words. We then apply two emotional-consistency reinforcement learning rewards, computed with an emotion classifier and a commonsense transformer respectively, to measure the gap between the generated story and the human-labeled story and thereby refine the generation process. Experimental results on the VIST dataset and human evaluation demonstrate that our model outperforms most of the cutting-edge models across multiple evaluation metrics.
Pages: 26-37
Page count: 12