Emotion Aware Reinforcement Network for Visual Storytelling

Authors
Li, Xin [1]
Cai, Hanqing [1]
Jiang, Tianling [1]
Liu, Chunping [1]
Ji, Yi [1]
Affiliations
[1] Soochow University, School of Computer Science and Technology, Suzhou, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Visual storytelling; Attention mechanism; Reinforcement learning
DOI
10.1007/978-3-031-15931-2_3
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual storytelling is the task of generating a sequence of human-like sentences (i.e., a story) for an ordered stream of images. Unlike traditional image captioning, the story contains not only factual descriptions but also concepts and objects that do not explicitly appear in the input images. Recent works use either end-to-end or multi-stage frameworks to produce more relevant and coherent stories, but they usually ignore latent emotional information. In this work, to generate an affective story, we propose an Emotion Aware Reinforcement Network for VIsual StoryTelling (EARN-VIST). Specifically, our network uses lexicon-based attention to encourage the model to attend more to emotional words. We then apply two emotional-consistency reinforcement learning rewards, computed with an emotion classifier and a commonsense transformer respectively, to measure the gap between the generated story and the human-labeled story and thereby refine the generation process. Experimental results on the VIST dataset and a human evaluation demonstrate that our model outperforms most cutting-edge models across multiple evaluation metrics.
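The abstract does not give the form of the emotional-consistency rewards, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a toy word-to-emotion lexicon (`TOY_LEXICON`, hypothetical) in place of the trained emotion classifier, uses cosine similarity between emotion-count vectors as the reward, and assumes a self-critical style baseline (sampled reward minus greedy reward) for the policy-gradient signal. The commonsense-transformer reward is not sketched. All names here are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> emotion label); not taken from the paper.
TOY_LEXICON = {
    "happy": "joy", "fun": "joy", "smiled": "joy",
    "sad": "sadness", "cried": "sadness",
    "scared": "fear", "afraid": "fear",
}
EMOTIONS = ("joy", "sadness", "fear")


def emotion_vector(story):
    """Count lexicon hits per emotion; a stand-in for a trained emotion classifier."""
    words = [w.strip(".,!?;:") for w in story.lower().split()]
    counts = Counter(TOY_LEXICON[w] for w in words if w in TOY_LEXICON)
    return [float(counts[e]) for e in EMOTIONS]


def emotion_reward(generated, reference):
    """Cosine similarity of the two emotion vectors; 0.0 if either has no hits."""
    g, r = emotion_vector(generated), emotion_vector(reference)
    norm = math.sqrt(sum(x * x for x in g)) * math.sqrt(sum(x * x for x in r))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(g, r)) / norm


def advantage(sampled, greedy, reference):
    """Assumed self-critical advantage: sampled-story reward minus greedy baseline."""
    return emotion_reward(sampled, reference) - emotion_reward(greedy, reference)
```

In a training loop of this kind, a positive advantage would scale up the log-probability of the sampled story and a negative one would scale it down, so stories whose emotional profile is closer to the human-labeled story become more likely.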
Pages: 26-37
Page count: 12