Guide and interact: scene-graph based generation and control of video captions

Cited by: 0
Authors
Xuyang Lu
Yang Gao
Affiliations
[1] Beijing Institute of Technology, School of Computer Science and Technology
[2] Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications
Source
Multimedia Systems | 2023 / Vol. 29
Keywords
Video captioning; Scene graph; Multi-modal; Text generation
DOI
Not available
Abstract
Internet videos contain abundant meaningful information. The task of video captioning is to extract and understand the contents of a video and summarize them into a comprehensive description consisting of one or more sentences. Research on video captioning therefore faces challenges from both video understanding and natural language generation. Among the technical obstacles in video captioning, one of the most critical issues undermining caption quality is that the model tends to generate fictional content that does not appear in the video, which is usually called the “hallucination” problem. In this paper, we present scene-graph guidance and interaction (SGI) to address this problem. The SGI framework is composed of a faithful scene graph generation module and a multi-modal interactive network module. The scene graph generation module extracts a faithful scene graph from the video, which then serves as factual guidance for the text generator. The network module attends to the video features and the scene graph input, lets the two modalities interact, and generates a video caption that reflects the faithful video contents. On this basis, we further extend the SGI model to realize user intention-based controllable video captioning using elaborately constructed scene graphs. We performed experiments on the Charades and ActivityNet Captions datasets; the SGI model achieved state-of-the-art performance on automatic metrics, demonstrating the high quality and strong controllability of the generated video captions.
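The abstract does not specify how the interactive network combines video features with the scene graph. As an illustration only, the following minimal sketch assumes single-head scaled dot-product cross-attention, in which one caption-decoder query vector attends over the concatenation of per-frame video features and scene-graph triplet embeddings. The function names (`cross_attend`, `fuse_video_and_graph`), the toy vectors, and the choice to use the memory vectors directly as keys and values with no learned projections are hypothetical simplifications, not the SGI architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Sequence[float],
                 memory: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention of one query over a list of memory vectors.

    Simplifying assumption: memory vectors serve as both keys and values.
    """
    d = len(query)
    if not memory or any(len(m) != d for m in memory):
        raise ValueError("memory must be non-empty and match the query dimension")
    scores = [sum(q * k for q, k in zip(query, m)) / math.sqrt(d) for m in memory]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the value (= memory) vectors.
    context = [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(d)]
    return weights, context


def fuse_video_and_graph(query: Sequence[float],
                         video_feats: Sequence[Sequence[float]],
                         graph_feats: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Hypothetical fusion step: attend jointly over video frames and graph triplets.

    Returned weights are ordered video frames first, then scene-graph triplets.
    """
    memory = list(video_feats) + list(graph_feats)
    return cross_attend(query, memory)


if __name__ == "__main__":
    # Toy 2-d features; graph[0] stands in for an embedding of a triplet such as
    # (person, holds, cup). All values are made up for illustration.
    video = [[0.0, 1.0], [0.5, 0.5]]
    graph = [[1.0, 0.0]]
    weights, context = fuse_video_and_graph([1.0, 0.0], video, graph)
    print("attention weights:", [round(w, 3) for w in weights])
    print("context vector:", [round(c, 3) for c in context])
```

In this toy setup the scene-graph token is the one most aligned with the query, so it receives the largest attention weight. Under the same assumptions, replacing `graph_feats` with a different set of triplet embeddings changes what the decoder can attend to, which loosely mirrors the abstract's idea of steering captions through the scene graph; a trained model would instead derive queries, keys, and values from learned projections.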
Pages: 797-809