Guide and interact: scene-graph based generation and control of video captions

Authors
Xuyang Lu
Yang Gao
Affiliations
[1] School of Computer Science and Technology, Beijing Institute of Technology
[2] Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications
Source
Multimedia Systems | 2023 | Vol. 29, No. 2
Keywords
Video captioning; Scene graph; Multi-modal; Text generation
Abstract
Internet videos contain abundant meaningful information. The task of video captioning is to extract and understand the content of a video and summarize it in a comprehensive description of one or more sentences. Research on video captioning involves challenges from both video understanding and natural language generation. Among the technical obstacles facing video captioning, one of the most critical issues undermining caption quality is that the model tends to generate fictional content, which is usually called the "hallucination" problem. In this paper, we present scene-graph guidance and interaction (SGI) to address this problem. The SGI framework is composed of a faithful scene graph generation module and a multi-modal interactive network module. The scene graph generation module extracts a faithful scene graph from the video, which then serves as factual guidance for the text generator. The network module attends to the video features and the scene graph input, models their interaction, and generates a video caption that reflects the faithful video content. On this basis, we further extend the SGI model to realize user-intention-based controllable video captioning using elaborate scene graphs. We performed experiments on the Charades and ActivityNet Captions datasets. The SGI model achieved state-of-the-art performance on automatic metrics, demonstrating the high quality and strong controllability of the generated video captions.
Pages: 797-809
Number of pages: 12