Guide and interact: scene-graph based generation and control of video captions

Cited by: 0

Authors:

Xuyang Lu

Yang Gao

Affiliations:

[1] Beijing Institute of Technology, School of Computer Science and Technology

[2] Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications

Source:

Multimedia Systems | 2023 / Vol. 29

Keywords:

Video captioning; Scene graph; Multi-modal; Text generation

DOI:

Not available


Abstract:

Internet videos contain abundant meaningful information. The task of video captioning is to understand the content of a video and summarize it into a comprehensive description of one or more sentences. Research on video captioning therefore faces challenges from both video understanding and natural language generation. Among the technical obstacles in video captioning, one of the most critical issues undermining caption quality is the model's tendency to generate fictional content, usually called the “hallucination” problem. In this paper, we present scene-graph guidance and interaction (SGI) to address this problem. The SGI framework is composed of a faithful scene graph generation module and a multi-modal interactive network module. The scene graph generation module extracts a faithful scene graph from the video, which then serves as factual guidance for the text generator. The network module applies attention and interaction between the video features and the scene graph input, and generates a video caption that includes the faithful video content. On this basis, we further extend the SGI model to realize user intention-based controllable video captioning using elaborate scene graphs. We performed experiments on the Charades and ActivityNet Captions datasets; the SGI model achieved state-of-the-art performance on automatic metrics, demonstrating the high quality and strong controllability of its video captions.
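The abstract describes the SGI framework only at a high level. The Python sketch below is a minimal illustration under our own assumptions, not the authors' implementation: it represents a scene graph as (subject, predicate, object) triples, flattens it into a token sequence a generator could condition on, and fuses video features with scene-graph embeddings through plain scaled dot-product cross-attention plus a residual connection. The names `Triple`, `linearize`, `cross_attend`, and `fuse`, the `<sep>` token, and the toy 2-d vectors are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene-graph edge, e.g. ("person", "holding", "cup")."""
    subject: str
    predicate: str
    obj: str


def linearize(graph: list[Triple]) -> list[str]:
    """Flatten the graph into a token sequence a caption generator could condition on."""
    tokens: list[str] = []
    for t in graph:
        tokens += [t.subject, t.predicate, t.obj, "<sep>"]  # "<sep>" is a hypothetical separator token
    return tokens


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Scaled dot-product attention: each video feature (query) attends over scene-graph node embeddings."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def fuse(video_feats, graph_embs):
    """Residual fusion (assumed design): video feature + attended scene-graph context."""
    context = cross_attend(video_feats, graph_embs, graph_embs)
    return [[a + b for a, b in zip(v, c)] for v, c in zip(video_feats, context)]


if __name__ == "__main__":
    graph = [Triple("person", "holding", "cup"), Triple("person", "sitting on", "sofa")]
    # Controllability in this sketch: the user keeps only the facts they want described.
    wanted = [t for t in graph if t.obj == "cup"]
    print(linearize(wanted))  # ['person', 'holding', 'cup', '<sep>']
    # Toy 2-d features: one video frame and two scene-graph node embeddings.
    print(fuse([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch, user-intention control reduces to editing the triple list before it reaches the generator, and the scene graph grounds generation because every token the generator conditions on comes from an extracted fact. How SGI actually builds its "elaborate scene graphs" and how its interactive network is parameterized is not specified in this record.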


Pages: 797–809

Page count: 12
