Video Paragraph Captioning as a Text Summarization Task

Times cited: 0

Authors:

Liu, Hui [1]

Wan, Xiaojun

Affiliation:

[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China

Source:

ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2 | 2021

Funding:

National Natural Science Foundation of China

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Video paragraph captioning aims to generate a set of coherent sentences that describe a video containing several events. Most previous methods simplify this task by using ground-truth event segments. In this work, we propose a novel framework that treats this task as a text summarization task. We first generate many sentence-level captions, each focusing on a different video clip, and then summarize these captions to obtain the final paragraph caption. Our method does not depend on ground-truth event segments. Experiments on two popular datasets, ActivityNet Captions and YouCookII, demonstrate the advantages of our new framework. On the ActivityNet dataset, our method even outperforms some previous methods that use ground-truth event segment labels.
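The abstract describes a two-stage pipeline. First, caption many clips. Then, summarize the resulting captions into one paragraph, without ground-truth event segments. The Python sketch below illustrates that control flow only, under loudly stated assumptions:

* Clips are fixed-length overlapping windows (hypothetical `sliding_clips`).
* The clip captioner is a caller-supplied stub (hypothetical `toy_captioner`).
* The summarizer is a simple extractive redundancy filter using word-level Jaccard similarity.

That summarizer is a swapped-in stand-in technique. It is not the authors' summarization model, and the abstract does not say how clips are chosen or which summarizer is used.

```python
from typing import Callable, List, Set, Tuple

Clip = Tuple[float, float]  # (start_sec, end_sec) of one video clip


def sliding_clips(duration: float, window: float, stride: float) -> List[Clip]:
    """ASSUMPTION: propose clips as fixed-length overlapping windows.
    The abstract does not say how clips are selected."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    clips: List[Clip] = []
    start = 0.0
    while start < duration:
        clips.append((start, min(start + window, duration)))
        if start + window >= duration:
            break  # this window already reaches the end of the video
        start += stride
    return clips


def tokens(sentence: str) -> Set[str]:
    """Lower-cased bag of words with leading/trailing periods removed."""
    return set(sentence.lower().strip(".").split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity in [0, 1]; two empty sets score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(captions: List[str], redundancy: float = 0.5) -> List[str]:
    """STAND-IN extractive summarizer (not the paper's model): walk the
    captions in temporal order and keep one only if its Jaccard similarity
    to every caption kept so far is below `redundancy`."""
    kept: List[str] = []
    for cap in captions:
        if all(jaccard(tokens(cap), tokens(k)) < redundancy for k in kept):
            kept.append(cap)
    return kept


def paragraph_caption(duration: float, caption_clip: Callable[[Clip], str],
                      window: float = 4.0, stride: float = 2.0,
                      redundancy: float = 0.5) -> str:
    """Stage 1: caption many clips. Stage 2: summarize the captions.
    No ground-truth event segments are used anywhere."""
    captions = [caption_clip(c) for c in sliding_clips(duration, window, stride)]
    return " ".join(summarize(captions, redundancy))


def toy_captioner(clip: Clip) -> str:
    """HYPOTHETICAL stub standing in for a trained clip-level captioning model."""
    start, _ = clip
    if start < 4:
        return "A man chops onions on a board."
    if start < 6:
        return "A man chops the onions on a board."
    return "He fries the onions in a pan."


if __name__ == "__main__":
    print(paragraph_caption(10.0, toy_captioner))
```

With the toy captioner, a 10-second video yields four overlapping clips. They produce three near-duplicate "chopping" captions and one "frying" caption. The filter keeps one of each and prints `A man chops onions on a board. He fries the onions in a pan.` In a real system, both stubs would be replaced by trained captioning and summarization models.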

Pages: 55-60

Number of pages: 6
