Video Paragraph Captioning as a Text Summarization Task

Cited by: 0

Authors:

Liu, Hui [1]

Wan, Xiaojun

Affiliations:

[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China

Source:

ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2 | 2021

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video paragraph captioning aims to generate a set of coherent sentences that describe a video containing several events. Most previous methods simplify this task by using ground-truth event segments. In this work, we propose a novel framework that treats the task as text summarization. We first generate many sentence-level captions, each focusing on a different video clip, and then summarize these captions to obtain the final paragraph caption. Our method does not depend on ground-truth event segments. Experiments on two popular datasets, ActivityNet Captions and YouCookII, demonstrate the advantages of our new framework. On the ActivityNet dataset, our method even outperforms some previous methods that use ground-truth event segment labels.


Pages: 55-60

Page count: 6
