Video Paragraph Captioning as a Text Summarization Task

Authors
Liu, Hui [1]
Wan, Xiaojun
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video paragraph captioning aims to generate a set of coherent sentences that describe a video containing several events. Most previous methods simplify the task by relying on ground-truth event segments. In this work, we propose a novel framework that treats video paragraph captioning as a text summarization task. We first generate many sentence-level captions, each focusing on a different video clip, and then summarize these captions to obtain the final paragraph caption. Our method does not depend on ground-truth event segments. Experiments on two popular datasets, ActivityNet Captions and YouCookII, demonstrate the advantages of the framework. On the ActivityNet dataset, our method even outperforms some previous methods that use ground-truth event segment labels.
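The two-stage idea in the abstract (caption many clips, then summarize the captions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how clips are sampled, which captioning model is used, or how the summarizer works. Here `sliding_clips` uses fixed-length overlapping windows, `caption_clip` is a hypothetical callable standing in for a sentence-level captioning model, and `summarize` is a simple greedy redundancy filter rather than the authors' summarization model.

```python
from typing import Callable, List, Tuple

Clip = Tuple[float, float]  # (start_sec, end_sec)


def sliding_clips(duration: float, window: float, stride: float) -> List[Clip]:
    """Cover the video with overlapping fixed-length clips (assumed sampling scheme)."""
    clips: List[Clip] = []
    start = 0.0
    while start < duration:
        clips.append((start, min(start + window, duration)))
        start += stride
    return clips


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def summarize(captions: List[str], max_sentences: int, max_overlap: float = 0.5) -> str:
    """Stand-in summarizer: keep captions in temporal order, skipping near-duplicates."""
    kept: List[str] = []
    for cap in captions:
        if len(kept) >= max_sentences:
            break
        if all(jaccard(cap, k) <= max_overlap for k in kept):
            kept.append(cap)
    return " ".join(kept)


def paragraph_caption(
    duration: float,
    caption_clip: Callable[[Clip], str],  # hypothetical sentence-level captioner
    window: float = 10.0,
    stride: float = 5.0,
    max_sentences: int = 3,
) -> str:
    """Stage 1: caption every clip. Stage 2: summarize the captions into a paragraph."""
    clips = sliding_clips(duration, window, stride)
    captions = [caption_clip(clip) for clip in clips]
    return summarize(captions, max_sentences)


def toy_captioner(clip: Clip) -> str:
    """Toy captioner for the demo: the caption depends only on the clip start time."""
    start, _ = clip
    return "A man chops onions." if start < 10 else "He fries the onions in a pan."


if __name__ == "__main__":
    print(paragraph_caption(20.0, toy_captioner))
```

Because the clips come from a fixed sliding window, no ground-truth event segments are needed; redundancy between overlapping clips is handled in the summarization stage instead.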
Pages: 55-60
Page count: 6