Multimodal Abstractive Summarization for How2 Videos

Authors
Palaskar, Shruti [1]
Libovicky, Jindrich [2]
Gella, Spandana [3, 4]
Metze, Florian [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Charles Univ Prague, Fac Math & Phys, Prague, Czech Republic
[3] Amazon AI, Bellevue, WA USA
[4] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Keywords
PERFORMANCE
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study abstractive summarization for open-domain videos. Unlike traditional text news summarization, the goal is less to "compress" text information than to provide a fluent textual summary of information that has been collected and fused from different source modalities, in our case video and audio transcripts (or text). We show how a multi-source sequence-to-sequence model with hierarchical attention can integrate information from different modalities into a coherent output, compare various models trained with different modalities, and present pilot experiments on the How2 corpus of instructional videos. We also propose a new evaluation metric (Content F1) for the abstractive summarization task that measures the semantic adequacy of the summaries rather than their fluency, which is covered by metrics like ROUGE and BLEU.
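The hierarchical attention mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the function names (`attend`, `hierarchical_attention`), the toy vectors, and the assumption that every modality is already projected to the same dimension are all hypothetical choices made for illustration. The idea shown is only the two-level structure: attend within each modality first, then attend over the resulting per-modality context vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, states):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted sum of `states` and the attention weights.
    """
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    context = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
    return context, weights


def hierarchical_attention(query, modalities):
    """Two-level attention over several input modalities.

    Level 1: one context vector per modality.
    Level 2: attention over those context vectors.
    Assumes all modalities share the same vector dimension.
    """
    names = list(modalities)
    contexts = [attend(query, modalities[name])[0] for name in names]
    fused, modality_weights = attend(query, contexts)
    return fused, dict(zip(names, modality_weights))


# Toy decoder state and encoder states (hypothetical values).
query = [1.0, 0.0]
modalities = {
    "transcript": [[1.0, 0.0], [0.0, 1.0]],
    "video": [[0.0, 1.0], [0.0, 2.0]],
}
fused, weights = hierarchical_attention(query, modalities)
print(fused, weights)
```

In this toy input the transcript states align better with the query than the video states, so the second-level attention assigns the transcript the larger weight.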
Pages: 6587-6596
Page count: 10