Video Summarization Leveraging Multimodal Information for Presentations

Cited by: 0

Authors:

Liu, Hanchao [1]

Chen, Dapeng [1]

Li, Rongjun [1]

Xue, Wenyuan [1]

Peng, Wei [1]

Affiliations:

[1] Huawei Technol Co Ltd, IT Platform Chief Expert Off, Shenzhen, Peoples R China

Source:

INTERSPEECH 2023 | 2023

Keywords:

multimodal; video summarization

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This demonstration introduces a video summarization system that leverages multimodal information to efficiently extract the essential content of presentations. Whereas existing methods focus primarily on daily-life videos and use only visual information, our system extracts speech, text, and visual information from presentation videos. Specifically, the system extracts key slide text from key-frames and uses it as queries to filter the speech transcripts. It then pieces together the video clips corresponding to the filtered transcripts to produce the final video summary. An evaluation on ICCV 2017 videos demonstrates the effectiveness of the proposed system compared with the lead-3 baseline.
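
The filtering and clip-assembly steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how slide text is matched against the transcript, so plain word overlap is assumed here, and the names `Segment`, `filter_segments`, `merge_clips`, `lead_3`, `min_overlap`, and `max_gap` are all hypothetical. Key-frame detection, OCR, and speech recognition are assumed to have already produced the slide queries and the timed transcript segments. The lead-3 baseline is assumed to mean keeping the first three transcript segments.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript sentence with its time span in the video, in seconds."""
    start: float
    end: float
    text: str


def tokens(text):
    """Lowercased word set; an assumed stand-in for the system's real text matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_segments(segments, slide_queries, min_overlap=0.5):
    """Keep segments containing at least min_overlap of some slide query's words."""
    query_tokens = [tokens(q) for q in slide_queries]
    kept = []
    for seg in segments:
        seg_tokens = tokens(seg.text)
        for q in query_tokens:
            if q and len(q & seg_tokens) / len(q) >= min_overlap:
                kept.append(seg)
                break  # one matching query is enough
    return kept


def merge_clips(segments, max_gap=1.0):
    """Merge the time spans of kept segments that lie within max_gap seconds."""
    clips = []
    for seg in sorted(segments, key=lambda s: s.start):
        if clips and seg.start - clips[-1][1] <= max_gap:
            clips[-1] = (clips[-1][0], max(clips[-1][1], seg.end))
        else:
            clips.append((seg.start, seg.end))
    return clips


def lead_3(segments):
    """Assumed lead-3 baseline: the first three segments in time order."""
    return sorted(segments, key=lambda s: s.start)[:3]


# Made-up example data, for illustration only.
segments = [
    Segment(0.0, 4.0, "Hello everyone and thanks for coming"),
    Segment(4.0, 9.5, "Our method uses a residual network for feature extraction"),
    Segment(9.5, 15.0, "The residual network is trained on a large dataset"),
    Segment(15.0, 20.0, "Let me tell you a story about my trip"),
    Segment(30.0, 36.0, "Results show improved accuracy on the benchmark"),
]
queries = ["Residual network", "Improved accuracy"]

clips = merge_clips(filter_segments(segments, queries))
print(clips)  # [(4.0, 15.0), (30.0, 36.0)]
```

In this toy example the greeting and the anecdote are dropped because they share no words with the slide text, and the two adjacent matching segments are joined into one clip. A real system would likely need a more robust matcher than word overlap, for instance one tolerant of speech recognition errors.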

Pages: 5251-5252

Page count: 2
