Video Summarization Leveraging Multimodal Information for Presentations

Times cited: 0

Authors:

Liu, Hanchao [1]

Chen, Dapeng [1]

Li, Rongjun [1]

Xue, Wenyuan [1]

Peng, Wei [1]

Affiliation:

[1] Huawei Technologies Co., Ltd., IT Platform Chief Expert Office, Shenzhen, People's Republic of China

Source:

INTERSPEECH 2023 | 2023

Keywords:

multimodal; video summarization

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This demonstration introduces a video summarization system that leverages multimodal information to efficiently extract the essential content of presentations. In contrast to existing methods, which focus primarily on daily-life videos and use only visual information, our system extracts multimodal information, including speech, text, and visual information, from presentation videos. Specifically, the proposed system extracts crucial slide texts from key-frames and uses them as queries to filter the speech transcripts. By piecing together the video clips that correspond to the filtered transcripts, the system outputs the final video summary. Evaluation on ICCV 2017 videos demonstrates the effectiveness of the proposed system compared with the lead-3 baseline.
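The abstract describes the pipeline only at a high level. The Python sketch below illustrates the query-filtering and clip-assembly steps under several assumptions: slide texts have already been extracted from key-frames (for example by OCR), the speech transcript is available as time-stamped segments, and relevance is scored by simple token overlap. The `Segment` type, the `query_score` overlap measure, the 0.5 threshold, and the 1-second merge gap are hypothetical choices for illustration, not the authors' method. The `lead_n` helper approximates a lead-3 baseline by treating each transcript segment as one sentence, which is also an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-stamped transcript segment (hypothetical input format)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_score(segment: Segment, queries: list[str]) -> float:
    """Hypothetical relevance score: the best fraction of any slide-text
    query's tokens that also occur in the segment."""
    seg_toks = tokens(segment.text)
    best = 0.0
    for q in queries:
        q_toks = tokens(q)
        if q_toks:
            best = max(best, len(q_toks & seg_toks) / len(q_toks))
    return best


def filter_segments(segments: list[Segment], queries: list[str],
                    threshold: float = 0.5) -> list[Segment]:
    """Keep transcript segments whose score reaches the (assumed) threshold."""
    return [s for s in segments if query_score(s, queries) >= threshold]


def merge_clips(segments: list[Segment], gap: float = 1.0) -> list[tuple[float, float]]:
    """Join kept segments into (start, end) clips; segments separated by
    at most `gap` seconds are merged into a single clip."""
    clips: list[tuple[float, float]] = []
    for s in sorted(segments, key=lambda seg: seg.start):
        if clips and s.start - clips[-1][1] <= gap:
            clips[-1] = (clips[-1][0], max(clips[-1][1], s.end))
        else:
            clips.append((s.start, s.end))
    return clips


def lead_n(segments: list[Segment], n: int = 3) -> list[Segment]:
    """Lead-n baseline sketch: the first n segments, each treated as a sentence."""
    return sorted(segments, key=lambda seg: seg.start)[:n]


# Toy example with invented data (not from the paper's evaluation).
segments = [
    Segment(0.0, 4.0, "Good morning everyone, thanks for coming."),
    Segment(4.0, 9.0, "We propose a graph attention network for tracking."),
    Segment(9.5, 14.0, "Graph attention lets each node weigh its neighbors."),
    Segment(14.0, 18.0, "Let me take a sip of water."),
    Segment(18.0, 24.0, "Results on MOT17 show tracking accuracy improves."),
]
queries = ["Graph Attention Network", "Results on MOT17"]  # e.g., slide titles

print(merge_clips(filter_segments(segments, queries)))  # [(4.0, 14.0), (18.0, 24.0)]
print(merge_clips(lead_n(segments)))                     # [(0.0, 14.0)]
```

On this toy input, the query filter drops the greeting and the water break and keeps two clips aligned with the slide topics, whereas the lead-3 sketch simply keeps the opening 14 seconds regardless of content.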


Pages: 5251-5252

Number of pages: 2

50 entries in total

[41] On Multimodal Microblog Summarization
Saini, Naveen
Saha, Sriparna
Bhattacharyya, Pushpak
Mrinal, Shubhankar
Mishra, Santosh Kumar
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 9 (05): 1317-1329
[42] Multimodal information fusion for video concept detection
Wu, Y
Lin, CK
Chang, EY
Smith, JR
ICIP: 2004 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5, 2004: 2391-2394
[43] Multimodal Information Fusion for Semantic Video Analysis
Gulen, Elvan
Yilmaz, Turgay
Yazici, Adnan
INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2012, 3 (04): 52-74
[44] Perceptual Video Summarization-A New Framework for Video Summarization
Thomas, Sinnu Susan
Gupta, Sumana
Subramanian, Venkatesh K.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (08): 1790-1802
[45] COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization
Zlatintsi, Athanasia
Koutras, Petros
Evangelopoulos, Georgios
Malandrakis, Nikolaos
Efthymiou, Niki
Pastra, Katerina
Potamianos, Alexandros
Maragos, Petros
EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2017
[47] Leveraging ensemble machine learning and multimodal video complexity for better prediction of video difficulty in second language
Alghamdi, Emad A.
INTERACTIVE LEARNING ENVIRONMENTS, 2024
[48] Exploring the Trade-Off within Visual Information for MultiModal Sentence Summarization
Yuan, Minghuan
Cui, Shiyao
Zhang, Xinghua
Wang, Shicheng
Xu, Hongbo
Liu, Tingwen
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024: 2006-2017
[49] Semantic Representation and Attention Alignment for Graph Information Bottleneck in Video Summarization
Zhong, Rui
Wang, Rui
Yao, Wenjin
Hu, Min
Dong, Shi
Munteanu, Adrian
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 4170-4184
[50] Robust shot boundary detection and video summarization based on motion information
Zhang, J.
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2010, 22 (06): 1023-1032
