Multimodal-Based and Aesthetic-Guided Narrative Video Summarization

Cited by: 5

Authors:

Xie, Jiehang [1]

Chen, Xuanbai [1]

Zhang, Tianyi [2]

Zhang, Yixuan [1]

Lu, Shao-Ping [1]

Cesar, Pablo [2]

Yang, Yulu [1]

Affiliations:

[1] Nankai Univ, TKLNDST, CS, Nankai 300071, Peoples R China

[2] Ctr Wiskunde & Informat, NL-1098 XG Amsterdam, Netherlands

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Keywords:

Narrative video summarization; multimodal information; aesthetic guidance

DOI:

10.1109/TMM.2022.3183394

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Narrative videos usually convey their main content through multiple channels of narrative information, such as audio, video frames, and subtitles. Existing video summarization approaches rarely consider these multimodal narrative inputs, or they ignore the impact of the artistic assembly of shots when applied directly to narrative videos. This paper introduces a multimodal-based and aesthetic-guided narrative video summarization method. Our method leverages multimodal information, including visual content, subtitles, and audio, through dedicated key shot selection, subtitle summarization, and highlight extraction components. Furthermore, under the guidance of cinematographic aesthetics, we design a novel shot assembly module that ensures the completeness of shot content and then assembles the selected shots into the desired summary. Our method also supports flexible specification of shot selection: it automatically selects shots that are semantically related to user-specified text. Through extensive quantitative evaluations and user studies, we demonstrate that our method effectively preserves the important narrative information of the original video and can rapidly produce high-quality, aesthetic-guided narrative video summaries.

Pages: 4894-4908

Page count: 15
