PREDICTING AUDIO-VISUAL SALIENT EVENTS BASED ON VISUAL, AUDIO AND TEXT MODALITIES FOR MOVIE SUMMARIZATION

Cited by: 0

Authors:

Koutras, P. [1]

Zlatintsi, A. [1]

Iosif, E. [1]

Katsamanis, A. [1]

Maragos, P. [1]

Potamianos, A. [1]

Affiliations:

[1] Natl Tech Univ Athens, Sch ECE, GR-15773 Athens, Greece

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) | 2015

Keywords:

Visual saliency; auditory saliency; affective text analysis; audio-visual salient events; movie summarization; AUDITORY ATTENTION; MODEL; FRAMEWORK;

DOI:

Not available

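Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes: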

0808 ; 0809 ;

Abstract:

In this paper, we present a new and improved synergistic approach to the problem of audio-visual salient event detection and movie summarization based on visual, audio and text modalities. Spatio-temporal visual saliency is estimated through a perceptually inspired frontend based on 3D (space, time) Gabor filters, and frame-wise features are extracted from the saliency volumes. For auditory salient event detection, we extract features based on the Teager-Kaiser Energy Operator, while text analysis incorporates part-of-speech tagging and affective modeling of single words in the movie subtitles. To evaluate the proposed system, we employ an elementary, non-parametric classification technique, namely KNN. Detection results are reported on the MovSum database, using objective evaluations against ground truth denoting the perceptually salient events, and human evaluations of the movie summaries. Our evaluation verifies the appropriateness of the proposed methods compared to our baseline system. Finally, our newly proposed summarization algorithm produces summaries that consist of salient and meaningful events, while also improving the comprehension of the semantics.
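As an illustration of the Teager-Kaiser Energy Operator named in the abstract, the following is a minimal sketch of its standard discrete form, Psi[x(n)] = x(n)^2 - x(n-1) x(n+1). It is not the authors' feature extraction pipeline, which the abstract does not detail; the function name `teager_kaiser_energy` and the test signal are assumptions chosen for illustration only.

```python
import math


def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Only interior samples have both neighbours, so the output is two
    samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (an assumption, not data from the paper): a pure cosine.
amplitude, omega = 2.0, 0.3  # omega is in radians per sample
signal = [amplitude * math.cos(omega * n) for n in range(64)]

energy = teager_kaiser_energy(signal)

# For a pure cosine the operator is constant: (amplitude * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

For a single sinusoid the operator output is constant and depends on both amplitude and frequency, which is why it is commonly used as an energy-tracking feature for audio signals.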


Pages: 4361-4365

Number of pages: 5
