PREDICTING AUDIO-VISUAL SALIENT EVENTS BASED ON VISUAL, AUDIO AND TEXT MODALITIES FOR MOVIE SUMMARIZATION

Cited by: 0
Authors
Koutras, P. [1]
Zlatintsi, A. [1]
Iosif, E. [1]
Katsamanis, A. [1]
Maragos, P. [1]
Potamianos, A. [1]
Affiliations
[1] Natl Tech Univ Athens, Sch ECE, GR-15773 Athens, Greece
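Keywords
Visual saliency; auditory saliency; affective text analysis; audio-visual salient events; movie summarization; AUDITORY ATTENTION; MODEL; FRAMEWORK
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract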
In this paper, we present a new and improved synergistic approach to audio-visual salient event detection and movie summarization based on visual, audio and text modalities. Spatio-temporal visual saliency is estimated through a perceptually inspired frontend based on 3D (space, time) Gabor filters, and frame-wise features are extracted from the saliency volumes. For auditory salient event detection, we extract features based on the Teager-Kaiser Energy Operator, while text analysis incorporates part-of-speech tagging and affective modeling of single words in the movie subtitles. To evaluate the proposed system, we employ an elementary, non-parametric classification technique, KNN. Detection results are reported on the MovSum database, using objective evaluations against ground truth denoting the perceptually salient events, and human evaluations of the movie summaries. Our evaluation verifies the appropriateness of the proposed methods compared to our baseline system. Finally, our newly proposed summarization algorithm produces summaries that consist of salient and meaningful events, also improving the comprehension of the semantics.
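As an illustration of the audio frontend mentioned in the abstract, the standard discrete-time Teager-Kaiser Energy Operator is Ψ[x](n) = x(n)² − x(n−1)·x(n+1). The sketch below applies it to a signal and averages the result over frames. The abstract does not specify the paper's actual feature set, so the function names, the zero-padding at the edges, and the frame-averaging step are all assumptions made here for illustration only.

```python
import math


def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so they are
    left at 0.0 (an assumption of this sketch, not taken from the paper).
    """
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def framewise_mean_energy(x, frame_len, hop):
    """Mean Teager-Kaiser energy per frame (hypothetical frame-wise feature)."""
    psi = teager_kaiser_energy(x)
    feats = []
    for start in range(0, len(psi) - frame_len + 1, hop):
        frame = psi[start:start + frame_len]
        feats.append(sum(frame) / frame_len)
    return feats


# Example: a sinusoid A*cos(W*n + phi) has constant energy A^2 * sin(W)^2.
signal = [2.0 * math.cos(0.3 * n + 0.1) for n in range(100)]
energy = teager_kaiser_energy(signal)
features = framewise_mean_energy(signal, frame_len=20, hop=10)
```

For a pure sinusoid the operator returns a constant proportional to the squared amplitude and, for small frequencies, to the squared frequency, which is why it is often used to track amplitude and frequency modulations in audio.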
Pages: 4361-4365
Number of pages: 5