VIDEO EVENT DETECTION AND SUMMARIZATION USING AUDIO, VISUAL AND TEXT SALIENCY

Cited by: 31
Authors
Evangelopoulos, G. [1]
Zlatintsi, A. [1]
Skoumas, G. [2]
Rapantzikos, K. [1]
Potamianos, A. [2]
Maragos, P. [1]
Avrithis, Y. [1]
Affiliations
[1] Natl Tech Univ Athens, Sch ECE, GR-15773 Athens, Greece
[2] Tech Univ Crete, Dept ECE, Khania EL-73100, Greece
Keywords
multimodal saliency; audio; video; text processing; video abstraction; movie summarization; attention model
DOI
10.1109/ICASSP.2009.4960393
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging of the subtitle information available with most movie distributions. The individual modality curves are integrated into a single attention curve, in which the presence of an event may be signified in one or more domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.
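The abstract does not spell out the operators or the fusion rule. As a rough illustration only, the sketch below assumes the discrete Teager-Kaiser energy operator as one example of a nonlinear energy-tracking operator, min-max normalization of each per-frame modality curve, an equally weighted linear fusion, and a skim built from the most salient frames. The function names, the weights and the `rate` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Assumed here as an example nonlinear energy operator; endpoints are left at 0.
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def normalize(curve):
    """Min-max normalize a saliency curve to [0, 1] (assumed normalization)."""
    c = np.asarray(curve, dtype=float)
    span = c.max() - c.min()
    return np.zeros_like(c) if span == 0 else (c - c.min()) / span

def fuse(audio, visual, text, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted linear fusion of per-frame modality curves (hypothetical weights)."""
    curves = [normalize(c) for c in (audio, visual, text)]
    return sum(w * c for w, c in zip(weights, curves))

def select_frames(saliency, rate=0.2):
    """Keep the most salient fraction `rate` of frames, returned in temporal order."""
    saliency = np.asarray(saliency, dtype=float)
    k = max(1, int(round(rate * len(saliency))))
    top = np.argsort(saliency)[::-1][:k]  # indices of the k largest values
    return np.sort(top)

if __name__ == "__main__":
    fused = fuse([0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1])
    print(fused, select_frames(fused, rate=0.5))
```

For a pure sinusoid A cos(Ωn + φ) the operator returns the constant A² sin²(Ω), which is why it is commonly used to track combined amplitude and frequency modulation. A practical skim would also smooth the fused curve and merge selected frames into contiguous segments, which this sketch omits.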
Pages: 3553+
Page count: 2
Related papers (50 in total)
  • [41] Long Range Audio and Audio-Visual Event Detection Using a Laser Doppler Vibrometer
    Wang, Tao
    Zhu, Zhigang
    Divakaran, Ajay
    EVOLUTIONARY AND BIO-INSPIRED COMPUTATION: THEORY AND APPLICATIONS IV, 2010, 7704
  • [42] Combining text and audio-visual features in video indexing
    Chang, SF
    Manmatha, R
    Chua, TS
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005: 1005 - 1008
  • [43] An enhanced video summarization system using audio features for a personal video recorder
    Otsuka, I
    Radhakrishnan, R
    Siracusa, M
    Divakaran, A
    Mishima, H
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2006, 52 (01) : 168 - 172
  • [44] Visual Attention Modeling in Compressed Domain: From Image Saliency Detection to Video Saliency Detection
    FANG Yuming
    ZHANG Xiaoqiang
    ZTE Communications, 2019, 17 (01) : 31 - 37
  • [45] Using Webcast Text for Semantic Event Detection in Broadcast Sports Video
    Xu, Changsheng
    Zhang, Yi-Fan
    Zhu, Guangyu
    Rui, Yong
    Lu, Hanqing
    Huang, Qingming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2008, 10 (07) : 1342 - 1355
  • [46] Towards Comprehensive Understanding of Event Detection and Video Summarization Approaches
    Kalaivani, P.
    Roomi, Mohamed Mansoor S.
    2017 SECOND INTERNATIONAL CONFERENCE ON RECENT TRENDS AND CHALLENGES IN COMPUTATIONAL MODELS (ICRTCCM), 2017: 61 - 66
  • [47] Audio-visual event recognition in surveillance video sequences
    Cristani, Marco
    Bicego, Manuele
    Murino, Vittorio
    IEEE TRANSACTIONS ON MULTIMEDIA, 2007, 9 (02) : 257 - 267
  • [48] Audio-based event detection for sports video
    Baillie, M
    Jose, JM
    IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2003, 2728 : 300 - 309
  • [49] Creating audio keywords for event detection in soccer video
    Xu, M
    Maddage, NC
    Xu, CS
    Kankanhalli, M
    Tian, Q
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL II, PROCEEDINGS, 2003: 281 - 284
  • [50] Text to Region: Visual-Word Guided Saliency Detection
    Xing, Tengfei
    Wang, Zhaohui
    Yang, Jianyu
    Ji, Yi
    Liu, Chunping
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 740 - 749