VIDEO EVENT DETECTION AND SUMMARIZATION USING AUDIO, VISUAL AND TEXT SALIENCY

Cited by: 31

Authors:

Evangelopoulos, G. [1]

Zlatintsi, A. [1]

Skoumas, G. [2]

Rapantzikos, K. [1]

Potamianos, A. [2]

Maragos, P. [1]

Avrithis, Y. [1]

Affiliations:

[1] Natl Tech Univ Athens, Sch ECE, GR-15773 Athens, Greece

[2] Tech Univ Crete, Dept ECE, Khania EL-73100, Greece

Source:

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009

Keywords:

multimodal saliency; audio; video; text processing; video abstraction; movie summarization; ATTENTION MODEL;

DOI:

10.1109/ICASSP.2009.4960393

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The various modality curves are integrated into a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.
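
As a rough illustration of the fusion and skimming steps the abstract describes, the sketch below combines three per-frame saliency curves into one attention curve and keeps the most salient frames. The abstract does not state how the curves are integrated or how segments are chosen, so the min-max normalization, the equal-weight linear fusion, the `keep_ratio` parameter, and every function name here are assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch: none of these names, weights, or parameters come from the paper.

def normalize(curve):
    """Rescale a per-frame saliency curve to [0, 1] with min-max scaling."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(audio, visual, text, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted linear fusion of three equal-length curves (an assumed scheme)."""
    if not (len(audio) == len(visual) == len(text)):
        raise ValueError("curves must have the same number of frames")
    wa, wv, wt = weights
    a, v, t = normalize(audio), normalize(visual), normalize(text)
    return [wa * x + wv * y + wt * z for x, y, z in zip(a, v, t)]


def select_frames(saliency, keep_ratio=0.2):
    """Return sorted indices of the most salient frames (about keep_ratio of all)."""
    n_keep = max(1, round(keep_ratio * len(saliency)))
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])


def to_segments(frames):
    """Group sorted frame indices into contiguous (start, end) pairs, end inclusive."""
    segments = []
    for f in frames:
        if segments and f == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], f)
        else:
            segments.append((f, f))
    return segments


if __name__ == "__main__":
    # Toy curves: an event visible in all three modalities around frames 2-3,
    # and a purely visual event at frame 8.
    audio = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
    visual = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
    text = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    fused = fuse(audio, visual, text)
    print(to_segments(select_frames(fused, keep_ratio=0.3)))
```

In this toy input a frame that is salient in only one modality can still enter the skim, which mirrors the abstract's remark that an event may be signified in one or multiple domains. A practical system would likely smooth the fused curve and enforce minimum segment lengths, neither of which is attempted here.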

Pages: 3553 / +

Number of pages: 2
