Audio-visual event recognition in surveillance video sequences

Cited by: 104

Authors:

Cristani, Marco [1]

Bicego, Manuele

Murino, Vittorio

Affiliations:

[1] Univ Verona, Dipartimento Informat, I-37134 Verona, Italy

[2] Univ Sassari, DEIR, I-07100 Sassari, Italy

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2007 / Vol. 9 / Issue 2

Keywords:

audio-visual analysis; automated surveillance; event classification and clustering; multimodal background modelling and foreground detection; multimodality; scene analysis;

DOI:

10.1109/TMM.2006.886263

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In the context of the automated surveillance field, automatic scene analysis and understanding systems typically consider only visual information, whereas other modalities, such as audio, are typically disregarded. This paper presents a new method able to integrate audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. These processes permit one to detect separate audio and visual patterns representing unusual unimodal events in a scene. The integration of audio and visual data is subsequently performed by exploiting the concept of synchrony between such events. The audio-visual (AV) association is carried out on-line and without need for training sequences, and is actually based on the computation of a characteristic feature called audio-video concurrence matrix, allowing one to detect and segment AV events, as well as to discriminate between them. Experimental tests involving classification and clustering of events show all the potentialities of the proposed approach, also in comparison with the results obtained by employing the single modalities and without considering the synchrony issue.
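The abstract names an "audio-video concurrence matrix" built from synchronous unimodal events but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: each frame carries a binary foreground flag per visual region and per audio band, and the matrix counts the frames in which a given region and band are active together. The function name `concurrence_matrix`, the input shapes, and the counting rule are all assumptions made for this example, not the authors' formulation.

```python
import numpy as np

def concurrence_matrix(visual_fg, audio_fg):
    """Count co-active frames between visual regions and audio bands.

    Hypothetical illustration, not the paper's definition.
    visual_fg: array-like of shape (T, R), 1 where region r is foreground at frame t.
    audio_fg:  array-like of shape (T, B), 1 where band b is foreground at frame t.
    Returns an (R, B) integer matrix of synchronous activations.
    """
    v = np.asarray(visual_fg, dtype=np.int64)
    a = np.asarray(audio_fg, dtype=np.int64)
    if v.ndim != 2 or a.ndim != 2 or v.shape[0] != a.shape[0]:
        raise ValueError("inputs must be 2-D and share the same number of frames")
    # Entry (r, b) sums v[t, r] * a[t, b] over all frames t.
    return v.T @ a

# Toy data: 4 frames, 2 visual regions, 3 audio bands.
visual = [[1, 0], [1, 0], [0, 1], [0, 0]]
audio = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
M = concurrence_matrix(visual, audio)
print(M)
```

In this toy input, region 0 and band 0 are active together in two frames, and region 1 and band 2 in one, so those are the only nonzero entries. Audio activity with no simultaneous visual activity (band 1 in the last frame) contributes nothing, which is the sense in which synchrony links the two modalities here.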


Pages: 257-267

Page count: 11
