Audio-visual event recognition in surveillance video sequences

被引:104
|
作者
Cristani, Marco [1 ]
Bicego, Manuele
Murino, Vittorio
机构
[1] Univ Verona, Dipartimento Informat, I-37134 Verona, Italy
[2] Univ Sassari, DEIR, I-07100 Sassari, Italy
关键词
audio-visual analysis; automated surveillance; event classification and clustering; multimodal background modelling and foreground detection; multimodality; scene analysis;
D O I
10.1109/TMM.2006.886263
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In the context of the automated surveillance field, automatic scene analysis and understanding systems typically consider only visual information, whereas other modalities, such as audio, are typically disregarded. This paper presents a new method able to integrate audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. These processes permit one to detect separate audio and visual patterns representing unusual unimodal events in a scene. The integration of audio and visual data is subsequently performed by exploiting the concept of synchrony between such events. The audio-visual (AV) association is carried out on-line and without need for training sequences, and is actually based on the computation of a characteristic feature called audio-video concurrence matrix, allowing one to detect and segment AV events, as well as to discriminate between them. Experimental tests involving classification and clustering of events show all the potentialities of the proposed approach, also in comparison with the results obtained by employing the single modalities and without considering the synchrony issue.
引用
收藏
页码:257 / 267
页数:11
相关论文
共 50 条
  • [1] Indexing audio-visual sequences by joint audio and video processing
    Saraceno, C
    Leonardi, R
    VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2, 1998, : 686 - 691
  • [2] Audio-Visual Emotion Recognition in Video Clips
    Noroozi, Fatemeh
    Marjanovic, Marina
    Njegus, Angelina
    Escalera, Sergio
    Anbarjafari, Gholamreza
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (01) : 60 - 75
  • [3] AUDIO-VISUAL EVENT RECOGNITION THROUGH THE LENS OF ADVERSARY
    Li, Juncheng B.
    Ma, Kaixin
    Qu, Shuhui
    Huang, Po-Yao
    Metze, Florian
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 616 - 620
  • [4] Audio-visual speaker recognition for video broadcast news
    Maison, B
    Neti, C
    Senior, A
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2001, 29 (1-2): : 71 - 79
  • [5] Audio-Visual Glance Network for Efficient Video Recognition
    Nugroho, Muhammad Adi
    Woo, Sangmin
    Lee, Sumin
    Kim, Changick
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10116 - 10125
  • [6] Audio-Visual Speaker Recognition for Video Broadcast News
    Benoît Maison
    Chalapathy Neti
    Andrew Senior
    Journal of VLSI signal processing systems for signal, image and video technology, 2001, 29 : 71 - 79
  • [7] Identification of story units in audio-visual sequences by joint audio and video processing
    Saraceno, C
    Leonardi, R
    1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 1, 1998, : 363 - 367
  • [8] Discovering joint audio-visual codewords for video event detection
    Jhuo, I-Hong
    Ye, Guangnan
    Gao, Shenghua
    Liu, Dong
    Jiang, Yu-Gang
    Lee, D. T.
    Chang, Shih-Fu
    MACHINE VISION AND APPLICATIONS, 2014, 25 (01) : 33 - 47
  • [9] AUDIO-VISUAL FUSION AND CONDITIONING WITH NEURAL NETWORKS FOR EVENT RECOGNITION
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019,
  • [10] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    INFORMATION FUSION, 2022, 85 : 52 - 59