Scene recognition with audio-visual sensor fusion

Authors
Devicharan, D [1]
Mehrotra, KG [1]
Mohan, CK [1]
Varshney, PK [1]
Zuo, L [1]
Affiliation
[1] Syracuse Univ, Dept EECS, Syracuse, NY 13244 USA
Keywords
multimodal sensor fusion; scene recognition; activity detection; audio and visual surveillance
DOI
10.1117/12.605751
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Several surveillance applications are characterized by the ability to gather information about the scene from more than one sensor modality, and heterogeneous sensor data must then be fused by the decision-maker. In this paper, we discuss the issues relevant to developing a model for fusion of information from audio and visual sensors, and present a framework to enhance decision-making capabilities. In particular, our methodology focuses on the issues of temporal reasoning, uncertainty representations, and coupling between features inferred from data streams coming from different sensors. We propose a conditional probability-based representation for uncertainty, along with fuzzy rules to assist decision-making, and a matrix representation of the coupling between sensor data streams. We also develop a fusion algorithm that utilizes these representations.
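The abstract names three ingredients (conditional probabilities for uncertainty, fuzzy rules for decisions, and a coupling matrix between sensor streams) but does not give the fusion algorithm itself. The following is only an illustrative sketch of how such pieces might fit together, not the authors' method. Every name, event label, and number in it is a hypothetical assumption, and temporal reasoning is left out entirely.

```python
# Illustrative sketch only: all names and values are assumptions, not taken from the paper.

def fuse_scene(p_audio, p_visual, coupling, scene_given_pair):
    """Combine per-sensor beliefs into a belief over scene classes.

    p_audio[a]              belief that audio event a occurred
    p_visual[v]             belief that visual event v occurred
    coupling[a][v]          non-negative weight: how plausibly a and v co-occur
    scene_given_pair[s][a][v]  assumed conditional probability P(scene s | a, v)
    """
    # Joint belief over (audio event, visual event) pairs, reweighted by coupling.
    joint = [[coupling[a][v] * pa * pv for v, pv in enumerate(p_visual)]
             for a, pa in enumerate(p_audio)]
    total = sum(sum(row) for row in joint)
    if total <= 0:
        raise ValueError("coupling rules out every audio/visual pair")
    # Marginalise the conditional table over the normalised joint belief.
    return [
        sum(table[a][v] * joint[a][v] / total
            for a in range(len(p_audio)) for v in range(len(p_visual)))
        for table in scene_given_pair
    ]


def high_confidence(p, low=0.5, high=0.8):
    """Ramp-shaped fuzzy membership for 'the probability p is high'."""
    return min(1.0, max(0.0, (p - low) / (high - low)))


# Hypothetical example: audio events (quiet, loud), visual events (still, motion),
# scene classes (normal, incident).
p_audio = [0.2, 0.8]
p_visual = [0.3, 0.7]
coupling = [[1.0, 0.5],
            [0.5, 1.0]]
p_normal = [[0.95, 0.70],
            [0.60, 0.10]]
p_incident = [[1 - x for x in row] for row in p_normal]

p_scene = fuse_scene(p_audio, p_visual, coupling, [p_normal, p_incident])
# Fuzzy rule (assumed): raise an alert to the degree that "incident" is highly probable.
alert_degree = high_confidence(p_scene[1])
print(p_scene, alert_degree)
```

In this sketch the coupling matrix only reweights which audio and visual events are believed to co-occur. A fuller model would also need to carry beliefs across time steps, which the abstract lists as a separate concern.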
Pages: 201-210
Number of pages: 10