Scene recognition with audio-visual sensor fusion

被引:0
|
作者
Devicharan, D [1 ]
Mehrotra, KG [1 ]
Mohan, CK [1 ]
Varshney, PK [1 ]
Zuo, L [1 ]
机构
[1] Syracuse Univ, Dept EECS, Syracuse, NY 13244 USA
关键词
multimodal sensor fusion; scene recognition; activity detection; audio and visual surveillance;
D O I
10.1117/12.605751
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Several surveillance applications are characterized by the ability to gather information about the scene from more than one sensor modality, and heterogeneous sensor data must then be fused by the decision-maker. hi this paper, we discuss the issues relevant to developing a model for fusion of information from audio and visual sensors, and present a framework to enhance decision-making capabilities. In particular, our methodology focuses on the issues of temporal reasoning, uncertainty representations, and coupling between features inferred from data streams coming from different sensors. We propose a conditional probability-based representation for uncertainty, along with fuzzy rules to assist decision-making, and a matrix representation of the coupling between sensor data streams. We also develop a fusion algorithm that utilizes these representations.
引用
收藏
页码:201 / 210
页数:10
相关论文
共 50 条
  • [41] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
    Tamura, Satoshi
    Ishikawa, Masato
    Hashiba, Takashi
    Takeuchi, Shin'ichi
    Hayamizu, Satoru
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
  • [42] Audio-visual spontaneous emotion recognition
    Zeng, Zhihong
    Hu, Yuxiao
    Roisman, Glenn I.
    Wen, Zhen
    Fu, Yun
    Huang, Thomas S.
    [J]. ARTIFICIAL INTELLIGENCE FOR HUMAN COMPUTING, 2007, 4451 : 72 - +
  • [43] Deep Audio-Visual Speech Recognition
    Afouras, Triantafyllos
    Chung, Joon Son
    Senior, Andrew
    Vinyals, Oriol
    Zisserman, Andrew
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
  • [44] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
    Estellers, Virginia
    Thiran, Jean-Philippe
    [J]. 19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069
  • [45] Audio-visual integration for speech recognition
    Kober, R
    Harz, U
    [J]. NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
  • [46] Audio-visual affective expression recognition
    Huang, Thomas S.
    Zeng, Zhihong
    [J]. MIPPR 2007: PATTERN RECOGNITION AND COMPUTER VISION, 2007, 6788
  • [47] Audio-Visual Recognition of Pain Intensity
    Thiam, Patrick
    Kessler, Viktor
    Walter, Steffen
    Palm, Guenther
    Schwenker, Friedhelm
    [J]. MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, MPRSS 2016, 2017, 10183 : 110 - 126
  • [48] Audio-visual speech recognition by speechreading
    Zhang, XZ
    Mersereau, RM
    Clements, MA
    [J]. DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002, : 1069 - 1072
  • [49] Audio-Visual Speech Recognition in Noisy Audio Environments
    Palecek, Karel
    Chaloupka, Josef
    [J]. 2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
  • [50] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    [J]. APPLIED ACOUSTICS, 2023, 211