Scene recognition with audio-visual sensor fusion

Times cited: 0
Authors
Devicharan, D [1]
Mehrotra, KG [1]
Mohan, CK [1]
Varshney, PK [1]
Zuo, L [1]
Affiliation
[1] Syracuse Univ, Dept EECS, Syracuse, NY 13244 USA
Keywords
multimodal sensor fusion; scene recognition; activity detection; audio and visual surveillance
DOI
10.1117/12.605751
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Several surveillance applications are characterized by the ability to gather information about the scene from more than one sensor modality, and the resulting heterogeneous sensor data must then be fused by the decision-maker. In this paper, we discuss the issues relevant to developing a model for fusing information from audio and visual sensors, and present a framework to enhance decision-making capabilities. In particular, our methodology focuses on the issues of temporal reasoning, uncertainty representation, and coupling between features inferred from data streams coming from different sensors. We propose a conditional-probability-based representation for uncertainty, along with fuzzy rules to assist decision-making, and a matrix representation of the coupling between sensor data streams. We also develop a fusion algorithm that utilizes these representations.
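To make the three representations named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' fusion algorithm, whose details are not given here; it only illustrates one plausible way the ingredients could fit together. Assumptions (all hypothetical): each sensor reports a per-feature conditional probability P(event | feature), a coupling matrix `coupling[i][j]` in [0, 1] states how strongly audio feature i relates to visual feature j, and a fuzzy rule uses Zadeh's operators (min for AND, max for OR). The function names `coupled_support` and `decide`, the 0.5 threshold, and the example feature labels are invented for illustration.

```python
"""Toy audio-visual fusion sketch (hypothetical; not the paper's algorithm).

Each sensor reports P(event | feature) per feature; a coupling matrix says how
strongly each audio feature relates to each visual feature; a fuzzy rule
(min = AND, max = OR) turns the coupled evidence into a decision.
"""
from collections.abc import Sequence


def coupled_support(audio: Sequence[float], visual: Sequence[float],
                    coupling: Sequence[Sequence[float]]) -> list[float]:
    """Return one support value in [0, 1] per visual feature.

    audio[i]:       hypothetical P(event | audio feature i)
    visual[j]:      hypothetical P(event | visual feature j)
    coupling[i][j]: hypothetical weight in [0, 1]; 0 means audio feature i
                    says nothing about visual feature j, 1 means fully coupled
    """
    if not audio or len(coupling) != len(audio):
        raise ValueError("need one coupling row per audio feature")
    support = []
    for j, p_visual in enumerate(visual):
        # Weighted fuzzy OR: the strongest audio cue coupled to visual feature j.
        p_audio = max(coupling[i][j] * audio[i] for i in range(len(audio)))
        # Fuzzy AND: support is only as strong as the weaker modality.
        support.append(min(p_visual, p_audio))
    return support


def decide(support: Sequence[float], threshold: float = 0.5) -> bool:
    """Fuzzy rule: IF any coupled audio-visual pair is well supported THEN alert."""
    return max(support, default=0.0) >= threshold


if __name__ == "__main__":
    audio = [0.9, 0.2]    # e.g. "glass breaking", "footsteps" (illustrative only)
    visual = [0.7, 0.4]   # e.g. "person running", "door open" (illustrative only)
    coupling = [[1.0, 0.3],
                [0.2, 0.8]]
    s = coupled_support(audio, visual, coupling)
    print(s, decide(s))
```

With these illustrative numbers the first audio-visual pair scores min(0.7, 1.0 × 0.9) = 0.7, so `decide` returns `True`; setting every coupling weight to 0 removes all cross-modal support and the rule no longer fires. The min/max operators are only one choice; a product t-norm, or a proper Bayesian combination of the conditional probabilities, would be equally reasonable assumptions, and the temporal reasoning the abstract mentions is not modeled at all.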
Pages: 201-210
Number of pages: 10