Scene recognition with audio-visual sensor fusion

被引：0

作者：

Devicharan, D ^{[1
]}

Mehrotra, KG ^{[1
]}

Mohan, CK ^{[1
]}

Varshney, PK ^{[1
]}

Zuo, L ^{[1
]}

机构：

[1] Syracuse Univ, Dept EECS, Syracuse, NY 13244 USA

来源：

Multisensor, Multisource Information Fusion: Architectures, Algorithms and Applications 2005 | 2005年 / 5813卷

关键词：

multimodal sensor fusion; scene recognition; activity detection; audio and visual surveillance;

D O I：

10.1117/12.605751

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Several surveillance applications are characterized by the ability to gather information about the scene from more than one sensor modality, and heterogeneous sensor data must then be fused by the decision-maker. hi this paper, we discuss the issues relevant to developing a model for fusion of information from audio and visual sensors, and present a framework to enhance decision-making capabilities. In particular, our methodology focuses on the issues of temporal reasoning, uncertainty representations, and coupling between features inferred from data streams coming from different sensors. We propose a conditional probability-based representation for uncertainty, along with fuzzy rules to assist decision-making, and a matrix representation of the coupling between sensor data streams. We also develop a fusion algorithm that utilizes these representations.

引用

页码：201 / 210

页数：10

共 50 条

[41] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
Tamura, Satoshi
Ishikawa, Masato
Hashiba, Takashi
Takeuchi, Shin'ichi
Hayamizu, Satoru
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
[42] Audio-visual spontaneous emotion recognition
Zeng, Zhihong
Hu, Yuxiao
Roisman, Glenn I.
Wen, Zhen
Fu, Yun
Huang, Thomas S.
[J]. ARTIFICIAL INTELLIGENCE FOR HUMAN COMPUTING, 2007, 4451 : 72 - +
[43] Deep Audio-Visual Speech Recognition
Afouras, Triantafyllos
Chung, Joon Son
Senior, Andrew
Vinyals, Oriol
Zisserman, Andrew
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
[44] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
Estellers, Virginia
Thiran, Jean-Philippe
[J]. 19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069
[45] Audio-visual integration for speech recognition
Kober, R
Harz, U
[J]. NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
[46] Audio-visual affective expression recognition
Huang, Thomas S.
Zeng, Zhihong
[J]. MIPPR 2007: PATTERN RECOGNITION AND COMPUTER VISION, 2007, 6788
[47] Audio-Visual Recognition of Pain Intensity
Thiam, Patrick
Kessler, Viktor
Walter, Steffen
Palm, Guenther
Schwenker, Friedhelm
[J]. MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, MPRSS 2016, 2017, 10183 : 110 - 126
[48] Audio-visual speech recognition by speechreading
Zhang, XZ
Mersereau, RM
Clements, MA
[J]. DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002, : 1069 - 1072
[49] Audio-Visual Speech Recognition in Noisy Audio Environments
Palecek, Karel
Chaloupka, Josef
[J]. 2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
[50] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
Hwang, Jung-Wook
Park, Jeongkyun
Park, Rae-Hong
Park, Hyung-Min
[J]. APPLIED ACOUSTICS, 2023, 211

← 1 2 3 4 5 →