Object category detection using audio-visual cues

Cited by: 0

Authors:

Luo, Jie [1,2]

Caputo, Barbara [1,2]

Zweig, Alon [3]

Bach, Joerg-Hendrik [4]

Anemueller, Joern [4]

Affiliations:

[1] IDIAP Res Inst, Ctr Parc, CH-1920 Martigny, Switzerland

[2] Swiss Fed Inst Technol, Lausanne, Switzerland

[3] Hebrew Univ Jerusalem, Jerusalem, Israel

[4] Carl von Ossietzky Univ Oldenburg, Oldenburg, Germany

Source:

COMPUTER VISION SYSTEMS, PROCEEDINGS | 2008 / Vol. 5008

Keywords:

object categorization; multimodal recognition; audio-visual fusion

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper, we propose a multimodal approach to object category detection, using audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show strengths and weaknesses of the two approaches, and clearly underline the open challenges for multimodal category detection.

Pages: 539-548

Page count: 10
