Object category detection using audio-visual cues

Times cited: 0
Authors
Luo, Jie [1 ,2 ]
Caputo, Barbara [1 ,2 ]
Zweig, Alon [3 ]
Bach, Joerg-Hendrik [4 ]
Anemueller, Joern [4 ]
Affiliations
[1] IDIAP Res Inst, Ctr Parc, CH-1920 Martigny, Switzerland
[2] Swiss Fed Inst Technol, Lausanne, Switzerland
[3] Hebrew Univ Jerusalem, Jerusalem, Israel
[4] Carl von Ossietzky Univ Oldenburg, Oldenburg, Germany
Keywords
object categorization; multimodal recognition; audio-visual fusion
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection that uses both audio and visual information. The auditory channel is modeled with biologically motivated spectral features and a discriminative classifier. The visual channel is modeled with a state-of-the-art part-based model. Multimodality is achieved through two fusion schemes, one high-level and one low-level. Experiments on six object categories, under increasingly difficult conditions, reveal the strengths and weaknesses of the two approaches and clearly underline the open challenges for multimodal category detection.
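The abstract names a high-level and a low-level fusion scheme without detailing either. As a rough illustration only, the sketch below shows the two generic families these terms usually denote. Feature-level (early, low-level) fusion concatenates audio and visual descriptors before a single classifier sees them. Score-level (late, high-level) fusion combines the outputs of separately trained per-modality classifiers. The function names and the weight `w_audio` are hypothetical, and this is not the authors' method.

```python
from typing import List, Sequence


def low_level_fusion(audio_feat: Sequence[float], visual_feat: Sequence[float]) -> List[float]:
    # Hypothetical feature-level (early) fusion: concatenate the per-modality
    # feature vectors into one joint vector for a single classifier.
    return [float(x) for x in audio_feat] + [float(x) for x in visual_feat]


def high_level_fusion(audio_score: float, visual_score: float, w_audio: float = 0.5) -> float:
    # Hypothetical decision-level (late) fusion: a convex combination of the
    # scores produced by separately trained audio and visual classifiers.
    # w_audio is an assumed tunable weight, not a value from the paper.
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    return w_audio * audio_score + (1.0 - w_audio) * visual_score


if __name__ == "__main__":
    joint = low_level_fusion([0.1, 0.2], [0.3])
    fused = high_level_fusion(0.8, 0.4, w_audio=0.25)
    print(joint, fused)
```

Early fusion lets one classifier exploit cross-modal correlations, but it requires both modalities to be present and aligned. Late fusion keeps the two channels independent, so a missing or unreliable modality can be down-weighted at decision time.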
Pages: 539-548
Number of pages: 10
Related papers
50 in total
  • [31] Joint modelling of audio-visual cues using attention mechanisms for emotion recognition
    Ghaleb, Esam
    Niehues, Jan
    Asteriadis, Stylianos
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (08) : 11239 - 11264
  • [33] Joint Audio-Visual Deepfake Detection
    Zhou, Yipin
    Lim, Ser-Nam
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 14780 - 14789
  • [34] Audio-Visual Detection Benefits in the Rat
    Gleiss, Stephanie
    Kayser, Christoph
    PLOS ONE, 2012, 7 (09):
  • [35] Incongruence Detection in Audio-Visual Processing
    Havlena, Michal
    Heller, Jan
    Kayser, Hendrik
    Bach, Joerg-Hendrik
    Anemueller, Joern
    Pajdla, Tomas
    DETECTION AND IDENTIFICATION OF RARE AUDIOVISUAL CUES, 2012, 384 : 67 - +
  • [36] Audio-visual talking face detection
    Li, MK
    Li, DG
    Dimitrova, N
    Sethi, I
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL II, PROCEEDINGS, 2003, : 473 - 476
  • [37] Tampering Detection of Audio-Visual Content using Encrypted Watermarks
    Rigoni, Ronaldo
    Freitas, Pedro Garcia
    Farias, Mylene C. Q.
    2014 27TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 2014, : 196 - 203
  • [38] Audio-Visual Voice Activity Detection Using Diffusion Maps
    Dov, David
    Talmon, Ronen
    Cohen, Israel
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (04) : 732 - 745
  • [40] Speaker position detection system using audio-visual information
    Matsuo, N
    Kitagawa, H
    Nagata, S
    FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 1999, 35 (02): : 212 - 220