Integrated audio-visual processing for object localization and tracking

Cited by: 1
Author
Pingali, GS [1]
Affiliation
[1] AT&T Bell Labs, Lucent Technol, Murray Hill, NJ 07974 USA
Source
Keywords
multimodal; people tracking; acoustic talker direction finding; video; audio; multimedia; real time
DOI
10.1117/12.298421
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
This paper presents a system that combines audio and visual cues to locate and track an object, typically a person, in real time. It shows that combining a speech source localization algorithm with a video-based head tracking algorithm yields a more accurate and robust tracker than one using either the audio or the visual modality alone. Performance evaluation results are presented for a system that runs in real time on a general-purpose processor. The multimodal tracker has several applications, such as teleconferencing, multimedia kiosks, and interactive games.
Pages: 206-213
Number of pages: 8
Related papers
50 in total; the first 10 are listed
  • [1] Egocentric Audio-Visual Object Localization
    Huang, Chao
    Tian, Yapeng
    Kumar, Anurag
    Xu, Chenliang
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22910 - 22921
  • [2] Tracking atoms with particles for audio-visual source localization
    Monaci, Gianluca
    Vandergheynst, Pierre
    Maggio, Emilio
    Cavallaro, Andrea
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007, : 753 - +
  • [3] The neural basis of visual dominance in the context of audio-visual object processing
    Schmid, Carmen
    Buechel, Christian
    Rose, Michael
    [J]. NEUROIMAGE, 2011, 55 (01) : 304 - 311
  • [4] Fusion of audio-visual information for integrated speech processing
    Nakamura, S
    [J]. AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2001, 2091 : 127 - 143
  • [5] Binaural Audio-Visual Localization
    Wu, Xinyi
    Wu, Zhenyao
    Ju, Lili
    Wang, Song
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2961 - 2968
  • [6] A JOINT AUDIO-VISUAL APPROACH TO AUDIO LOCALIZATION
    Jensen, Jesper Rindom
    Christensen, Mads Graesboll
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 454 - 458
  • [7] AV16.3: An audio-visual corpus for speaker localization and tracking
    Lathoud, G
    Odobez, JM
    Gatica-Perez, D
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3361 : 182 - 195
  • [8] The impact of auditory, visual, and audio-visual sensory cues on multiple object tracking in children
    Atkins, Polly L.
    Hodgson, Timothy
    Dickinson, Patrick
    Hicks, Kieran
    Focker, Julia
    [J]. PERCEPTION, 2023, 52 (05) : 346 - 346
  • [9] Audio-Visual Tracking of Concurrent Speakers
    Qian, Xinyuan
    Brutti, Alessio
    Lanz, Oswald
    Omologo, Maurizio
    Cavallaro, Andrea
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 942 - 954
  • [10] Audio-visual tracking for natural interactivity
    Pingali, G
    Tunali, G
    Carlbom, I
    [J]. ACM MULTIMEDIA 99, PROCEEDINGS, 1999, : 373 - 382