Integrated audio-visual processing for object localization and tracking

被引:1
|
作者
Pingali, GS [1 ]
机构
[1] AT&T Bell Labs, Lucent Technol, Murray Hill, NJ 07974 USA
来源
关键词
multimodal; people tracking; acoustic talker direction finding; video; audio; multimedia; real time;
D O I
10.1117/12.298421
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
This paper presents a system that combines audio and visual cues for locating and tracking an object, typically a person, in real time. It is shown that combining a speech source localization algorithm with a video-based head tracking algorithm results in a more accurate and robust tracker than that obtained using any one of the audio or visual modalities. Performance evaluation results are presented with a system that runs in real time on a general purpose processor. The multimodal tracker has several applications such as teleconferencing, multimedia kiosks and interactive games.
引用
收藏
页码:206 / 213
页数:8
相关论文
共 50 条
  • [41] Joint audio-visual tracking using particle filters
    Zotkin, DN
    Duraiswami, R
    Davis, LS
    [J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1154 - 1164
  • [42] An audio-visual database for evaluating person tracking algorithms
    Krinidis, M
    Stamou, G
    Teutsch, H
    Spors, S
    Nikolaidis, N
    Rabenstein, R
    Pitas, L
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 237 - 240
  • [43] Audio-visual Human Tracking for Active Robot Perception
    Bayram, Baris
    Ince, Gokhan
    [J]. 2015 23RD SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2015, : 1264 - 1267
  • [44] Audio-Visual Speech-Turn Detection and Tracking
    Gebru, Israel D.
    Ba, Sileye
    Evangelidis, Georgios
    Horaud, Radu
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015, 2015, 9237 : 143 - 151
  • [45] AUDIO-VISUAL SPEAKER LOCALIZATION VIA WEIGHTED CLUSTERING
    Gebru, Israel D.
    Alameda-Pineda, Xavier
    Horaud, Radu
    Forbes, Florence
    [J]. 2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [46] Audio-visual speaker tracking with importance particle filters
    Gatica-Perez, D
    Lathoud, G
    McCowan, I
    Odobez, JM
    Moore, D
    [J]. 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 3, PROCEEDINGS, 2003, : 25 - 28
  • [47] Audio-Visual Localization by Synthetic Acoustic Image Generation
    Sanguineti, Valentina
    Morerio, Pietro
    Del Bue, Alessio
    Murino, Vittorio
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2523 - 2531
  • [48] Audio-visual speaker localization using graphical models
    Kushal, Akash
    Rahurkar, Mandar
    Li Fei-Fei
    Ponce, Jean
    Huang, Thomas
    [J]. 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 291 - +
  • [49] Dual Perspective Network for Audio-Visual Event Localization
    Rao, Varshanth
    Khalil, Md Ibrahim
    Li, Haoda
    Dai, Peng
    Lu, Juwei
    [J]. COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694 : 689 - 704
  • [50] Dual Attention Matching for Audio-Visual Event Localization
    Wu, Yu
    Zhu, Linchao
    Yan, Yan
    Yang, Yi
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6301 - 6309