Integrated audio-visual processing for object localization and tracking

Cited by: 1
Author
Pingali, GS [1 ]
Affiliation
[1] AT&T Bell Labs, Lucent Technol, Murray Hill, NJ 07974 USA
Source
MULTIMEDIA COMPUTING AND NETWORKING 1998 | 1997 / Vol. 3310
Keywords
multimodal; people tracking; acoustic talker direction finding; video; audio; multimedia; real time;
DOI
10.1117/12.298421
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper presents a system that combines audio and visual cues for locating and tracking an object, typically a person, in real time. It is shown that combining a speech source localization algorithm with a video-based head tracking algorithm results in a more accurate and robust tracker than one obtained using either the audio or the visual modality alone. Performance evaluation results are presented for a system that runs in real time on a general-purpose processor. The multimodal tracker has several applications, such as teleconferencing, multimedia kiosks, and interactive games.
Pages: 206 - 213
Page count: 8
Related papers
50 in total
  • [21] Deep Audio-Visual Beamforming for Speaker Localization
    Qian, Xinyuan
    Zhang, Qiquan
    Guan, Guohui
    Xue, Wei
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1132 - 1136
  • [22] A generative approach to audio-visual person tracking
    Brunelli, Roberto
    Brutti, Alessio
    Chippendale, Paul
    Lanz, Oswald
    Omologo, Maurizio
    Svaizer, Piergiorgio
    Tobia, Francesco
    MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2007, 4122 : 55 - 68
  • [23] Real time audio-visual person tracking
    Talantzis, Fotios
    Pnevmatikakis, Aristodemos
    Polymenakos, Lazaros C.
    2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2006, : 243 - +
  • [24] Audio-Visual Event Localization in Unconstrained Videos
    Tian, Yapeng
    Shi, Jing
    Li, Bochen
    Duan, Zhiyao
    Xu, Chenliang
    COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 252 - 268
  • [25] Span-based Audio-Visual Localization
    Wu, Yiling
    Zhang, Xinfeng
    Wang, Yaowei
    Huang, Qingming
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1252 - 1260
  • [26] Multimodal tracking and classification of audio-visual features
    Pavlovic, V
    1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 1, 1998, : 343 - 347
  • [27] An fMRI study of the binding of audio-visual information: The dissociation between object and space processing
    Sestieri C.
    Di Matteo R.
    Ferretti A.
    Del Gratta C.
    Caulo M.
    Tartaro A.
    Olivetti Belardinelli M.
    Romani G.L.
    Cognitive Processing, 2006, 7 (Suppl 1) : S138 - S139
  • [28] Indexing audio-visual sequences by joint audio and video processing
    Saraceno, C
    Leonardi, R
    VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2, 1998, : 686 - 691
  • [29] Somatosensory contribution to audio-visual speech processing
    Ito, Takayuki
    Ohashi, Hiroki
    Gracco, Vincent L.
    CORTEX, 2021, 143 : 195 - 204
  • [30] Some experiments in audio-visual speech processing
    Chollet, G.
    Landais, R.
    Hueber, T.
    Bredin, H.
    Mokbel, C.
    Perrot, P.
    Zouari, L.
    ADVANCES IN NONLINEAR SPEECH PROCESSING, 2007, 4885 : 28 - +