Integrated audio-visual processing for object localization and tracking

Cited by: 1

Authors:

Pingali, GS [1]

Affiliations:

[1] AT&T Bell Labs, Lucent Technol, Murray Hill, NJ 07974 USA

Source:

MULTIMEDIA COMPUTING AND NETWORKING 1998 | 1997 / Vol. 3310

Keywords:

multimodal; people tracking; acoustic talker direction finding; video; audio; multimedia; real time;

DOI:

10.1117/12.298421

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper presents a system that combines audio and visual cues to locate and track an object, typically a person, in real time. It shows that combining a speech source localization algorithm with a video-based head tracking algorithm yields a more accurate and robust tracker than either the audio or the visual modality alone. Performance evaluation results are presented for a system that runs in real time on a general-purpose processor. The multimodal tracker has several applications, such as teleconferencing, multimedia kiosks, and interactive games.

Pages: 206 - 213

Page count: 8

50 records in total

[21] Deep Audio-Visual Beamforming for Speaker Localization
Qian, Xinyuan
Zhang, Qiquan
Guan, Guohui
Xue, Wei
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1132 - 1136
[22] A generative approach to audio-visual person tracking
Brunelli, Roberto
Brutti, Alessio
Chippendale, Paul
Lanz, Oswald
Omologo, Maurizio
Svaizer, Piergiorgio
Tobia, Francesco
MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2007, 4122 : 55 - 68
[23] Real time audio-visual person tracking
Talantzis, Fotios
Pnevmatikakis, Aristodemos
Polymenakos, Lazaros C.
2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2006, : 243 - +
[24] Audio-Visual Event Localization in Unconstrained Videos
Tian, Yapeng
Shi, Jing
Li, Bochen
Duan, Zhiyao
Xu, Chenliang
COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 252 - 268
[25] Span-based Audio-Visual Localization
Wu, Yiling
Zhang, Xinfeng
Wang, Yaowei
Huang, Qingming
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1252 - 1260
[26] Multimodal tracking and classification of audio-visual features
Pavlovic, V
1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 1, 1998, : 343 - 347
[27] An fMRI study of the binding of audio-visual information: The dissociation between object and space processing
Sestieri C.
Di Matteo R.
Ferretti A.
Del Gratta C.
Caulo M.
Tartaro A.
Olivetti Belardinelli M.
Romani G.L.
Cognitive Processing, 2006, 7 (Suppl 1) : S138 - S139
[28] Indexing audio-visual sequences by joint audio and video processing
Saraceno, C
Leonardi, R
VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2, 1998, : 686 - 691
[29] Somatosensory contribution to audio-visual speech processing
Ito, Takayuki
Ohashi, Hiroki
Gracco, Vincent L.
CORTEX, 2021, 143 : 195 - 204
[30] Some experiments in audio-visual speech processing
Chollet, G.
Landais, R.
Hueber, T.
Bredin, H.
Mokbel, C.
Perrot, P.
Zouari, L.
ADVANCES IN NONLINEAR SPEECH PROCESSING, 2007, 4885 : 28 - +
