Joint audio-visual tracking using particle filters

Cited by: 59
Authors
Zotkin, DN [1]
Duraiswami, R [1]
Davis, LS [1]
Institution
[1] Univ Maryland, Inst Adv Comp Studies, Dept Comp Sci, Perceptual Interfaces & Reality Lab, College Pk, MD 20742 USA
Keywords
audio-visual tracking; sensor fusion; Monte-Carlo algorithms;
DOI
10.1155/S1110865702206058
CLC (Chinese Library Classification) Code
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a complementary modality to video data, which, in comparison to vision, can provide faster localization over a wider field of view. We present a particle-filter based tracking framework for performing multimodal sensor fusion for tracking people in a videoconferencing environment using multiple cameras and multiple microphone arrays. One advantage of our proposed tracker is its ability to seamlessly handle temporary absence of some measurements (e.g., camera occlusion or silence). Another advantage is the possibility of self-calibration of the joint system to compensate for imprecision in the knowledge of array or camera parameters by treating them as containing an unknown statistical component that can be determined using the particle filter framework during tracking. We implement the algorithm in the context of a videoconferencing and meeting recording system. The system also performs high-level semantic analysis of the scene by keeping participant tracks, recognizing turn-taking events and recording an annotated transcript of the meeting. Experimental results are presented. Our system operates in real time and is shown to be robust and reliable.
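The core mechanism the abstract describes, a particle filter whose weight update simply skips any modality that is temporarily absent (occluded camera, silent microphone), can be illustrated with a minimal sketch. This is not the authors' implementation: the state here is a bare 2-D position, the measurements are synthetic noisy positions, and all function and variable names are hypothetical; the paper's actual state additionally carries calibration parameters for the arrays and cameras.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, audio_meas, video_meas,
                         motion_std=0.1, audio_std=0.3, video_std=0.1):
    """One bootstrap particle-filter update fusing audio and video.

    particles: (N, 2) array of hypothesized 2-D positions.
    audio_meas / video_meas: noisy position measurements, or None when
    that modality is unavailable; a missing modality's likelihood
    update is simply skipped, so tracking continues on the other one.
    """
    # Propagate particles through a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Reweight by each available modality's Gaussian likelihood.
    for meas, std in ((audio_meas, audio_std), (video_meas, video_std)):
        if meas is None:
            continue
        d2 = np.sum((particles - meas) ** 2, axis=1)
        weights = weights * np.exp(-0.5 * d2 / std ** 2)
    weights = weights / weights.sum()

    # Systematic resampling to avoid weight degeneracy.
    positions = (np.arange(len(weights)) + rng.random()) / len(weights)
    idx = np.searchsorted(np.cumsum(weights), positions)
    particles = particles[idx]
    weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights

# Track a stationary target at (1, 1); drop the video measurement at
# t = 5 to mimic a temporary occlusion.
N = 500
particles = rng.normal(0.0, 1.0, (N, 2))
weights = np.full(N, 1.0 / N)
target = np.array([1.0, 1.0])
for t in range(20):
    audio = target + rng.normal(0.0, 0.3, 2)
    video = None if t == 5 else target + rng.normal(0.0, 0.1, 2)
    particles, weights = particle_filter_step(particles, weights, audio, video)

estimate = particles.mean(axis=0)
```

Note that because the audio likelihood is broader (larger `std`) than the video one, video dominates the fused estimate when both are present, matching the abstract's framing of audio as a fast, wide-field but coarser complement to vision.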
Pages: 1154-1164
Page count: 11
Related Papers
50 records
  • [1] Joint Audio-Visual Tracking Using Particle Filters
    Dmitry N. Zotkin
    Ramani Duraiswami
    Larry S. Davis
    [J]. EURASIP Journal on Advances in Signal Processing, 2002
  • [2] Audio-visual speaker tracking with importance particle filters
    Gatica-Perez, D
    Lathoud, G
    McCowan, I
    Odobez, JM
    Moore, D
    [J]. 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 3, PROCEEDINGS, 2003, : 25 - 28
  • [3] A JOINT AUDIO-VISUAL APPROACH TO AUDIO LOCALIZATION
    Jensen, Jesper Rindom
    Christensen, Mads Graesboll
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 454 - 458
  • [4] Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model
    Gebru, Israel D.
    Ba, Sileye
    Evangelidis, Georgios
    Horaud, Radu
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP (ICCVW), 2015, : 702 - 708
  • [5] Audio-Visual Tracking of Concurrent Speakers
    Qian, Xinyuan
    Brutti, Alessio
    Lanz, Oswald
    Omologo, Maurizio
    Cavallaro, Andrea
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 942 - 954
  • [6] Audio-visual tracking for natural interactivity
    Pingali, G
    Tunali, G
    Carlbom, I
    [J]. ACM MULTIMEDIA 99, PROCEEDINGS, 1999, : 373 - 382
  • [7] Joint watermarking of audio-visual data
    Dittmann, J
    Steinebach, M
    [J]. 2001 IEEE FOURTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2001, : 601 - 606
  • [8] Joint Audio-Visual Deepfake Detection
    Zhou, Yipin
    Lim, Ser-Nam
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 14780 - 14789
  • [9] 3D Audio-Visual Speaker Tracking with A Novel Particle Filter
    Liu, Hong
    Sun, Yongheng
    Li, Yidi
    Yang, Bing
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 7343 - 7348
  • [10] An audio-visual particle filter for speaker tracking on the CLEAR'06 evaluation dataset
    Nickel, Kai
    Gehrig, Tobias
    Ekenel, Hazim K.
    McDonough, John
    Stiefelhagen, Rainer
    [J]. MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2007, 4122 : 69 - 80