3D Audio-Visual Speaker Tracking with A Novel Particle Filter

Cited by: 2
Authors
Liu, Hong [1 ]
Sun, Yongheng [1 ]
Li, Yidi [1 ]
Yang, Bing [1 ]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audio-visual fusion; 3D tracking; particle filter; compact platform; LOCALIZATION; FUSION;
DOI
10.1109/ICPR48806.2021.9412682
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D speaker tracking using co-located audio-visual sensors has received much attention recently. Although various methods have been applied in this field, it is still challenging to obtain a reliable 3D tracking result, since the positions of co-located sensors are restricted to a small area. In this paper, a novel particle filter (PF) based method is proposed for 3D audio-visual speaker tracking. Compared with traditional PF based audio-visual speaker tracking methods, our 3D audio-visual tracker has two main characteristics. In the prediction stage, we use audio-visual information at the current frame to further adjust the direction of the particles after the particle state transition process, which concentrates the particles around the speaker direction. In the update stage, the particle likelihood is calculated by fusing the visual distance and audio-visual direction information. Specifically, the distance likelihood is obtained from the camera projection model and the adaptively estimated size of the speaker's face or head, and the direction likelihood is determined by audio-visual particle fitness. In this way, the particle likelihood can better represent the speaker presence probability in 3D space. Experimental results show that the proposed tracker outperforms other methods and provides favorable speaker tracking performance both in 3D space and on the image plane.
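The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no equations, so the Gaussian likelihoods, the `pull` factor used to rotate particles toward the observed direction, the sensor-centred coordinate frame, and every function and parameter name below are assumptions chosen for illustration. The pinhole relation Z = f * W / w is the standard camera projection model; using the resulting depth as an approximate range is a simplification.

```python
import math
import random

def distance_from_face_width(focal_px, face_width_m, face_width_px):
    """Pinhole camera model: depth Z = f * W / w (f, w in pixels; W in metres)."""
    return focal_px * face_width_m / face_width_px

def to_spherical(p):
    """Cartesian (x, y, z) -> (range, azimuth, elevation), sensor at the origin."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    az = math.atan2(y, x)
    el = math.asin(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
    return r, az, el

def from_spherical(r, az, el):
    """(range, azimuth, elevation) -> Cartesian (x, y, z)."""
    return (r * math.cos(el) * math.cos(az),
            r * math.cos(el) * math.sin(az),
            r * math.sin(el))

def angle_diff(a, b):
    """Signed difference a - b wrapped to (-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

def predict(particles, motion_std, obs_az, obs_el, pull, rng):
    """Random-walk transition, then rotate each particle toward the observed
    direction by a fraction `pull` in [0, 1] (hypothetical adjustment rule)."""
    out = []
    for p in particles:
        moved = tuple(c + rng.gauss(0.0, motion_std) for c in p)
        r, az, el = to_spherical(moved)
        az += pull * angle_diff(obs_az, az)
        el += pull * (obs_el - el)
        out.append(from_spherical(r, az, el))
    return out

def update(particles, obs_r, obs_az, obs_el, sigma_r, sigma_ang):
    """Weight = distance likelihood * direction likelihood (both assumed Gaussian)."""
    weights = []
    for p in particles:
        r, az, el = to_spherical(p)
        l_dist = math.exp(-0.5 * ((r - obs_r) / sigma_r) ** 2)
        ang2 = angle_diff(az, obs_az) ** 2 + (el - obs_el) ** 2
        l_dir = math.exp(-0.5 * ang2 / sigma_ang ** 2)
        weights.append(l_dist * l_dir)
    total = sum(weights)
    if total <= 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]

def estimate(particles, weights):
    """Weighted mean of the particle positions."""
    return tuple(sum(w * p[i] for p, w in zip(particles, weights)) for i in range(3))
```

A typical step would call `distance_from_face_width` on the detected face box to get `obs_r`, then `predict`, `update`, and `estimate` in that order, followed by resampling (omitted here). Setting `pull` to 0 recovers a conventional prediction step without the direction adjustment.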
Pages: 7343 - 7348
Page count: 6
Related papers
50 in total
  • [1] 3D AUDIO-VISUAL SPEAKER TRACKING WITH AN ADAPTIVE PARTICLE FILTER
    Qian, Xinyuan
    Brutti, Alessio
    Omologo, Maurizio
    Cavallaro, Andrea
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2896 - 2900
  • [2] 3D AUDIO-VISUAL SPEAKER TRACKING WITH A TWO-LAYER PARTICLE FILTER
    Liu, Hong
    Li, Yidi
    Yang, Bing
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1955 - 1959
  • [3] An audio-visual particle filter for speaker tracking on the CLEAR'06 evaluation dataset
    Nickel, Kai
    Gehrig, Tobias
    Ekenel, Hazim K.
    McDonough, John
    Stiefelhagen, Rainer
    [J]. MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2007, 4122 : 69 - 80
  • [4] Audio-visual speaker tracking with importance particle filters
    Gatica-Perez, D
    Lathoud, G
    McCowan, I
    Odobez, JM
    Moore, D
    [J]. 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 3, PROCEEDINGS, 2003, : 25 - 28
  • [5] Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter
    Li, Yidi
    Liu, Hong
    Yang, Bing
    Ding, Runwei
    Chen, Yang
    [J]. COMPLEXITY, 2020, 2020
  • [6] Audio-Visual Clustering for 3D Speaker Localization
    Khalidov, Vasil
    Forbes, Florence
    Hansard, Miles
    Arnaud, Elise
    Horaud, Radu
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 86 - 97
  • [7] Particle Flow SMC-PHD Filter for Audio-Visual Multi-speaker Tracking
    Liu, Yang
    Wang, Wenwu
    Chambers, Jonathon
    Kilic, Volkan
    Hilton, Adrian
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2017), 2017, 10169 : 344 - 353
  • [8] Particle Filtering for Bearing-Only Audio-Visual Speaker Detection and Tracking
    Rae, Andrew
    Khamis, Alaa
    Basir, Otman
    Kamel, Mohamed
    [J]. 2009 3RD INTERNATIONAL CONFERENCE ON SIGNALS, CIRCUITS AND SYSTEMS (SCS 2009), 2009, : 161 - +
  • [9] Speaker Tracking Based on Audio-Visual Fusion with Unknown Noise
    Cao, Jie
    Li, Jun
    Li, Wei
    [J]. PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 215 - 226
  • [10] Audio-visual active speaker tracking in cluttered indoors environments
    Talantzis, Fotios
    Pnevmatikakis, Aristodemos
    Constantinides, Anthony G.
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (03): : 799 - 807