AUDIO-VISUAL SPEAKER LOCALIZATION VIA WEIGHTED CLUSTERING

Authors
Gebru, Israel D. [1]
Alameda-Pineda, Xavier [1]
Horaud, Radu [1]
Forbes, Florence [1]
Affiliations
[1] INRIA Grenoble Rhone Alpes, Grenoble, France
Funding
European Union Seventh Framework Programme (FP7)
Keywords
Mixture models; audiovisual fusion; multimodal signal processing; weighted-data clustering
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we address the problem of detecting and localizing speakers using audiovisual data, and we cast it as a clustering problem. We propose a novel weighted clustering method based on a finite mixture model, which explores the idea of non-uniform weighting of observations. Weighted-data clustering techniques have already been proposed, but not in a generative setting as presented here. We introduce a weighted-data mixture model and formally derive the associated EM procedure. The clustering algorithm is applied to the problem of detecting and localizing a speaker over time using both visual and auditory observations gathered with a single camera and two microphones. Audiovisual fusion is enforced by introducing a cross-modal weighting scheme. We test the robustness of the method with experiments in two challenging scenarios: disambiguating between an active and a non-active speaker, and associating a speech signal with a person.
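The abstract does not spell out the weighted-data mixture model, so the following is only a minimal sketch under stated assumptions, not the authors' exact procedure. It assumes a Gaussian mixture in which each observation x_i carries a fixed, known weight w_i that scales the component precision, i.e. x_i | z_i = k ~ N(mu_k, Sigma_k / w_i), so that heavily weighted points pull the cluster parameters more strongly. The function name `weighted_gmm_em`, its parameters, and the small regularization constants are hypothetical choices made for illustration.

```python
import numpy as np

def weighted_gmm_em(X, w, K, n_iter=50, seed=0):
    """EM for a Gaussian mixture with per-observation weights.

    Assumed model (an illustration, not taken from the abstract):
    x_i | z_i = k  ~  N(mu_k, Sigma_k / w_i), with fixed weights w_i > 0.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Initialize means from random data points, covariances from the data.
    mu = X[rng.choice(n, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: responsibilities under N(x_i; mu_k, Sigma_k / w_i).
        log_r = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            prec = np.linalg.inv(Sigma[k])
            maha = np.einsum("ij,jk,ik->i", diff, prec, diff)
            _, logdet = np.linalg.slogdet(Sigma[k])
            log_r[:, k] = np.log(pi[k]) - 0.5 * (
                d * np.log(2 * np.pi) + logdet - d * np.log(w) + w * maha
            )
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weights enter the mean and covariance updates.
        for k in range(K):
            wr = w * r[:, k]
            mu[k] = wr @ X / (wr.sum() + 1e-12)
            diff = X - mu[k]
            Sigma[k] = (wr[:, None] * diff).T @ diff / (r[:, k].sum() + 1e-12)
            Sigma[k] += 1e-6 * np.eye(d)  # keep covariances invertible
        pi = r.mean(axis=0)

    return pi, mu, Sigma, r
```

In an audiovisual setting, one could imagine deriving `w` for each visual observation from how well it agrees with nearby auditory observations, which is one possible reading of the cross-modal weighting mentioned above; the abstract does not state how the weights are actually computed.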
Pages: 6