Speaker Localization among multi-faces in noisy environment by audio-visual Integration

Cited by: 10
Authors
Kim, Hyun-Don [1]
Choi, Jong-Suk [1]
Kim, Munsang [1]
Affiliation
[1] Intelligent Robot Res Ctr, Korea Inst Sci & Technol, Seoul, South Korea
Keywords
sound localization; face tracking; voice activity detection; human robot interaction; audiovisual integration
DOI
10.1109/ROBOT.2006.1641889
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we developed not only a reliable sound localization system, including a VAD (Voice Activity Detection) component, using three microphones, but also a face tracking system using a vision camera. Moreover, we proposed a way to integrate these systems in human-robot interaction to compensate for errors in speaker localization and to effectively reject unnecessary speech or noise signals arriving from undesired directions. To verify our system's performance, we installed the proposed audition and vision systems on a prototype robot called IRORAA (Intelligent ROBot for Active Audition) and demonstrated how the audio-visual system can be integrated.
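The abstract does not say how the audio and visual estimates are combined. As a rough illustration only, the following Python sketch assumes a simple gating rule: a sound direction is accepted only while a VAD stage reports speech and a tracked face lies within a fixed angular tolerance of it, in which case the face azimuth replaces the (noisier) sound estimate; otherwise the sound is rejected as noise or speech from an undesired direction. The function names, the `voice_active` flag, the 15-degree tolerance, and the assumption that both modules report azimuths in degrees in a shared robot-centered frame are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees (0..180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def fuse_speaker_direction(
    sound_azimuth_deg: float,
    face_azimuths_deg: Sequence[float],
    voice_active: bool = True,
    tolerance_deg: float = 15.0,
) -> Optional[float]:
    """Hypothetical audio-visual gating rule (not the authors' method).

    Returns the azimuth of the face closest to the sound estimate if it lies
    within tolerance_deg; returns None when no speech is detected or no face
    is close enough, i.e. the sound is rejected.
    """
    if not voice_active:
        # Assumed VAD output: ignore non-speech input entirely.
        return None
    best: Optional[float] = None
    best_diff = tolerance_deg
    for face in face_azimuths_deg:
        diff = angular_difference(sound_azimuth_deg, face)
        if diff <= best_diff:
            # Closer face found; trust its direction over the sound estimate.
            best, best_diff = face, diff
    return best


if __name__ == "__main__":
    # Sound at 32 deg, faces at -40, 25 and 90 deg: snaps to the face at 25 deg.
    print(fuse_speaker_direction(32.0, [-40.0, 25.0, 90.0]))
    # Sound from behind the robot with no face nearby: rejected.
    print(fuse_speaker_direction(170.0, [-40.0, 25.0]))
```

Snapping to the face direction reflects the general idea that a camera usually resolves azimuth more finely than a three-microphone array, while the tolerance window is what lets sounds from directions with no visible person be discarded.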
Pages: 1305-1310
Page count: 6
Related papers (50 in total)
  • [1] Probabilistic speaker localization in noisy environments by audio-visual integration
    Choi, Jong-Suk
    Kim, Munsang
    Kim, Hyun-Don
    2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, : 4704 - +
  • [2] Integration of audio-visual information for multi-speaker multimedia speaker recognition
    Yang, Jichen
    Chen, Fangfan
    Cheng, Yu
    Lin, Pei
    DIGITAL SIGNAL PROCESSING, 2024, 145
  • [3] Deep Audio-Visual Beamforming for Speaker Localization
    Qian, Xinyuan
    Zhang, Qiquan
    Guan, Guohui
    Xue, Wei
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1132 - 1136
  • [4] Real-time speaker localization and speech separation by audio-visual integration
    Nakadai, K
    Hidai, K
    Okuno, HG
    Kitano, H
    2002 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS I-IV, PROCEEDINGS, 2002, : 1043 - 1049
  • [5] AUDIO-VISUAL SPEAKER LOCALIZATION VIA WEIGHTED CLUSTERING
    Gebru, Israel D.
    Alameda-Pineda, Xavier
    Horaud, Radu
    Forbes, Florence
    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [6] Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
    Jiang, Hao
    Murdock, Calvin
    Ithapu, Vamsi Krishna
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10534 - 10542
  • [7] Audio-visual speaker localization using graphical models
    Kushal, Akash
    Rahurkar, Mandar
    Li Fei-Fei
    Ponce, Jean
    Huang, Thomas
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 291 - +
  • [8] Multi-Speaker Audio-Visual Corpus RUSAVIC: Russian Audio-Visual Speech in Cars
    Ivanko, Denis
    Ryumin, Dmitry
    Axyonov, Alexandr
    Kashevnik, Alexey
    Karpov, Alexey
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1555 - 1559
  • [9] Audio-Visual Clustering for 3D Speaker Localization
    Khalidov, Vasil
    Forbes, Florence
    Hansard, Miles
    Arnaud, Elise
    Horaud, Radu
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 86 - 97
  • [10] Paper: Speaker Localization Based on Audio-Visual Bimodal Fusion
    Zhu, Ying-Xin
    Jin, Hao-Ran
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2021, 25 (03) : 375 - 382