Audio-Visual Sound Source Localization and Tracking Based on Mobile Robot for The Cocktail Party Problem

Cited by: 3
Authors
Shi, Zhanbo [1]
Zhang, Lin [1]
Wang, Dongqing [1]
Affiliations
[1] Tongji Univ, Sch Software Engn, Shanghai 201804, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 10
Funding
National Natural Science Foundation of China;
Keywords
robot audition; sound source localization; cocktail party problem; audio-visual; motion planning; EVENT LOCALIZATION;
DOI
10.3390/app13106056
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Locating the sound source is one of the most important capabilities of robot audition. In recent years, single-source localization techniques have increasingly matured. However, localizing and tracking specific sound sources in multi-source scenarios, which is known as the cocktail party problem, is still unresolved. To address this challenge, we propose a system for dynamically localizing and tracking sound sources based on audio-visual information that can be deployed on a mobile robot. Our system first locates specific targets using pre-registered voiceprint and face features. Subsequently, guided by the motion module, the robot moves to track the target while keeping away from other sound sources in the surroundings, which helps the robot gather clearer audio data of the target and perform downstream tasks better. Its effectiveness has been verified in extensive real-world experiments, which show a 20% improvement in the success rate of specific speaker localization and a 14% reduction in word error rate in speech recognition compared to its counterparts.
Pages: 14
Related papers
50 in total
  • [1] Audio-visual segmentation and "the cocktail party effect"
    Darrell, T
    Fisher, JW
    Viola, P
    Freeman, W
    [J]. ADVANCES IN MULTIMODAL INTERFACES - ICMI 2000, PROCEEDINGS, 2000, 1948 : 32 - 40
  • [2] Tracking atoms with particles for audio-visual source localization
    Monaci, Gianluca
    Vandergheynst, Pierre
    Maggio, Emilio
    Cavallaro, Andrea
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007, : 753 - +
  • [3] AUDIO-VISUAL DISCREPANCY AND THE INFLUENCE ON VERTICAL SOUND SOURCE LOCALIZATION
    Werner, Stephan
    Liebetrau, Judith
    Sporer, Thomas
    [J]. 2012 Fourth International Workshop on Quality of Multimedia Experience (QoMEX), 2012, : 133 - 139
  • [4] Audio-Visual Fusion for Sound Source Localization and Improved Attention
    Lee, Byoung-gi
    Choi, JongSuk
    Yoon, SangSuk
    Choi, Mun-Taek
    Kim, Munsang
    Kim, Daijin
    [J]. TRANSACTIONS OF THE KOREAN SOCIETY OF MECHANICAL ENGINEERS A, 2011, 35 (07) : 737 - 743
  • [5] Audio-Visual Bimodal Combination-Based Speaker Tracking Method for Mobile Robot
    Zhang, Hao-Yan
    Zhang, Long-Bo
    Shi, Qi-Feng
    Liu, Zhen-Tao
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2024, 28 (01) : 196 - 205
  • [6] Social interaction of humanoid robot based on audio-visual tracking
    Okuno, HG
    Nakadai, K
    Kitano, H
    [J]. DEVELOPMENTS IN APPLIED ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2002, 2358 : 725 - 735
  • [7] Sound Source Localization in Space based on Audio-Vision System of Mobile Robot
    Chen, Tao
    Zhang, MingLu
    Fu, LingLi
    [J]. ICMS2009: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON MODELLING AND SIMULATION, VOL 5, 2009, : 467 - 472
  • [8] Audio-visual based non-line-of-sight sound source localization: A feasibility study
    King, E. A.
    Tatoglu, A.
    Iglesias, D.
    Matriss, A.
    [J]. APPLIED ACOUSTICS, 2021, 171
  • [9] Real-time sound source localization and separation based on active audio-visual integration
    Okuno, HG
    Nakadai, K
    [J]. COMPUTATIONAL METHODS IN NEURAL MODELING, PT 1, 2003, 2686 : 118 - 125
  • [10] Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization
    Um, Sung Jin
    Kim, Dongjin
    Kim, Jung Uk
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3507 - 3516