Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment

Cited by: 14
Authors
Sivasankaran, Sunit [1]
Vincent, Emmanuel [1]
Fohr, Dominique [1]
Affiliations
[1] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
Keywords
Speaker localization; wake-up word; convolutional neural network; reverberation; overlapping speech
DOI
10.21437/Interspeech.2018-1526
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speaker localization is a hard task, especially in adverse environmental conditions involving reverberation and noise. In this work, we introduce the new task of localizing the speaker who uttered a given keyword, e.g., the wake-up word of a distant-microphone voice command system, in the presence of overlapping speech. We employ a convolutional neural network-based localization system and investigate multiple identifiers as additional inputs to the system in order to characterize this speaker. We conduct experiments using ground truth identifiers, which are obtained assuming the availability of clean speech, and also in realistic conditions where the identifiers are computed from the corrupted speech. We find that the identifier consisting of the ground truth time-frequency mask corresponding to the target speaker provides the best localization performance, and we propose methods to estimate such a mask in adverse reverberant and noisy conditions using the considered keyword.
Pages: 2703-2707
Page count: 5