Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment

被引:14
|
作者
Sivasankaran, Sunit [1 ]
Vincent, Emmanuel [1 ]
Fohr, Dominique [1 ]
机构
[1] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
关键词
Speaker localization; wake-up word; convolutional neural network; reverberation; overlapping speech;
D O I
10.21437/Interspeech.2018-1526
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speaker localization is a hard task, especially in adverse environmental conditions involving reverberation and noise. In this work we introduce the new task of localizing the speaker who uttered a given keyword, e.g., the wake-up word of a distant microphone voice command system, in the presence of overlapping speech. We employ a convolutional neural network based localization system and investigate multiple identifiers as additional inputs to the system in order to characterize this speaker. We conduct experiments using ground truth identifiers which are obtained assuming the availability of clean speech and also in realistic conditions where the identifiers are computed from the corrupted speech. We find that the identifier consisting of the ground truth time -frequency mask corresponding to the target speaker provides the best localization performance and we propose methods to estimate such a mask in adverse reverberant and noisy conditions using the considered keyword.
引用
收藏
页码:2703 / 2707
页数:5
相关论文
共 50 条
  • [11] Multi-speaker articulatory trajectory formation based on speaker-independent articulatory HMMs
    Hiroya, Sadao
    Mochida, Takemi
    [J]. SPEECH COMMUNICATION, 2006, 48 (12) : 1677 - 1690
  • [12] Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
    Udagawa, Kenta
    Saito, Yuki
    Saruwatari, Hiroshi
    [J]. INTERSPEECH 2022, 2022, : 2968 - 2972
  • [13] MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION
    Sell, Gregory
    McCree, Alan
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5425 - 5429
  • [14] Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Toda, Tomoki
    [J]. IEEE Signal Processing Letters, 2024, 31 : 2995 - 2999
  • [15] Unsupervised Speaker and Expression Factorization for Multi-Speaker Expressive Synthesis of Ebooks
    Chen, Langzhou
    Braunschweiler, Norbert
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1041 - 1045
  • [16] Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion
    Aloradi, Ahmad
    Mack, Wolfgang
    Elminshawi, Mohamed
    Habets, EmanuM A. P.
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 354 - 358
  • [17] INVESTIGATION OF FAST AND EFFICIENT METHODS FOR MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION
    Zheng, Yibin
    Li, Xinhui
    Lu, Li
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6618 - 6622
  • [18] SPEAKER RECOGNITION FOR MULTI-SPEAKER CONVERSATIONS USING X-VECTORS
    Snyder, David
    Garcia-Romero, Daniel
    Sell, Gregory
    McCree, Alan
    Povey, Daniel
    Khudanpur, Sanjeev
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5796 - 5800
  • [19] DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding
    Lee, Junmo
    Song, Kwangsub
    Noh, Kyoungjin
    Park, Tae-Jun
    Chang, Joon-Hyuk
    [J]. 2019 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2019, : 61 - 64
  • [20] Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation
    Mitsui, Kentaro
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    [J]. SPEECH COMMUNICATION, 2021, 132 : 132 - 145