LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION

Authors
Zmolikova, Katerina [1 ,2 ]
Delcroix, Marc [1 ]
Kinoshita, Keisuke [1 ]
Higuchi, Takuya [1 ]
Ogawa, Atsunori [1 ]
Nakatani, Tomohiro [1 ]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Brno Univ Technol, Speech FIT, Brno, Czech Republic
Keywords
speaker extraction; speaker adaptive neural network; multi-speaker speech recognition; speaker representation learning; beamforming; SOURCE SEPARATION; SPEECH;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, schemes employing deep neural networks (DNNs) to extract speech from noisy observations have demonstrated great potential for noise-robust automatic speech recognition. However, these schemes are not well suited to cases where the interfering noise is another speaker. To enable the extraction of a target speaker from a mixture of speakers, we recently proposed informing the neural network with speaker information extracted from an adaptation utterance spoken by the same speaker. In our previous work, we explored ways to inform the network about the speaker and found a speaker adaptive layer approach to be suitable for this task. In those experiments, we used speaker features designed for speaker recognition tasks as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose using a sequence summarizing scheme that allows the speaker representation to be learned jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of the proposed method as a front-end for speech recognition, and we explore the effect of additional noise on its performance.
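The two ideas named in the abstract, a sequence summarizing scheme and a speaker adaptive layer, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so the single affine auxiliary layer, the softmax over the summary vector, the weighted sum of sub-layers, all dimensions, and every function name below are assumptions made for illustration. The parameters are random here; in a jointly trained system the auxiliary and main networks would be optimized together by backpropagation.

```python
import math
import random


def affine(x, W, b):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(v):
    """Numerically stable softmax (an assumed normalization, not from the abstract)."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def summarize_sequence(frames, W_aux, b_aux):
    """Pass each adaptation frame through an auxiliary layer, then average over time."""
    per_frame = [affine(f, W_aux, b_aux) for f in frames]
    n = len(per_frame)
    return [sum(col) / n for col in zip(*per_frame)]


def adaptive_layer(x, sublayers, weights):
    """Combine the outputs of several sub-layers using speaker-dependent weights."""
    outs = [affine(x, W, b) for W, b in sublayers]
    return [sum(a * o[j] for a, o in zip(weights, outs)) for j in range(len(outs[0]))]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
feat_dim, hidden_dim, num_sublayers = 4, 3, 2  # toy sizes chosen for illustration

# Auxiliary (summarizing) network: one output per sub-layer of the adaptive layer.
W_aux = random_matrix(num_sublayers, feat_dim, rng)
b_aux = [0.0] * num_sublayers

# Main network's adaptive layer: one affine transform per sub-layer.
sublayers = [
    (random_matrix(hidden_dim, feat_dim, rng), [0.0] * hidden_dim)
    for _ in range(num_sublayers)
]

# Toy inputs: a 10-frame adaptation utterance and one frame of the mixture.
adaptation_utterance = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(10)]
mixture_frame = [rng.gauss(0, 1) for _ in range(feat_dim)]

# The time-averaged summary becomes the speaker-dependent sub-layer weights.
speaker_weights = softmax(summarize_sequence(adaptation_utterance, W_aux, b_aux))
hidden = adaptive_layer(mixture_frame, sublayers, speaker_weights)
```

Because the averaging step in `summarize_sequence` is differentiable, gradients from the extraction objective could flow back into the auxiliary layer, which is what makes it possible to learn the speaker representation together with the rest of the network instead of fixing it in advance.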
Pages: 8-15
Page count: 8