COMPACT NETWORK FOR SPEAKERBEAM TARGET SPEAKER EXTRACTION

Cited by: 0
Authors
Delcroix, Marc [1 ]
Zmolikova, Katerina [2 ]
Ochiai, Tsubasa [1 ]
Kinoshita, Keisuke [1 ]
Araki, Shoko [1 ]
Nakatani, Tomohiro [1 ]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Brno Univ Technol, Speech FIT & IT4I Ctr Excellence, Brno, Czech Republic
Keywords
Target speech extraction; Neural network; Adaptation; Auxiliary feature; Speech enhancement; Speech separation
DOI
10.1109/icassp.2019.8683087
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
Speech separation, which separates a mixture of speech signals into each of its sources, has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e., extraction of only the speech of a target speaker from a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations combined with weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach greatly reduces the number of model parameters, by up to 60%, making it much more practical while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that the speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
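To make the two adaptation schemes described in the abstract concrete, the sketch below (PyTorch, not taken from the authors' code) contrasts a factorized adaptation layer, i.e., several parallel linear transformations combined with weights derived from the auxiliary speaker features, with the proposed element-wise scaling of hidden activations. All class names, layer sizes, and the auxiliary-to-weight mappings are illustrative assumptions; printing the parameter counts merely illustrates why the scaling variant is much smaller.

```python
# Minimal sketch of the two adaptation schemes (illustrative, not the paper's code).
import torch
import torch.nn as nn


class FactorizedAdaptationLayer(nn.Module):
    """Several parallel linear transformations, combined with weights
    derived from the auxiliary (target-speaker) features."""

    def __init__(self, in_dim, out_dim, num_bases, aux_dim):
        super().__init__()
        self.bases = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(num_bases)]
        )
        # Maps the auxiliary features to one combination weight per basis.
        self.aux_to_weights = nn.Linear(aux_dim, num_bases)

    def forward(self, h, aux):
        alpha = torch.softmax(self.aux_to_weights(aux), dim=-1)           # (B, K)
        outs = torch.stack([basis(h) for basis in self.bases], dim=-1)    # (B, T, D, K)
        return torch.einsum("btdk,bk->btd", outs, alpha)


class ScalingAdaptationLayer(nn.Module):
    """Simpler scheme: element-wise scaling of hidden-layer activations
    with weights derived from the auxiliary features."""

    def __init__(self, hidden_dim, aux_dim):
        super().__init__()
        self.aux_to_scale = nn.Linear(aux_dim, hidden_dim)

    def forward(self, h, aux):
        scale = self.aux_to_scale(aux)      # (B, D)
        return h * scale.unsqueeze(1)       # broadcast over time frames: (B, T, D)


if __name__ == "__main__":
    B, T, D, K, A = 2, 100, 256, 10, 128    # batch, frames, hidden dim, bases, aux dim
    h = torch.randn(B, T, D)                # hidden activations of the extraction network
    aux = torch.randn(B, A)                 # speaker features from an adaptation utterance

    fact = FactorizedAdaptationLayer(D, D, K, A)
    scal = ScalingAdaptationLayer(D, A)
    print("factorized params:", sum(p.numel() for p in fact.parameters()))
    print("scaling params:   ", sum(p.numel() for p in scal.parameters()))
    print(fact(h, aux).shape, scal(h, aux).shape)
```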
Pages: 6965-6969
Number of pages: 5