COMPACT NETWORK FOR SPEAKERBEAM TARGET SPEAKER EXTRACTION

Cited by: 0
Authors
Delcroix, Marc [1]
Zmolikova, Katerina [2]
Ochiai, Tsubasa [1]
Kinoshita, Keisuke [1]
Araki, Shoko [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Brno Univ Technol, Speech FIT & IT4I Ctr Excellence, Brno, Czech Republic
Keywords
Target speech extraction; Neural network; Adaptation; Auxiliary feature; Speech enhancement; Speech separation
DOI
10.1109/icassp.2019.8683087
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Code
070206; 082403
Abstract
Speech separation that separates a mixture of speech signals into each of its sources has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e. extraction of only the speech of a target speaker out of a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, which is a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations weighted by weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach greatly reduces the number of model parameters by up to 60%, making it much more practical, while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that the speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
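The abstract contrasts two ways of adapting a hidden layer to the target speaker: the earlier factorized adaptation layer (several parallel linear transforms combined with weights derived from the auxiliary speaker features) and the proposed compact variant that simply scales hidden activations with auxiliary-feature-derived weights. The sketch below is a minimal, hypothetical PyTorch illustration of that contrast, not the authors' implementation; all module names, dimensions, and the number of bases are assumptions introduced here for illustration only.

import torch
import torch.nn as nn


class FactorizedAdaptationLayer(nn.Module):
    # Several parallel linear transforms ("bases") combined with
    # weights derived from the auxiliary (speaker) features.
    def __init__(self, in_dim, out_dim, aux_dim, num_bases=30):  # num_bases is an assumed value
        super().__init__()
        self.bases = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_bases)]
        )
        # Maps the auxiliary speaker features to one combination weight per basis
        self.aux_to_weights = nn.Linear(aux_dim, num_bases)

    def forward(self, x, aux):
        # x: (batch, frames, in_dim), aux: (batch, aux_dim)
        alpha = torch.softmax(self.aux_to_weights(aux), dim=-1)           # (batch, num_bases)
        outs = torch.stack([basis(x) for basis in self.bases], dim=-1)    # (batch, frames, out_dim, num_bases)
        return torch.einsum("btok,bk->bto", outs, alpha)


class ScalingAdaptationLayer(nn.Module):
    # Compact variant: a single linear transform whose hidden activations
    # are element-wise scaled by weights derived from the auxiliary features.
    def __init__(self, in_dim, out_dim, aux_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.aux_to_scale = nn.Linear(aux_dim, out_dim)

    def forward(self, x, aux):
        scale = self.aux_to_scale(aux)               # (batch, out_dim)
        return self.linear(x) * scale.unsqueeze(1)   # broadcast the scale over frames


if __name__ == "__main__":
    x = torch.randn(2, 100, 512)   # (batch, frames, hidden); dimensions are illustrative
    aux = torch.randn(2, 200)      # speaker embedding from an adaptation utterance
    fact = FactorizedAdaptationLayer(512, 512, 200)
    comp = ScalingAdaptationLayer(512, 512, 200)
    print(sum(p.numel() for p in fact.parameters()))  # roughly num_bases * 512 * 512 weights
    print(sum(p.numel() for p in comp.parameters()))  # roughly 512 * 512 + 200 * 512 weights

Under these assumed dimensions the scaling variant needs only a single weight matrix plus a small auxiliary projection, which is the source of the parameter savings the abstract reports.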
Pages: 6965-6969
Number of pages: 5