COMPACT NETWORK FOR SPEAKERBEAM TARGET SPEAKER EXTRACTION

Cited by: 0
Authors
Delcroix, Marc [1]
Zmolikova, Katerina [2]
Ochiai, Tsubasa [1]
Kinoshita, Keisuke [1]
Araki, Shoko [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Brno Univ Technol, Speech FIT & IT4I Ctr Excellence, Brno, Czech Republic
Keywords
Target speech extraction; Neural network; Adaptation; Auxiliary feature; Speech enhancement; Speech separation
DOI
10.1109/icassp.2019.8683087
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Speech separation, which separates a mixture of speech signals into each of its sources, has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e., extraction of only the speech of a target speaker out of a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations weighted by weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach greatly reduces the number of model parameters by up to 60%, making it much more practical, while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that the speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
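To make the contrast between the two adaptation mechanisms described in the abstract concrete, below is a minimal NumPy sketch: a factorized adaptation layer (a weighted sum of several parallel linear transformations, with mixing weights derived from the auxiliary speaker features) versus the proposed compact variant (element-wise scaling of the hidden activations by weights derived from the same auxiliary features). All dimensions, variable names (bases_W, mix_W, scale_W), and the use of random weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
HIDDEN_DIM = 256   # size of the extraction network's hidden layer
AUX_DIM = 64       # size of the auxiliary (speaker) feature vector
NUM_BASES = 10     # number of parallel transformations in the factorized layer

rng = np.random.default_rng(0)

def factorized_adaptation(h, aux, bases_W, mix_W):
    """Factorized adaptation layer (original SpeakerBeam variant):
    several parallel linear transformations of the hidden activations h,
    combined with mixing weights derived from the auxiliary features."""
    alphas = mix_W @ aux                      # (NUM_BASES,) mixing weights
    out = np.zeros_like(h)
    for k in range(NUM_BASES):
        out += alphas[k] * (bases_W[k] @ h)   # weighted sum of K linear maps
    return out

def scaling_adaptation(h, aux, scale_W):
    """Proposed compact variant: scale the hidden activations element-wise
    by weights derived from the auxiliary features."""
    scales = scale_W @ aux                    # (HIDDEN_DIM,) per-unit scales
    return scales * h

# Toy stand-ins for one frame of hidden activations and a speaker
# embedding computed from an adaptation utterance.
h = rng.standard_normal(HIDDEN_DIM)
aux = rng.standard_normal(AUX_DIM)

bases_W = rng.standard_normal((NUM_BASES, HIDDEN_DIM, HIDDEN_DIM))
mix_W = rng.standard_normal((NUM_BASES, AUX_DIM))
scale_W = rng.standard_normal((HIDDEN_DIM, AUX_DIM))

print(factorized_adaptation(h, aux, bases_W, mix_W).shape)  # (256,)
print(scaling_adaptation(h, aux, scale_W).shape)            # (256,)

# Parameter comparison: the factorized layer needs NUM_BASES full
# HIDDEN_DIM x HIDDEN_DIM matrices plus the mixing projection, while the
# scaling variant needs only one HIDDEN_DIM x AUX_DIM projection.
print(bases_W.size + mix_W.size, "parameters vs", scale_W.size)
```

Under these toy dimensions the factorized layer uses on the order of 650k adaptation parameters while the scaling variant uses about 16k, which illustrates the kind of parameter reduction the abstract reports (up to 60% of the whole model), though the exact figures depend on the real network configuration.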
Pages: 6965-6969
Number of pages: 5