COMPACT NETWORK FOR SPEAKERBEAM TARGET SPEAKER EXTRACTION

Cited by: 0
Authors
Delcroix, Marc [1]
Zmolikova, Katerina [2]
Ochiai, Tsubasa [1]
Kinoshita, Keisuke [1]
Araki, Shoko [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Brno Univ Technol, Speech FIT & IT4I Ctr Excellence, Brno, Czech Republic
Keywords
Target speech extraction; Neural network; Adaptation; Auxiliary feature; Speech enhancement; Speech separation
DOI
10.1109/icassp.2019.8683087
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech separation, which separates a mixture of speech signals into each of its sources, has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e., extraction of only the speech of a target speaker out of a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, which is a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations weighted by weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach reduces the number of model parameters by up to 60%, making it much more practical, while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that the speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
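The two adaptation schemes named in the abstract can be contrasted with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names, the layer sizes, and the random stand-ins for the speaker-dependent weights (`alpha`, `scale`) are all hypothetical, and the auxiliary network that would derive those weights from an adaptation utterance is omitted.

```python
import numpy as np

def factorized_layer(h, alpha, W, b):
    """Factorized adaptation (sketch): K parallel linear transforms of h,
    combined with speaker-dependent weights alpha.
    Shapes: h (d_in,), alpha (K,), W (K, d_out, d_in), b (K, d_out)."""
    per_branch = np.einsum("koi,i->ko", W, h) + b  # (K, d_out)
    return alpha @ per_branch                      # (d_out,)

def scaled_layer(h, scale):
    """Scaling adaptation (sketch): element-wise scaling of the hidden
    activations h by speaker-dependent weights of the same shape."""
    return scale * h

def n_factorized_params(K, d_in, d_out):
    """Weights and biases held by the K parallel transforms."""
    return K * (d_in * d_out + d_out)

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
K, d_in, d_out = 3, 4, 5
h = rng.standard_normal(d_in)
W = rng.standard_normal((K, d_out, d_in))
b = rng.standard_normal((K, d_out))
alpha = rng.standard_normal(K)     # stand-in for auxiliary-derived weights
scale = rng.standard_normal(d_in)  # stand-in for auxiliary-derived weights

y_fact = factorized_layer(h, alpha, W, b)
y_scaled = scaled_layer(h, scale)
```

In this sketch the factorized layer stores K full weight matrices, whereas the scaling operation has no trainable parameters of its own, which gives a rough intuition for where a parameter saving could come from. The sizes here are arbitrary and do not reproduce the 60% figure reported in the abstract.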
Pages: 6965-6969
Number of pages: 5
Related papers
50 in total
  • [41] An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer
    Shi, Jiatong
    Zhang, Chunlei
    Weng, Chao
    Watanabe, Shinji
    Yu, Meng
    Yu, Dong
    COMPUTER SPEECH AND LANGUAGE, 2022, 73
  • [42] SPEAKER REINFORCEMENT USING TARGET SOURCE EXTRACTION FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Zorila, Catalin
    Doddipatla, Rama
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6297 - 6301
  • [43] Coarse-to-Fine Target Speaker Extraction Based on Contextual Information Exploitation
    Yang, Xue
    Bao, Changchun
    Chen, Xianhong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3795 - 3810
  • [44] AN INVESTIGATION INTO THE MULTI-CHANNEL TIME DOMAIN SPEAKER EXTRACTION NETWORK
    Zorila, Catalin
    Li, Mohan
    Doddipatla, Rama
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 793 - 800
  • [45] SpEx: Multi-Scale Time Domain Speaker Extraction Network
    Xu, Chenglin
    Rao, Wei
    Chng, Eng Siong
    Li, Haizhou
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1370 - 1384
  • [46] ATTENTION-BASED NEURAL NETWORK FOR JOINT DIARIZATION AND SPEAKER EXTRACTION
    Chazan, Shlomo E.
    Gannot, Sharon
    Goldberger, Jacob
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 301 - 305
  • [47] A network model of speaker identification with new feature extraction methods and BLSTM
    Wang, Xingmei
    Xue, Fuzhao
    Wang, Wei
    Liu, Anhua
    NEUROCOMPUTING, 2020, 403 (403) : 167 - 181
  • [48] Statistical Compact Model Extraction: A Neural Network Approach
    Viraraghavan, Janakiraman
    Pandharpure, Shrinivas J.
    Watts, Josef
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2012, 31 (12) : 1920 - 1924
  • [49] Opinion Target Network and Bootstrapping Method for Chinese Opinion Target Extraction
    Xia, Yunqing
    Hao, Boyi
    Wong, Kam-Fai
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 339 - +
  • [50] Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers
    Borsdorf, Marvin
    Xu, Chenglin
    Li, Haizhou
    Schultz, Tanja
    INTERSPEECH 2021, 2021, : 1469 - 1473