Lightweight target speaker separation network based on joint training

Authors
Jing Wang
Hanyue Liu
Liang Xu
Wenjing Yang
Weiming Yi
Fang Liu
Affiliations
[1] Beijing Institute of Technology, School of Information and Electronics
[2] Beijing Institute of Technology, Key Laboratory of Language, Cognition and Computation, Ministry of Industry and Information Technology, School of Foreign Languages
Keywords
Target speaker separation; Lightweight network; Loss function; Joint training
Abstract
Target speaker separation aims to extract the target speaker’s speech from a mixture and to remove extraneous components such as noise. In recent years, deep learning-based speech separation methods have achieved significant breakthroughs and have gradually become mainstream. However, existing methods generally suffer from system latency and a limited performance ceiling because of their large model size. To address these problems, this paper proposes improvements to both the network structure and the training method. First, a lightweight target speaker separation network based on long short-term memory (LSTM) is proposed, which reduces model size and computational delay while maintaining separation performance. Building on this network, a joint training method is proposed so that the target speaker separation system is trained and optimized as a whole. Joint loss functions based on speaker registration and speaker separation are proposed for this joint training to further improve the system’s performance. Experimental results show that the proposed lightweight network achieves better separation performance despite its smaller size, and that joint training with the proposed loss functions further improves the separation performance of the original model.
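The abstract does not give the form of the joint loss, so the following is only an illustrative sketch of how a loss that combines a speaker separation term with a speaker registration term might look. It assumes, without support from the record, that the separation term is a negative scale-invariant SNR (SI-SNR), that the speaker term is a cross-entropy over speaker identities, and that the two are combined with a hypothetical weight `alpha`. All function and parameter names are invented for illustration.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to isolate the target component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    # Whatever remains after the projection is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(n * n for n in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def speaker_cross_entropy(logits, speaker_index):
    """Cross-entropy of one speaker label, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[speaker_index]


def joint_loss(estimate, target, speaker_logits, speaker_index, alpha=0.5):
    """Assumed joint objective: separation loss plus weighted speaker loss."""
    # Higher SI-SNR is better, so its negative is used as a loss.
    separation_loss = -si_snr(estimate, target)
    speaker_loss = speaker_cross_entropy(speaker_logits, speaker_index)
    # alpha is a hypothetical weight; the record does not state one.
    return separation_loss + alpha * speaker_loss
```

In a sketch like this, minimizing one scalar lets gradients from both terms reach the speaker encoder and the separator together, which is the general idea behind training the system as a whole rather than in separate stages.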