Lightweight target speaker separation network based on joint training

Cited by: 0

Authors:

Jing Wang

Hanyue Liu

Liang Xu

Wenjing Yang

Weiming Yi

Fang Liu

Affiliations:

[1] Beijing Institute of Technology, School of Information and Electronics

[2] Beijing Institute of Technology, Key Laboratory of Language, Cognition and Computation, Ministry of Industry and Information Technology, School of Foreign Languages

Source:

EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2023

Keywords:

Target speaker separation; Lightweight network; Loss function; Joint training

DOI:

Not available

Abstract:

Target speaker separation aims to extract the target speaker's speech from mixed speech while removing extraneous components such as noise. In recent years, deep learning-based speech separation methods have achieved significant breakthroughs and have gradually become mainstream. However, existing methods generally rely on large models, which increases system latency and limits the achievable performance. To address these problems, this paper proposes improvements to both the network structure and the training method. First, a lightweight target speaker separation network based on long short-term memory (LSTM) is proposed, which reduces model size and computational delay while maintaining separation performance. Building on this network, a joint training method is proposed so that the target speaker separation system is trained and optimized as a whole. Joint loss functions based on speaker registration and speaker separation are proposed for this joint training to further improve the system's performance. The experimental results show that the proposed lightweight network achieves better performance while remaining lightweight, and that jointly training the target speaker separation network with the proposed loss functions further improves the separation performance of the original model.
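The abstract does not give the form of the joint loss, so the following is only an illustrative sketch of how a separation term and a speaker registration term might be combined into one objective. Everything specific here is an assumption and not taken from the paper: negative SI-SDR as the separation loss, cross-entropy over speaker identities as the registration loss, the hypothetical function names, and the hypothetical `weight` parameter.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def speaker_cross_entropy(logits, speaker_id):
    """Cross-entropy of speaker logits against the true speaker index."""
    shifted = logits - logits.max()  # stabilise the softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[speaker_id]


def joint_loss(estimate, target, logits, speaker_id, weight=0.1):
    """Assumed joint objective: separation loss plus a weighted speaker loss."""
    separation_loss = -si_sdr(estimate, target)  # higher SI-SDR gives lower loss
    speaker_loss = speaker_cross_entropy(logits, speaker_id)
    return separation_loss + weight * speaker_loss


# Usage with synthetic data (not results from the paper).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
estimate = target + 0.1 * rng.standard_normal(16000)
logits = np.array([2.0, 0.5, -1.0])
print(joint_loss(estimate, target, logits, speaker_id=0))
```

In a sketch like this, a single scalar loss lets gradients from the separation output also reach the speaker branch, which is one plausible reading of training the system "as a whole"; the actual loss terms and weighting used by the authors may differ.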

Citations

50 entries in total

[21] Maritime Target Recognition and Location System Based on Lightweight Neural Network
Zhao, Xiao
Chen, Zhenjia
Wang, Min
Wang, Jingbo
ELECTRONICS, 2023, 12 (15)
[22] EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification
Li, Jingyu
Liu, Wei
Lee, Tan
INTERSPEECH 2022, 2022, : 3694 - 3698
[23] Lightweight Target-Aware Attention Learning Network-Based Target Tracking Method
Zhao, Yanchun
Zhang, Jiapeng
Duan, Rui
Li, Fusheng
Zhang, Huanlong
MATHEMATICS, 2022, 10 (13)
[24] Training Speaker Enrollment Models by Network Optimization
Mingote, Victoria
Miguel, Antonio
Ortega, Alfonso
Lleida, Eduardo
INTERSPEECH 2020, 2020, : 3810 - 3814
[25] Joint Deep Neural Network for Single-Channel Speech Separation on Masking-Based Training Targets
Chen, Peng
Thien Nguyen, Binh
Geng, Yuting
Iwai, Kenta
Nishiura, Takanobu
IEEE Access, 2024, 12 : 152036 - 152044
[26] SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
Zmolikova, Katerina
Delcroix, Marc
Kinoshita, Keisuke
Ochiai, Tsubasa
Nakatani, Tomohiro
Burget, Lukas
Cernocky, Jan
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 800 - 814
[27] Ensemble Speaker Modeling using Speaker Adaptive Training Deep Neural Network for Speaker Adaptation
Li, Sheng
Lu, Xugang
Akita, Yuya
Kawahara, Tatsuya
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2892 - 2896
[28] JOINT SINGLE-CHANNEL SPEECH SEPARATION AND SPEAKER IDENTIFICATION
Mowlaee, P.
Saeidi, R.
Tan, Z. -H.
Christensen, M. G.
Franti, P.
Jensen, S. H.
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4430 - 4433
[29] An EM Algorithm for Joint Dual-Speaker Separation and Dereverberation
Cohen, Nili
Hazan, Gershon
Schwartz, Boaz
Gannot, Sharon
2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
[30] Separate-to-Recognize: Joint Multi-target Speech Separation and Speech Recognition for Speaker-attributed ASR
Lin, Yuxiao
Du, Zhihao
Zhang, Shiliang
Yu, Fan
Zhao, Zhou
Wu, Fei
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 150 - 154
