Speaker extraction network with attention mechanism for speech dialogue system

Cited by: 1

Authors:

Hao, Yun [1]

Wu, Jiaju [1]

Huang, Xiangkang [1]

Zhang, Zijia [1]

Liu, Fei [1]

Wu, Qingyao [1,2]

Affiliations:

[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China

[2] Pazhou Lab, Guangzhou, Peoples R China

Source:

SERVICE ORIENTED COMPUTING AND APPLICATIONS | 2022 / Vol. 16 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Speech dialogue system; Speech separation; Multi-task; Attention; SEPARATION; ENHANCEMENT;

DOI:

10.1007/s11761-022-00340-w

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Speech dialogue systems are now widely used in many fields, letting users interact with a system through natural language. In real dialogue scenes, however, third-party background speech and background noise interfere with the user's voice. This interference seriously degrades the intelligibility of the speech signal and reduces speech recognition performance. To tackle this problem, we exploit a speech separation method that separates the target speech from complex multi-speaker speech. We propose a multi-task attention mechanism and select TFCN as the audio feature extraction module. Following the multi-task approach, we jointly train with an SI-SDR loss and a cross-entropy speaker classification loss, and then use the attention mechanism to further exclude background vocals from the mixed speech. We evaluate our results not only with the distortion metrics SI-SDR and SDR but also with a speech recognition system. To train our model and demonstrate its effectiveness, we build a background vocal removal data set based on a common data set. Experimental results show that our model significantly improves the performance of the speech separation model.
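
The joint objective named in the abstract (an SI-SDR term plus a cross-entropy speaker classification term) can be sketched roughly as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two losses are combined, so the weighted sum and the `ce_weight` parameter are hypothetical, as are all function names.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    # Remove the mean so a constant offset does not affect the measure.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)


def cross_entropy(logits, speaker_id):
    """Cross-entropy of one speaker label given unnormalised class scores."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[speaker_id]


def joint_loss(estimate, target, logits, speaker_id, ce_weight=0.5):
    """Hypothetical multi-task loss: negative SI-SDR plus weighted cross-entropy.

    ce_weight is an assumed hyperparameter; the abstract gives no value.
    """
    return -si_sdr(estimate, target) + ce_weight * cross_entropy(logits, speaker_id)
```

Negating SI-SDR turns a quality measure (higher is better) into a quantity to minimise, so both terms decrease as the extracted speech and the speaker prediction improve.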


Pages: 111 - 119

Number of pages: 9

50 items in total

[1] Speaker extraction network with attention mechanism for speech dialogue system
Yun Hao
Jiaju Wu
Xiangkang Huang
Zijia Zhang
Fei Liu
Qingyao Wu
Service Oriented Computing and Applications, 2022, 16 : 111 - 119
[2] Denoi-SpEx plus : A Speaker Extraction Network based Speech Dialogue System
Hao, Yun
Huang, Xiangkang
Huang, Huichou
Wu, Qingyao
2021 IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING (ICEBE 2021), 2021, : 49 - 53
[3] Neural Speaker Extraction with Speaker-Speech Cross-Attention Network
Wang, Wupeng
Xu, Chenglin
Ge, Meng
Li, Haizhou
INTERSPEECH 2021, 2021, : 3535 - 3539
[4] SINGLE-CHANNEL SPEECH EXTRACTION USING SPEAKER INVENTORY AND ATTENTION NETWORK
Xiao, Xiong
Chen, Zhuo
Yoshioka, Takuya
Erdogan, Hakan
Liu, Changliang
Dimitriadis, Dimitrios
Droppo, Jasha
Gong, Yifan
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 86 - 90
[5] SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
Zmolikova, Katerina
Delcroix, Marc
Kinoshita, Keisuke
Ochiai, Tsubasa
Nakatani, Tomohiro
Burget, Lukas
Cernocky, Jan
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 800 - 814
[6] Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism
Wang, Sijie
Hamdulla, Askar
Ablimit, Mijit
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1995 - 2001
[7] Speaker Adaptive Training for Speech Recognition Based on Attention-over-Attention Mechanism
Wan, Genshun
Pan, Jia
Wang, Qingran
Gao, Jianqing
Ye, Zhongfu
INTERSPEECH 2020, 2020, : 1251 - 1255
[8] Speaker-aware neural network based beamformer for speaker extraction in speech mixtures
Zmolikova, Katerina
Delcroix, Marc
Kinoshita, Keisuke
Higuchi, Takuya
Ogawa, Atsunori
Nakatani, Tomohiro
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2655 - 2659
[9] TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition
Li, Wenjie
Zhang, Pengyuan
Yan, Yonghong
ELECTRONICS LETTERS, 2019, 55 (14) : 816 - 818
[10] Hierarchic Temporal Convolutional Network with Attention Fusion for Target Speaker Extraction
Chen, Zihao
Qiu, Wenbo
Xu, Haitao
Hu, Ying
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 827 - 832
