Speaker extraction network with attention mechanism for speech dialogue system

Cited by: 1
Authors
Hao, Yun [1]
Wu, Jiaju [1]
Huang, Xiangkang [1]
Zhang, Zijia [1]
Liu, Fei [1]
Wu, Qingyao [1, 2]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech dialogue system; Speech separation; Multi-task; Attention; SEPARATION; ENHANCEMENT;
DOI
10.1007/s11761-022-00340-w
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Speech dialogue systems are now widely used in many fields, allowing users to interact and communicate with the system through natural language. In practical dialogue scenes, however, third-person background speech and background noise interfere with the signal. This interference seriously damages the intelligibility of the speech signal and degrades speech recognition performance. To tackle this, we exploit a speech separation method that separates the target speech from complex multi-person speech. We propose a multi-task attention mechanism and select TFCN as the audio feature extraction module. Following the multi-task approach, we train jointly with an SI-SDR loss and a cross-entropy speaker classification loss, and then use the attention mechanism to further exclude the background vocals from the mixed speech. We evaluate our results not only on the distortion metrics SI-SDR and SDR but also with a speech recognition system. To train the model and demonstrate its effectiveness, we build a background vocal removal data set based on a common data set. Experimental results show that our model significantly improves the performance of the speech separation model.
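The joint objective mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It uses the standard definition of SI-SDR (project the estimate onto the target, then compare projection energy with residual energy) and a softmax cross-entropy over speaker logits. The function names, the `ce_weight` balancing factor, and its default value are assumptions made for illustration only; the abstract does not say how the two losses are weighted.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target: s_target = (<est, tgt> / ||tgt||^2) * tgt
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    projection = [scale * t for t in tgt]
    # Whatever is left after the projection counts as distortion.
    residual = [e - p for e, p in zip(est, projection)]
    proj_energy = sum(p * p for p in projection)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((proj_energy + eps) / (residual_energy + eps))


def cross_entropy(logits, speaker_index):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[speaker_index]


def joint_loss(estimate, target, logits, speaker_index, ce_weight=0.5):
    """Negative SI-SDR plus a weighted speaker classification loss.

    ce_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return -si_sdr(estimate, target) + ce_weight * cross_entropy(logits, speaker_index)
```

Minimizing the negative SI-SDR pushes the separated waveform toward the target speech, while the cross-entropy term encourages the network to keep speaker identity information. In a real training setup both terms would be computed on tensors in an automatic differentiation framework rather than on Python lists.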
Pages: 111-119
Number of pages: 9
Related papers
50 in total
  • [41] Event Temporal Relation Extraction with Attention Mechanism and Graph Neural Network
    Xiaoliang Xu
    Tong Gao
    Yuxiang Wang
    Xinle Xuan
    Tsinghua Science and Technology, 2022, 27 (01) : 79 - 90
  • [42] ATTENTION MECHANISM IN SPEAKER RECOGNITION: WHAT DOES IT LEARN IN DEEP SPEAKER EMBEDDING?
    Wang, Qiongqiong
    Okabe, Koji
    Lee, Kong Aik
    Yamamoto, Hitoshi
    Koshinaka, Takafumi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 1052 - 1059
  • [43] Enhancing Dialogue-based Relation Extraction by Speaker and Trigger Words Prediction
    Zhao, Tianyang
    Yan, Zhao
    Cao, Yunbo
    Li, Zhoujun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4580 - 4585
  • [44] LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION
    Zmolikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Higuchi, Takuya
    Ogawa, Atsunori
    Nakatani, Tomohiro
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 8 - 15
  • [45] Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification
    Malek, Jiri
    Jansky, Jakub
    Koldovsky, Zbynek
    Kounovsky, Tomas
    Cmejla, Jaroslav
    Zdansky, Jindrich
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2295 - 2309
  • [46] Speech-dialogue system for vehicles
    Heinrich, C
    Stammler, W
    ELECTRONIC SYSTEMS FOR VEHICLES, 1996, 1287 : 425 - 441
  • [47] ONLINE SPEAKER ADAPTATION FOR LVCSR BASED ON ATTENTION MECHANISM
    Pan, Jia
    Liu, Diyuan
    Wan, Genshun
    Du, Jun
    Liu, Qingfeng
    Ye, Zhongfu
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 183 - 186
  • [48] Position-Aware Attention Mechanism-Based Bi-graph for Dialogue Relation Extraction
    Duan, Guiduo
    Dong, Yunrui
    Miao, Jiayu
    Huang, Tianxi
    COGNITIVE COMPUTATION, 2023, 15 (01) : 359 - 372
  • [49] COMPACT NETWORK FOR SPEAKERBEAM TARGET SPEAKER EXTRACTION
    Delcroix, Marc
    Zmolikova, Katerina
    Ochiai, Tsubasa
    Kinoshita, Keisuke
    Araki, Shoko
    Nakatani, Tomohiro
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6965 - 6969
  • [50] TIME-DOMAIN SPEAKER EXTRACTION NETWORK
    Xu, Chenglin
    Rao, Wei
    Chng, Eng Siong
    Li, Haizhou
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 327 - 334