A multi-task learning and auditory attention detection framework towards EEG-assisted target speech extraction

Cited by: 0
Authors
Wang, Xuefei [1]
Ding, Yuting [1]
Wang, Lei [2]
Chen, Fei [1, 3]
Affiliations
[1] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen, Peoples R China
[2] Guangzhou Univ, Sch Elect & Commun Engn, Guangzhou, Peoples R China
[3] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China
Keywords
Target speech extraction; EEG auditory attention detection; Multi-task learning; Symmetrically combined cross-attention; SPEAKER EXTRACTION; NEURAL-NETWORKS; SEPARATION
DOI
10.1016/j.apacoust.2024.110474
Chinese Library Classification
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In complex multi-speaker environments, effectively separating the target speaker's voice remains a significant challenge. Traditional methods often rely on pre-recorded clean speech as a reference, an approach that performs well in controlled settings but has limitations in more diverse conditions. Deep learning techniques have recently advanced speech separation technology, but limitations remain, especially when dealing with multiple speakers and complex background noise. In this study, we propose a novel EEG-assisted target speech extraction network based on multi-task learning, named MLAD-ETSEN, which combines EEG auditory attention detection with speech separation techniques. The method builds on the classical Conv-TasNet backbone network and uses joint training to extract speech and EEG embeddings from speech mixtures and EEG trials, respectively. To take full advantage of these embeddings, a symmetrically combined cross-attention module is designed to deeply integrate them, and the fused features are then fed into a decoder to reconstruct the final speech waveform. The experimental results show that the model using the symmetrically combined cross-attention module for information fusion, under multi-task training with EEG auditory attention detection, significantly outperforms existing baseline methods. Compared with the baseline model, the proposed MLAD-ETSEN model achieved a relative improvement of 15.70% in SI-SDR across all subjects. The STOI and PESQ scores improved by an average of 0.08 and 0.55, respectively, compared with the scores measured before separation. The proposed EEG-assisted target speech extraction method shows potential for accurately identifying and separating the target speech in multi-speaker environments.
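The abstract reports its main result in SI-SDR (scale-invariant signal-to-distortion ratio). As a reading aid, the following is a minimal sketch of the commonly used definition of that metric, not the authors' evaluation code. The function name `si_sdr`, the `eps` stabilizer, and the plain-list signal representation are assumptions made for illustration.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Hypothetical helper for illustration; follows the common definition:
    project the zero-mean estimate onto the zero-mean target, then compare
    the energy of that projection with the energy of the residual.
    """
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Optimal scaling of the target that best explains the estimate.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    alpha = dot / (tgt_energy + eps)

    # Split the estimate into a target-aligned part and a residual.
    projection = [alpha * t for t in tgt]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum((e - p) ** 2 for e, p in zip(est, projection))

    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is preferred over plain SDR for comparing separation systems whose output gain may differ.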
Pages: 10