A multi-task learning and auditory attention detection framework towards EEG-assisted target speech extraction

Citations: 0
Authors
Wang, Xuefei [1]
Ding, Yuting [1]
Wang, Lei [2]
Chen, Fei [1, 3]
Affiliations
[1] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen, Peoples R China
[2] Guangzhou Univ, Sch Elect & Commun Engn, Guangzhou, Peoples R China
[3] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China
Keywords
Target speech extraction; EEG auditory attention detection; Multi-task learning; Symmetrically combined cross-attention; SPEAKER EXTRACTION; NEURAL-NETWORKS; SEPARATION;
DOI
10.1016/j.apacoust.2024.110474
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In complex multi-speaker environments, effectively separating the target speaker's voice remains a significant challenge. Traditional methods often rely on pre-recorded clean speech as a reference, an approach that performs well in controlled settings but is limited in more diverse conditions. Deep learning techniques have recently advanced speech separation, but limitations remain, especially with multiple speakers and complex background noise. In this study, we propose a novel EEG-assisted target speech extraction network based on multi-task learning, named MLAD-ETSEN, which combines EEG auditory attention detection with speech separation. The method builds on the classical Conv-TasNet backbone and uses joint training to extract speech embeddings from the speech mixtures and EEG embeddings from the EEG trials. To take full advantage of these embeddings, a symmetrically combined cross-attention module is designed to fuse them, and the fused features are then fed into a decoder to reconstruct the final speech waveform. The experimental results show that the model using the symmetrically combined cross-attention module for information fusion, trained jointly with the EEG auditory attention detection task, significantly outperforms existing baseline methods. Compared to the baseline model, the proposed MLAD-ETSEN model achieved a relative improvement of 15.70% in SI-SDR across all subjects. The STOI and PESQ scores improved by an average of 0.08 and 0.55, respectively, relative to the unprocessed mixture before separation. These results show the potential of the proposed EEG-assisted target speech extraction method for accurately identifying and separating the target speech in multi-speaker environments.
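The abstract reports its main result as a relative improvement in SI-SDR (scale-invariant signal-to-distortion ratio). As a minimal sketch of how that metric and a relative improvement are commonly computed, the NumPy code below uses the standard zero-mean, projection-based definition of SI-SDR. It is not taken from the paper: the function names `si_sdr` and `relative_improvement`, the `eps` stabiliser, and the choice to average before comparing are assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB (standard definition, not the paper's code)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimally scaled target.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    # Everything left over after the projection counts as distortion.
    distortion = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


def relative_improvement(proposed_db, baseline_db):
    """Relative improvement in percent of one mean score over another.

    Assumption: both arguments are already averaged over subjects.
    """
    return 100.0 * (proposed_db - baseline_db) / abs(baseline_db)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is preferred over plain SDR when the output level of a separation model is arbitrary.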
Pages: 10
Related papers
50 in total
  • [31] Emotion recognition from EEG based on multi-task learning with capsule network and attention mechanism
    Li, Chang
    Wang, Bin
    Zhang, Silin
    Liu, Yu
    Song, Rencheng
    Cheng, Juan
    Chen, Xun
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 143
  • [32] A GENERAL MULTI-TASK LEARNING FRAMEWORK TO LEVERAGE TEXT DATA FOR SPEECH TO TEXT TASKS
    Tang, Yun
    Pino, Juan
    Wang, Changhan
    Ma, Xutai
    Genzel, Dmitriy
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6209 - 6213
  • [33] Predicting Auditory Spatial Attention from EEG using Single- and Multi-task Convolutional Neural Networks
    Liu, Zhentao
    Mock, Jeffrey
    Huang, Yufei
    Golob, Edward
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 1298 - 1303
  • [34] INFRARED SMALL TARGET DETECTION BASED ON SALIENCY GUIDED MULTI-TASK LEARNING
    Liu, Zhaoying
    He, Junran
    Zhang, Yuxiang
    Zhang, Ting
    Han, Ziqing
    Liu, Bo
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3459 - 3463
  • [35] HARadNet: Anchor-free target detection for radar point clouds using hierarchical attention and multi-task learning
    Dubey, Anand
    Santra, Avik
    Fuchs, Jonas
    Luebke, Maximilian
    Weigel, Robert
    Lurz, Fabian
    MACHINE LEARNING WITH APPLICATIONS, 2022, 8
  • [36] A multi-task learning framework for end-to-end aspect sentiment triplet extraction
    Chen, Fang
    Yang, Zhongliang
    Huang, Yongfeng
    NEUROCOMPUTING, 2022, 479 : 12 - 21
  • [37] CMBEE: A constraint-based multi-task learning framework for biomedical event extraction
    Hu, Jingyue
    Tang, Buzhou
    Lyu, Nan
    He, Yuxin
    Xiong, Ying
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 150
  • [38] Effectiveness of multi-task deep learning framework for EEG-based emotion and context recognition
    Choo, Sanghyun
    Park, Hoonseok
    Kim, Sangyeon
    Park, Donghyun
    Jung, Jae-Yoon
    Lee, Sangwon
    Nam, Chang S.
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 227
  • [39] A deep neural network based multi-task learning approach to hate speech detection
    Kapil, Prashant
    Ekbal, Asif
    KNOWLEDGE-BASED SYSTEMS, 2020, 210 (210)
  • [40] Multi-task SonoEyeNet: Detection of Fetal Standardized Planes Assisted by Generated Sonographer Attention Maps
    Cai, Yifan
    Sharma, Harshita
    Chatelain, Pierre
    Noble, J. Alison
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2018, PT I, 2018, 11070 : 871 - 879