A multi-task learning and auditory attention detection framework towards EEG-assisted target speech extraction

Times cited: 0

Authors:

Wang, Xuefei [1]

Ding, Yuting [1]

Wang, Lei [2]

Chen, Fei [1, 3]

Affiliations:

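[1] Southern University of Science and Technology, Department of Electrical and Electronic Engineering, Shenzhen, People's Republic of China

[2] Guangzhou University, School of Electronics and Communication Engineering, Guangzhou, People's Republic of China

[3] Ministry of Natural Resources, Key Laboratory of Urban Land Resources Monitoring and Simulation, Shenzhen, People's Republic of China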

Source:

APPLIED ACOUSTICS | 2025 / Vol. 231

Keywords:

Target speech extraction; EEG auditory attention detection; Multi-task learning; Symmetrically combined cross-attention; Speaker extraction; Neural networks; Separation

DOI:

10.1016/j.apacoust.2024.110474

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In complex multi-speaker environments, separating the target speaker's voice remains a significant challenge. Traditional methods often rely on pre-recorded clean speech of the target speaker as a reference; they perform well in controlled settings but are limited in more diverse conditions. Deep learning has recently advanced speech separation, but limitations remain, especially with multiple speakers and complex background noise. In this study, we propose a novel EEG-assisted target speech extraction network based on multi-task learning, named MLAD-ETSEN, which combines EEG auditory attention detection with speech separation. Built on the classical Conv-TasNet backbone, the method uses joint training to extract speech embeddings from speech mixtures and EEG embeddings from EEG trials. To take full advantage of these embeddings, a symmetrically combined cross-attention module is designed to fuse the two feature streams deeply; the fused features are then fed into a decoder to reconstruct the output speech waveform. Experimental results show that the model that fuses information with the symmetrically combined cross-attention module, trained jointly with EEG auditory attention detection, significantly outperforms existing baseline methods. Compared with the baseline model, MLAD-ETSEN achieves a relative improvement of 15.70% in SI-SDR across all subjects, and its STOI and PESQ scores improve by an average of 0.08 and 0.55, respectively, relative to the unseparated mixtures. These results show the potential of the proposed EEG-assisted method for accurately identifying and separating the target speech in multi-speaker environments.
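
The headline result above is stated as a relative gain in SI-SDR (scale-invariant signal-to-distortion ratio). For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the commonly used definition: both signals are made zero-mean, the estimate is projected onto the reference to obtain the scaled target component, and the ratio of target energy to residual energy is reported in dB. It is not the authors' evaluation code; the `eps` value and the hypothetical `relative_improvement` helper, which assumes the percentage is computed as (proposed - baseline) / baseline, are illustrative assumptions.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB, using the common zero-mean projection definition."""
    if not reference or len(estimate) != len(reference):
        raise ValueError("estimate and reference must be non-empty and equally long")
    # Remove the mean of each signal so that a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]              # component of est explained by ref
    residual = [e - t for e, t in zip(est, target)]  # everything else counts as distortion
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def relative_improvement(baseline: float, proposed: float) -> float:
    """Hypothetical helper: relative gain in percent.

    Assumption (not confirmed by the source): gain = (proposed - baseline) / baseline.
    Only meaningful when the baseline score is positive.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative improvement")
    return 100.0 * (proposed - baseline) / baseline
```

For example, with reference [1, -1, 1, -1] and an estimate equal to the reference plus 0.5 times [1, 1, -1, -1] (a component orthogonal to the reference), si_sdr returns about 6.02 dB, i.e. 10·log10(4). Doubling the estimate leaves the score unchanged, which is the scale invariance that distinguishes SI-SDR from a plain SNR computed on the raw difference.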

Pages: 10
