A multi-task learning and auditory attention detection framework towards EEG-assisted target speech extraction

Citations: 0

Authors:

Wang, Xuefei [1]

Ding, Yuting [1]

Wang, Lei [2]

Chen, Fei [1,3]

Affiliations:

[1] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen, Peoples R China

[2] GuangZhou Univ, Sch Elect & Commun Engn, Guangzhou, Peoples R China

[3] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China

Source:

APPLIED ACOUSTICS | 2025 / Vol. 231

Keywords:

Target speech extraction; EEG auditory attention detection; Multi-task learning; Symmetrically combined cross-attention; SPEAKER EXTRACTION; NEURAL-NETWORKS; SEPARATION;

DOI:

10.1016/j.apacoust.2024.110474

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In complex multi-speaker environments, effectively separating the target speaker's voice remains a significant challenge. Traditional methods often rely on pre-recorded clean speech as a reference, an approach that performs well in controlled settings but has limitations in more diverse conditions. Recently, deep learning techniques have advanced speech separation technology, but limitations remain, especially when dealing with multiple speakers and complex background noise. In this study, we propose a novel EEG-assisted target speech extraction network based on multi-task learning, named MLAD-ETSEN. The approach combines EEG auditory attention detection with speech separation techniques. The method is built on the classical Conv-TasNet backbone network and uses joint training to extract speech and EEG embeddings from speech mixtures and EEG trials, respectively. To take full advantage of these embeddings, a symmetrically combined cross-attention module is designed to deeply integrate these features, which are then fed into a decoder to reconstruct the final speech waveform. The experimental results show that the model using the symmetrically combined cross-attention module for information fusion, under multi-task training with EEG auditory attention detection, significantly outperforms existing baseline methods. Compared to the baseline model, the proposed MLAD-ETSEN model achieved a relative improvement of 15.70% in SI-SDR across all subjects. The STOI and PESQ scores improved by an average of 0.08 and 0.55, respectively, compared to the assessment before separation. The proposed EEG-assisted target speech extraction method shows potential for accurately identifying and separating the target speech in multi-speaker environments.
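For readers unfamiliar with the SI-SDR metric cited in the abstract, the following is a minimal sketch of the commonly used scale-invariant signal-to-distortion ratio, written with NumPy. It is not taken from the paper: the function names `si_sdr` and `relative_improvement`, the `eps` stabilizer, and the assumption that "relative improvement" means the percentage change over the baseline score are all illustrative.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB (common definition; eps is an assumed stabilizer)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove the mean so that DC offsets do not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target: the optimally scaled target signal.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    # Everything left over after the projection counts as distortion.
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def relative_improvement(new_score, baseline_score):
    """Percentage change over a baseline (assumed meaning of 'relative improvement')."""
    return 100.0 * (new_score - baseline_score) / abs(baseline_score)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the extracted waveform leaves the score unchanged, which is why the metric is preferred over plain SDR for separation networks whose output gain is arbitrary.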


Pages: 10

