A multi-task learning and auditory attention detection framework towards EEG-assisted target speech extraction

Authors
Wang, Xuefei [1]
Ding, Yuting [1]
Wang, Lei [2]
Chen, Fei [1,3]
Affiliations
[1] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen, Peoples R China
[2] Guangzhou Univ, Sch Elect & Commun Engn, Guangzhou, Peoples R China
[3] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China
Keywords
Target speech extraction; EEG auditory attention detection; Multi-task learning; Symmetrically combined cross-attention; SPEAKER EXTRACTION; NEURAL-NETWORKS; SEPARATION
DOI
10.1016/j.apacoust.2024.110474
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In complex multi-speaker environments, effectively separating the target speaker's voice remains a significant challenge. Traditional methods often rely on pre-recorded clean speech as a reference, which works well in controlled settings but is limited under more diverse conditions. Deep learning has recently advanced speech separation, but limitations remain, especially with multiple speakers and complex background noise. In this study, we propose a novel EEG-assisted target speech extraction network based on multi-task learning, named MLAD-ETSEN. The approach combines EEG auditory attention detection with speech separation. It builds on the classical Conv-TasNet backbone and uses joint training to extract speech embeddings from speech mixtures and EEG embeddings from EEG trials. To take full advantage of these embeddings, a symmetrically combined cross-attention module is designed to fuse them, and the fused features are fed into a decoder that reconstructs the final speech waveform. The experimental results show that the model using the symmetrically combined cross-attention module for information fusion, trained jointly with EEG auditory attention detection, significantly outperforms existing baseline methods. Compared to the baseline model, the proposed MLAD-ETSEN model achieved a relative improvement of 15.70% in SI-SDR across all subjects. The STOI and PESQ scores improved by an average of 0.08 and 0.55, respectively, compared to the assessment before separation. The proposed EEG-assisted target speech extraction method shows potential for accurately identifying and separating the target speech in multi-speaker environments.
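The abstract reports its main result in SI-SDR (scale-invariant signal-to-distortion ratio). As a reading aid, the sketch below follows the commonly used definition of that metric: both signals are zero-meaned, the estimate is projected onto the target, and the energy of the projection is compared with the energy of the residual. It is not taken from the paper; the function names `si_sdr` and `relative_improvement`, the `eps` stabilizer, and the choice to compute the relative improvement on dB values are all assumptions made for illustration.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Illustrative sketch of the common definition, not the paper's code.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target: alpha is the optimal scale.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t)
    alpha = dot / (target_energy + eps)
    scaled_target = [alpha * b for b in t]
    # Whatever is left after removing the scaled target counts as distortion.
    residual = [a - s for a, s in zip(e, scaled_target)]
    signal_energy = sum(s * s for s in scaled_target)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def relative_improvement(baseline_db, proposed_db):
    """Relative change in percent (assumed here to be computed on dB values)."""
    return 100.0 * (proposed_db - baseline_db) / abs(baseline_db)
```

Because the target is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is why the metric is preferred over plain SDR when the output level of a separation network is arbitrary.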
Pages: 10