A multi-task learning and auditory attention detection framework towards EEG-assisted target speech extraction

Authors
Wang, Xuefei [1]
Ding, Yuting [1]
Wang, Lei [2]
Chen, Fei [1, 3]
Affiliations
[1] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen, Peoples R China
[2] GuangZhou Univ, Sch Elect & Commun Engn, Guangzhou, Peoples R China
[3] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China
Keywords
Target speech extraction; EEG auditory attention detection; Multi-task learning; Symmetrically combined cross-attention; SPEAKER EXTRACTION; NEURAL-NETWORKS; SEPARATION
DOI
10.1016/j.apacoust.2024.110474
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In complex multi-speaker environments, effectively separating the target speaker's voice remains a significant challenge. Traditional methods often rely on pre-recorded clean speech as a reference, an approach that performs well in controlled settings but has limitations in more diverse conditions. Deep learning techniques have recently advanced speech separation, but limitations remain, especially with multiple speakers and complex background noise. In this study, we propose a novel EEG-assisted target speech extraction network based on multi-task learning, named MLAD-ETSEN. The approach combines EEG auditory attention detection with speech separation. It is built on the classical Conv-TasNet backbone and uses joint training to extract speech and EEG embeddings from speech mixtures and EEG trials, respectively. To take full advantage of these embeddings, a symmetrically combined cross-attention module is designed to deeply integrate them; the fused features are then fed into a decoder to reconstruct the final speech waveform. The experimental results show that the model using the symmetrically combined cross-attention module for information fusion, under multi-task training with EEG auditory attention detection, significantly outperforms existing baseline methods. Compared to the baseline model, the proposed MLAD-ETSEN model achieved a relative improvement of 15.70% in SI-SDR across all subjects. The STOI and PESQ scores improved by an average of 0.08 and 0.55, respectively, compared to the assessment before separation. The proposed EEG-assisted target speech extraction method shows potential for accurately identifying and separating the target speech in multi-speaker environments.
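For readers unfamiliar with the headline metric, the following is a minimal sketch of how SI-SDR (scale-invariant signal-to-distortion ratio) is commonly computed and how a relative improvement over a baseline could be derived from two scores. The abstract does not state the exact formula used for the reported 15.70%, so `relative_improvement` is an assumption, and the function names and the synthetic signals are hypothetical illustrations rather than the authors' evaluation code.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between an estimated and a reference waveform."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimally scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    # Everything not explained by the scaled target counts as distortion.
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def relative_improvement(proposed, baseline):
    """Percentage gain of `proposed` over `baseline` (assumed definition)."""
    return 100.0 * (proposed - baseline) / abs(baseline)


if __name__ == "__main__":
    # Synthetic one-second signals at 16 kHz, for illustration only.
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    extracted = clean + 0.1 * rng.standard_normal(16000)
    print(f"SI-SDR of the synthetic estimate: {si_sdr(extracted, clean):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the extracted waveform leaves the score unchanged, which is why SI-SDR is often preferred over plain SDR for separation models whose output gain is arbitrary.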
Pages: 10