Relating EEG to continuous speech using deep neural networks: a review

Cited by: 20
Authors
Puffay, Corentin [1,2]
Accou, Bernd [1,2]
Bollens, Lies [1,2]
Monesi, Mohammad Jalilpour [1,2]
Vanthornhout, Jonas [1]
Van Hamme, Hugo [2]
Francart, Tom [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Neurosci, ExpORL, Leuven, Belgium
[2] Katholieke Univ Leuven, Dept Elect Engn ESAT, PSI, Leuven, Belgium
Keywords
EEG; deep learning; speech; auditory neuroscience; entrainment; frequency; brain; level
DOI
10.1088/1741-2552/ace73f
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline Classification Code
0831
Abstract
Objective. When a person listens to continuous speech, a corresponding response is elicited in the brain and can be recorded using electroencephalography (EEG). Linear models are currently used to relate the EEG recording to the corresponding speech signal, and their ability to find a mapping between the two signals is used as a measure of neural tracking of speech. Such models are limited, however, as they assume linearity in the EEG-speech relationship, which omits the nonlinear dynamics of the brain. As an alternative, deep learning models have recently been used to relate EEG to continuous speech. Approach. This paper reviews and comments on deep-learning-based studies that relate EEG to continuous speech in single- or multiple-speaker paradigms. We point out recurrent methodological pitfalls and the need for a standard benchmark of model analysis. Main results. We gathered 29 studies. The main methodological issues we found are biased cross-validation, data leakage leading to over-fitted models, and data sizes disproportionate to the model's complexity. In addition, we address requirements for a standard benchmark of model analysis, such as public datasets, common evaluation metrics, and good practices for the match-mismatch task. Significance. We present a review summarizing the main deep-learning-based studies that relate EEG to speech, while addressing methodological pitfalls and important considerations for this newly expanding field. Our study is particularly relevant given the growing application of deep learning in EEG-speech decoding.
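To make the linear-model baseline from the abstract concrete, the sketch below (Python, scikit-learn, fully synthetic data; the sampling rate, lag window, and regularization strength are illustrative assumptions, not values taken from any study covered by the review) trains a linear "backward" decoder that reconstructs the speech envelope from time-lagged EEG and reports the test-set correlation as the neural-tracking score. The contiguous train/test split illustrates one way to avoid the data-leakage pitfall the review highlights for randomly shuffled splits on time series.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
fs = 64                                   # sampling rate in Hz (hypothetical)
n_channels = 64
n_samples = fs * 600                      # 10 minutes of synthetic "EEG"

# Synthetic data: the "envelope" is a noisy linear mixture of the EEG channels,
# so a linear decoder can partially recover it.
eeg = rng.standard_normal((n_samples, n_channels))
envelope = 0.05 * eeg @ rng.standard_normal(n_channels) + rng.standard_normal(n_samples)

def lag_matrix(x, lags):
    """Stack time-lagged copies of the EEG so the decoder integrates temporal context."""
    cols = []
    for lag in lags:                      # non-negative lags only in this sketch
        shifted = np.roll(x, lag, axis=0)
        shifted[:lag] = 0.0               # zero out samples that wrapped around
        cols.append(shifted)
    return np.hstack(cols)

X = lag_matrix(eeg, lags=range(16))       # roughly a 0-250 ms integration window at 64 Hz

# Contiguous train/test split: test samples never appear in the training set,
# which avoids leaking temporally adjacent (highly correlated) samples across the split.
split = int(0.8 * n_samples)
decoder = Ridge(alpha=1e3).fit(X[:split], envelope[:split])
pred = decoder.predict(X[split:])

# Pearson correlation between reconstructed and actual envelope = neural-tracking score.
r = np.corrcoef(pred, envelope[split:])[0, 1]
print(f"reconstruction correlation: {r:.3f}")
```

The deep-learning alternatives discussed in the review replace the ridge decoder with a nonlinear network and often replace the correlation objective with the match-mismatch classification task, but the evaluation principle, keeping training and test segments strictly separated, is the same.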
Pages: 28