Selective cortical representation of attended speaker in multi-talker speech perception

Cited by: 602
Authors
Mesgarani, Nima [1, 2]
Chang, Edward F. [1, 2]
Affiliations
[1] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Neurol Surg, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Physiol, San Francisco, CA 94143 USA
Funding
National Institutes of Health (USA)
Keywords
DOI
10.1038/nature11020
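Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]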
Subject classification codes
07; 0710; 09
Abstract
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background(1-3). How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented(4,5). Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
Pages: 233 / U118
Page count: 5
Related papers
50 in total
  • [31] FACE LANDMARK-BASED SPEAKER-INDEPENDENT AUDIO-VISUAL SPEECH ENHANCEMENT IN MULTI-TALKER ENVIRONMENTS
    Morrone, Giovanni
    Pasa, Luca
    Tikhanoff, Vadim
    Bergamaschi, Sonia
    Fadiga, Luciano
    Badino, Leonardo
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6900 - 6904
  • [32] The effect of nearby maskers on speech intelligibility in reverberant, multi-talker environments
    Westermann, Adam
    Buchholz, Joerg M.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (03): 2214 - 2223
  • [33] Hierarchical Variational Loopy Belief Propagation for Multi-talker Speech Recognition
    Rennie, Steven J.
    Hershey, John R.
    Olsen, Peder A.
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009: 176 - 181
  • [34] Effects of face masks on speech recognition in multi-talker babble noise
    Toscano, Joseph C.
    Toscano, Cheyenne M.
    PLOS ONE, 2021, 16 (02)
  • [35] Learning Contextual Language Embeddings for Monaural Multi-talker Speech Recognition
    Zhang, Wangyou
    Qian, Yanmin
    INTERSPEECH 2020, 2020: 304 - 308
  • [36] USING BINAURAL PROCESSING FOR AUTOMATIC SPEECH RECOGNITION IN MULTI-TALKER SCENES
    Spille, Constantin
    Dietz, Mathias
    Hohmann, Volker
    Meyer, Bernd T.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7805 - 7809
  • [37] Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
    Yin, Lu
    Wang, Ziteng
    Xia, Risheng
    Li, Junfeng
    Yan, Yonghong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 851 - 855
  • [38] Audio-Visual Multi-Talker Speech Recognition in A Cocktail Party
    Wu, Yifei
    Li, Chenda
    Yang, Song
    Wu, Zhongqin
    Qian, Yanmin
    INTERSPEECH 2021, 2021: 3021 - 3025
  • [39] Chinese speech identification in multi-talker babble with diotic and dichotic listening
    Peng JianXin
    Science Bulletin, 2012, 57 (20): 2561 - 2566
  • [40] Chinese speech identification in multi-talker babble with diotic and dichotic listening
    Peng JianXin
    Zhang HongHu
    Wang ZiYou
    CHINESE SCIENCE BULLETIN, 2012, 57 (20): 2548 - 2553