Selective cortical representation of attended speaker in multi-talker speech perception

Cited by: 602
Authors
Mesgarani, Nima [1 ,2 ]
Chang, Edward F. [1 ,2 ]
Affiliations
[1] University of California, San Francisco, UCSF Center for Integrative Neuroscience, Department of Neurological Surgery, San Francisco, CA 94143, USA
[2] University of California, San Francisco, UCSF Center for Integrative Neuroscience, Department of Physiology, San Francisco, CA 94143, USA
Funding
U.S. National Institutes of Health
DOI
10.1038/nature11020
Chinese Library Classification
O (Mathematical Sciences and Chemistry); P (Astronomy and Earth Sciences); Q (Biological Sciences); N (General Natural Sciences)
Discipline classification codes
07; 0710; 09
Abstract
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background [1-3]. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented [4,5]. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
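The decoding approach the abstract describes can be illustrated concretely. The following is a minimal Python sketch, not the authors' code: it assumes ridge-regression stimulus reconstruction from time-lagged multi-electrode responses, with attention then decoded by correlating the reconstruction against each speaker's clean spectrogram. All names, array shapes, and the synthetic data are illustrative assumptions.

# Minimal sketch (illustrative only): linear stimulus reconstruction from
# cortical population responses, plus a correlation-based attention decoder.
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic stand-ins for real data ---------------------------------
n_time, n_elec, n_freq = 2000, 64, 32      # samples, electrodes, spectrogram bands
spec_a = rng.random((n_time, n_freq))      # clean spectrogram, speaker A (attended)
spec_b = rng.random((n_time, n_freq))      # clean spectrogram, speaker B (ignored)
mix_resp = rng.random((n_time, n_elec))    # cortical responses to the speech mixture

def lagged(X, lags):
    """Stack time-lagged copies of X so the decoder sees temporal context
    (circular wrap at the edges is ignored for brevity)."""
    return np.hstack([np.roll(X, lag, axis=0) for lag in lags])

lags = range(10)                           # ~100 ms of context at a 100 Hz frame rate
R = lagged(mix_resp, lags)                 # shape: (n_time, n_elec * n_lags)

def ridge_fit(R, S, alpha=1.0):
    """Closed-form ridge regression: W = (R^T R + alpha*I)^-1 R^T S."""
    G = R.T @ R + alpha * np.eye(R.shape[1])
    return np.linalg.solve(G, R.T @ S)

# Fit the decoder with the attended speaker's spectrogram as the target.
W = ridge_fit(R, spec_a)
recon = R @ W                              # reconstructed spectrogram, (n_time, n_freq)

def corr(x, y):
    """Pearson correlation between two flattened spectrograms."""
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]

# Decode attention: the reconstruction should correlate more strongly with
# the attended speaker's clean spectrogram than with the ignored one.
r_a, r_b = corr(recon, spec_a), corr(recon, spec_b)
attended = "A" if r_a > r_b else "B"
print(f"r(A)={r_a:.3f}  r(B)={r_b:.3f}  -> decoded attended speaker: {attended}")

In this scheme, a reconstruction that tracks the attended speaker's spectro-temporal features, as the paper reports for non-primary auditory cortex, yields the higher correlation with that speaker's clean spectrogram and thus identifies the attentional target.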
Pages: 233-236
Number of pages: 5
Related papers (50 records in total; first 10 shown below)
  • [1] Selective cortical representation of attended speaker in multi-talker speech perception
    Mesgarani, Nima; Chang, Edward F.
    Nature, 2012, 485: 233-236
  • [2] Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception
    O'Sullivan, James; Herrero, Jose; Smith, Elliot; Schevon, Catherine; McKhann, Guy M.; Sheth, Sameer A.; Mehta, Ashesh D.; Mesgarani, Nima
    Neuron, 2019, 104(6): 1195-+
  • [3] Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech
    Xu, Chenglin; Rao, Wei; Wu, Jibin; Li, Haizhou
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 2696-2709
  • [4] Streaming Multi-talker Speech Recognition with Joint Speaker Identification
    Lu, Liang; Kanda, Naoyuki; Li, Jinyu; Gong, Yifan
    Interspeech 2021: 1782-1786
  • [5] Multi-Channel Speaker Verification for Single and Multi-talker Speech
    Kataria, Saurabh; Zhang, Shi-Xiong; Yu, Dong
    Interspeech 2021: 4608-4612
  • [6] Speaker Identification in Multi-Talker Overlapping Speech Using Neural Networks
    Tran, Van-Thuan; Tsai, Wei-Ho
    IEEE Access, 2020, 8: 134868-134879
  • [7] Target Speaker Extraction for Multi-Talker Speaker Verification
    Rao, Wei; Xu, Chenglin; Chng, Eng Siong; Li, Haizhou
    Interspeech 2019: 1273-1277
  • [8] Selective spatial attention in lateralized multi-talker speech perception: EEG correlates and the role of age
    Getzmann, Stephan; Schneider, Daniel; Wascher, Edmund
    Neurobiology of Aging, 2023, 126: 1-13
  • [9] Auditory spatial cuing for speech perception in a dynamic multi-talker environment
    Tomoriova, Beata; Kopco, Norbert
    2008 6th International Symposium on Applied Machine Intelligence and Informatics: 230-233
  • [10] Which Ones Are Speaking? Speaker-inferred Model for Multi-talker Speech Separation
    Shi, Jing; Xu, Jiaming; Xu, Bo
    Interspeech 2019: 4609-4613