Selective cortical representation of attended speaker in multi-talker speech perception

Cited: 602
Authors
Mesgarani, Nima [1 ,2 ]
Chang, Edward F. [1 ,2 ]
Affiliations
[1] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Neurol Surg, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Physiol, San Francisco, CA 94143 USA
Funding
US National Institutes of Health
DOI
10.1038/nature11020
Chinese Library Classification
O (Mathematical Sciences and Chemistry); P (Astronomy and Earth Sciences); Q (Biological Sciences); N (General Natural Sciences)
Discipline Codes
07; 0710; 09
Abstract
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background(1-3). How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented(4,5). Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
Pages: 233 / U118
Page count: 5
Related Papers (50 total)
  • [21] Han, Cong; Choudhari, Vishal; Li, Yinghao Aaron; Mesgarani, Nima. Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation. 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2023.
  • [22] Tu, Yan-Hui; Du, Jun; Dai, Li-Rung; Lee, Chin-Hui. A Speaker-Dependent Deep Learning Approach to Joint Speech Separation and Acoustic Modeling for Multi-Talker Automatic Speech Recognition. 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2016.
  • [23] Saleem, Nasir; Khattak, Muhammad Irfan. Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation. Applied Acoustics, 2020, 167.
  • [24] Lu, Liang; Kanda, Naoyuki; Li, Jinyu; Gong, Yifan. Streaming End-to-End Multi-Talker Speech Recognition. IEEE Signal Processing Letters, 2021, 28: 803-807.
  • [25] Tripathi, Anshuman; Lu, Han; Sak, Hasim. End-to-End Multi-Talker Overlapping Speech Recognition. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020: 6129-6133.
  • [26] Rennie, Steven J.; Hershey, John R.; Olsen, Peder A. Variational Loopy Belief Propagation for Multi-talker Speech Recognition. Interspeech 2009: 10th Annual Conference of the International Speech Communication Association, 2009: 1367-1370.
  • [27] Niesen, Maxime; Bourguignon, Mathieu; Bertels, Julie; Vander Ghinst, Marc; Wens, Vincent; Goldman, Serge; De Tiege, Xavier. Cortical tracking of lexical speech units in a multi-talker background is immature in school-aged children. NeuroImage, 2023, 265.
  • [28] Frissen, Ilja; Scherzer, Johannes; Yao, Hsin-Yun. The Impact of Speech-Irrelevant Head Movements on Speech Intelligibility in Multi-Talker Environments. Acta Acustica united with Acustica, 2019, 105(6): 1286-1290.
  • [29] Hogg, Aidan O. T.; Evers, Christine; Naylor, Patrick A. Speaker Change Detection Using Fundamental Frequency with Application to Multi-Talker Segmentation. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 5826-5830.
  • [30] Getzmann, Stephan; Klatt, Laura-Isabelle; Schneider, Daniel; Begau, Alexandra; Wascher, Edmund. EEG correlates of spatial shifts of attention in a dynamic multi-talker speech perception scenario in younger and older adults. Hearing Research, 2020, 398.