Selective cortical representation of attended speaker in multi-talker speech perception

Cited by: 602
Authors
Mesgarani, Nima [1 ,2 ]
Chang, Edward F. [1 ,2 ]
Affiliations
[1] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Neurol Surg, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Physiol, San Francisco, CA 94143 USA
Funding
US National Institutes of Health;
DOI
10.1038/nature11020
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Classification Code
07; 0710; 09;
Abstract
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background [1-3]. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented [4,5]. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
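
Illustration: the reconstruction-and-decoding pipeline described in the abstract can be sketched as a linear stimulus-reconstruction model. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation: it substitutes generic ridge regression for the paper's reconstruction filters, and the function names (fit_reconstruction, decode_attended), the lag count, the regularization strength, and all array shapes are hypothetical.

import numpy as np

def lagged_design(R, lags):
    """Stack the previous `lags` frames of population responses R (time x electrodes)."""
    T, E = R.shape
    X = np.zeros((T, E * lags))
    for lag in range(lags):
        X[lag:, lag * E:(lag + 1) * E] = R[:T - lag]
    return X

def fit_reconstruction(R, S, lags=10, alpha=1.0):
    """Fit a linear map from lagged neural responses to spectrogram frames S
    (time x frequencies) via ridge regression: W = (X'X + aI)^-1 X'S.
    Hypothetical stand-in for the paper's reconstruction method."""
    X = lagged_design(R, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ S)

def reconstruct(R, W, lags=10):
    """Reconstruct a spectrogram from cortical responses to the speech mixture."""
    return lagged_design(R, lags) @ W

def decode_attended(recon, templates):
    """Correlation classifier trained only on single-speaker material:
    return whichever single-speaker spectrogram template best matches the
    reconstruction. `templates` maps a label to an array shaped like `recon`."""
    scores = {label: np.corrcoef(recon.ravel(), tpl.ravel())[0, 1]
              for label, tpl in templates.items()}
    return max(scores, key=scores.get)

Mirroring the abstract, W would be fit on single-speaker trials alone and then applied to responses recorded during the two-speaker mixture; if the reconstruction resembles the attended speaker "as if subjects were listening to that speaker alone", decode_attended recovers that speaker's identity (or the attended word) from the single-speaker templates.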
Pages: 233 / U118
Page count: 5
Related papers
50 records in total
  • [41] Effects of multi-talker competing speech on the variability of the California Consonant Test
    Surr, R. K.
    Schwartz, D. M.
    EAR AND HEARING, 1980, 1 (6) : 319 - 323
  • [42] Speech-derived haptic stimulation enhances speech recognition in a multi-talker background
    Răutu, I. Sabina
    De Tiège, Xavier
    Jousmäki, Veikko
    Bourguignon, Mathieu
    Bertels, Julie
    SCIENTIFIC REPORTS, 2023, 13 (1)
  • [44] Interaction of bottom-up and top-down neural mechanisms in spatial multi-talker speech perception
    Patel, Prachi
    van der Heijden, Kiki
    Bickel, Stephan
    Herrero, Jose L.
    Mehta, Ashesh D.
    Mesgarani, Nima
CURRENT BIOLOGY, 2022, 32 (18) : 3971+
  • [45] Multi-microphone neural speech separation for far-field multi-talker speech recognition
    Yoshioka, Takuya
    Erdogan, Hakan
    Chen, Zhuo
    Alleva, Fil
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5739 - 5743
  • [46] Single-channel multi-talker speech recognition with permutation invariant training
    Qian, Yanmin
    Chang, Xuankai
    Yu, Dong
    SPEECH COMMUNICATION, 2018, 104 : 1 - 11
  • [47] Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition
    Weng, Chao
    Yu, Dong
    Seltzer, Michael L.
    Droppo, Jasha
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (10) : 1670 - 1679
  • [48] The effects of speech processing units on auditory stream segregation and selective attention in a multi-talker (cocktail party) situation
    Toth, Brigitta
    Honbolygo, Ferenc
    Szalardy, Orsolya
    Orosz, Gabor
    Farkas, David
    Winkler, Istvan
    CORTEX, 2020, 130 : 387 - 400
  • [49] A microphone array beamforming-based system for multi-talker speech separation
    Hidri, Adel
    Amiri, Hamid
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2016, 9 (4-5) : 209 - 217
  • [50] Super-human multi-talker speech recognition: A graphical modeling approach
    Hershey, John R.
    Rennie, Steven J.
    Olsen, Peder A.
    Kristjansson, Trausti T.
COMPUTER SPEECH AND LANGUAGE, 2010, 24 (1) : 45 - 66