The cocktail-party problem revisited: early processing and selection of multi-talker speech

Cited by: 265
Author
Bronkhorst, Adelbert W. [1, 2]
Affiliations
[1] TNO Human Factors, NL-3769 ZG Soesterberg, Netherlands
[2] Vrije Univ Amsterdam, Dept Cognit Psychol, NL-1081 BT Amsterdam, Netherlands
Keywords
Attention; Auditory scene analysis; Cocktail-party problem; Informational masking; Speech perception; HUMAN AUDITORY-CORTEX; INTERAURAL TIME DIFFERENCES; RECEPTION THRESHOLD; FUNDAMENTAL-FREQUENCY; ENERGETIC MASKING; INFORMATIONAL MASKING; PERCEPTUAL SEPARATION; INTELLIGIBILITY INDEX; MISMATCH NEGATIVITY; ATTENTIONAL CAPTURE;
DOI
10.3758/s13414-015-0882-9
Chinese Library Classification
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
How do we recognize what one person is saying when others are speaking at the same time? This review summarizes widespread research in psychoacoustics, auditory scene analysis, and attention, all dealing with early processing and selection of speech, which has been stimulated by this question. Important effects occurring at the peripheral and brainstem levels are mutual masking of sounds and "unmasking" resulting from binaural listening. Psychoacoustic models have been developed that can predict these effects accurately, albeit using computational approaches rather than approximations of neural processing. Grouping (the segregation and streaming of sounds) represents a subsequent processing stage that interacts closely with attention. Sounds can be easily grouped, and subsequently selected, using primitive features such as spatial location and fundamental frequency. More complex processing is required when lexical, syntactic, or semantic information is used. Whereas it is now clear that such processing can take place preattentively, there also is evidence that the processing depth depends on the task-relevancy of the sound. This is consistent with the presence of a feedback loop in attentional control, triggering enhancement of to-be-selected input. Despite recent progress, there are still many unresolved issues: there is a need for integrative models that are neurophysiologically plausible, for research into grouping based on other than spatial or voice-related cues, for studies explicitly addressing endogenous and exogenous attention, for an explanation of the remarkable sluggishness of attention focused on dynamically changing sounds, and for research elucidating the distinction between binaural speech perception and sound localization.
Pages: 1465-1487
Page count: 23
Related papers
50 in total
  • [41] Speech-derived haptic stimulation enhances speech recognition in a multi-talker background
    Rautu, I. Sabina
    De Tiege, Xavier
    Jousmaki, Veikko
    Bourguignon, Mathieu
    Bertels, Julie
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [42] Speech-derived haptic stimulation enhances speech recognition in a multi-talker background
    I. Sabina Răutu
    Xavier De Tiège
    Veikko Jousmäki
    Mathieu Bourguignon
    Julie Bertels
    Scientific Reports, 13
  • [43] Single-channel multi-talker speech recognition with permutation invariant training
    Qian, Yanmin
    Chang, Xuankai
    Yu, Dong
    SPEECH COMMUNICATION, 2018, 104 : 1 - 11
  • [44] Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition
    Weng, Chao
    Yu, Dong
    Seltzer, Michael L.
    Droppo, Jasha
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (10) : 1670 - 1679
  • [45] MULTI-MICROPHONE NEURAL SPEECH SEPARATION FOR FAR-FIELD MULTI-TALKER SPEECH RECOGNITION
    Yoshioka, Takuya
    Erdogan, Hakan
    Chen, Zhuo
    Alleva, Fil
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 5739 - 5743
  • [46] EVALUATION OF THE COCKTAIL-PARTY EFFECT FOR MULTIPLE SPEECH STIMULI WITHIN A SPATIAL AUDITORY DISPLAY
    CRISPIEN, K
    EHRENBERG, T
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 1995, 43 (11) : 932 - 941
  • [47] Left Superior Temporal Gyrus Is Coupled to Attended Speech in a Cocktail-Party Auditory Scene
    Ghinst, Marc Vander
    Bourguignon, Mathieu
    de Beeck, Marc Op
    Wens, Vincent
    Marty, Brice
    Hassid, Sergio
    Choufani, Georges
    Jousmaki, Veikko
    Hari, Riitta
    Van Bogaert, Patrick
    Goldman, Serge
    De Tiege, Xavier
    JOURNAL OF NEUROSCIENCE, 2016, 36 (05) : 1596 - 1606
  • [48] Neural indices of spoken word processing in background multi-talker babble
    Romei, Laurie
    Wambacq, Ilse J. A.
    Besing, Joan
    Koehnke, Janet
    Jerger, James
    INTERNATIONAL JOURNAL OF AUDIOLOGY, 2011, 50 (05) : 321 - 333
  • [49] The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene
    Rimmele, Johanna M.
    Golumbic, Elana Zion
    Schroeger, Erich
    Poeppel, David
    CORTEX, 2015, 68 : 144 - 154
  • [50] Super-human multi-talker speech recognition: A graphical modeling approach
    Hershey, John R.
    Rennie, Steven J.
    Olsen, Peder A.
    Kristjansson, Trausti T.
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01) : 45 - 66