The cocktail-party problem revisited: early processing and selection of multi-talker speech

Cited by: 265

Authors:

Bronkhorst, Adelbert W. [1, 2]

Affiliations:

[1] TNO Human Factors, NL-3769 ZG Soesterberg, Netherlands

[2] Vrije Univ Amsterdam, Dept Cognit Psychol, NL-1081 BT Amsterdam, Netherlands

Source:

ATTENTION PERCEPTION & PSYCHOPHYSICS | 2015 / Vol. 77 / Issue 05

Keywords:

Attention; Auditory scene analysis; Cocktail-party problem; Informational masking; Speech perception; HUMAN AUDITORY-CORTEX; INTERAURAL TIME DIFFERENCES; RECEPTION THRESHOLD; FUNDAMENTAL-FREQUENCY; ENERGETIC MASKING; INFORMATIONAL MASKING; PERCEPTUAL SEPARATION; INTELLIGIBILITY INDEX; MISMATCH NEGATIVITY; ATTENTIONAL CAPTURE;

DOI:

10.3758/s13414-015-0882-9

Chinese Library Classification (CLC) number:

B84 [Psychology];

Discipline classification code:

04 ; 0402 ;

Abstract:

How do we recognize what one person is saying when others are speaking at the same time? This review summarizes widespread research in psychoacoustics, auditory scene analysis, and attention, all dealing with early processing and selection of speech, which has been stimulated by this question. Important effects occurring at the peripheral and brainstem levels are mutual masking of sounds and "unmasking" resulting from binaural listening. Psychoacoustic models have been developed that can predict these effects accurately, albeit using computational approaches rather than approximations of neural processing. Grouping (the segregation and streaming of sounds) represents a subsequent processing stage that interacts closely with attention. Sounds can be easily grouped, and subsequently selected, using primitive features such as spatial location and fundamental frequency. More complex processing is required when lexical, syntactic, or semantic information is used. Whereas it is now clear that such processing can take place preattentively, there also is evidence that the processing depth depends on the task-relevancy of the sound. This is consistent with the presence of a feedback loop in attentional control, triggering enhancement of to-be-selected input. Despite recent progress, there are still many unresolved issues: there is a need for integrative models that are neurophysiologically plausible, for research into grouping based on other than spatial or voice-related cues, for studies explicitly addressing endogenous and exogenous attention, for an explanation of the remarkable sluggishness of attention focused on dynamically changing sounds, and for research elucidating the distinction between binaural speech perception and sound localization.

Pages: 1465 - 1487

Page count: 23

50 items in total

[21] SEPARATION OF SEVERAL SPEAKERS RECORDED BY 2 MICROPHONES (COCKTAIL-PARTY PROCESSING)
STRUBE, HW
SIGNAL PROCESSING, 1981, 3 (04) : 355 - 364
[22] Variational Loopy Belief Propagation for Multi-talker Speech Recognition
Rennie, Steven J.
Hershey, John R.
Olsen, Peder A.
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009 : 1367 - 1370
[23] Streaming End-to-End Multi-Talker Speech Recognition
Lu, Liang
Kanda, Naoyuki
Li, Jinyu
Gong, Yifan
IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 803 - 807
[24] Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation
Han, Cong
Choudhari, Vishal
Li, Yinghao Aaron
Mesgarani, Nima
2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023,
[25] Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception
O'Sullivan, James
Herrero, Jose
Smith, Elliot
Schevon, Catherine
McKhann, Guy M.
Sheth, Sameer A.
Mehta, Ashesh D.
Mesgarani, Nima
NEURON, 2019, 104 (06) : 1195 - +
[26] The Impact of Speech-Irrelevant Head Movements on Speech Intelligibility in Multi-Talker Environments
Frissen, Ilja
Scherzer, Johannes
Yao, Hsin-Yun
ACTA ACUSTICA UNITED WITH ACUSTICA, 2019, 105 (06) : 1286 - 1290
[27] The integration of continuous audio and visual speech in a cocktail-party environment depends on attention
Ahmed, Farhin
Nidiffer, Aaron R.
O'Sullivan, Aisling E.
Zuk, Nathaniel J.
Lalor, Edmund C.
NEUROIMAGE, 2023, 274
[28] The effect of nearby maskers on speech intelligibility in reverberant, multi-talker environments
Westermann, Adam
Buchholz, Joerg M.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (03) : 2214 - 2223
[29] Learning Contextual Language Embeddings for Monaural Multi-talker Speech Recognition
Zhang, Wangyou
Qian, Yanmin
INTERSPEECH 2020, 2020 : 304 - 308
[30] Chinese speech identification in multi-talker babble with diotic and dichotic listening
PENG JianXin
Chinese Science Bulletin, 2012, 57 (20) : 2561 - 2566