Speech recognition in adverse conditions by humans and machines

Cited by: 0

Authors:

Patman, Chloe [1]

Chodroff, Eleanor [2]

Affiliations:

[1] Univ Cambridge, Fac Modern & Medieval Languages & Linguist, Theoret & Appl Linguist Sect, Sidgwick Ave, Cambridge CB3 9DA, England

[2] Univ Zurich, Dept Computat Linguist, Andreasstr 15, CH-8050 Zurich, Switzerland

Source:

JASA EXPRESS LETTERS | 2024, Volume 4, Issue 11

Keywords:

NOISE; INTELLIGIBILITY; ENGLISH;

DOI:

10.1121/10.0032473

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In the development of automatic speech recognition systems, achieving human-like performance has been a long-held goal. Recent releases of large spoken language models have claimed to achieve such performance, although direct comparison to humans has been severely limited. The present study tested L1 British English listeners against two automatic speech recognition systems (wav2vec 2.0 and Whisper, base and large sizes) in adverse listening conditions: speech-shaped noise and pub noise, at different signal-to-noise ratios, and recordings produced with or without face masks. Humans maintained the advantage against all systems, except for Whisper large, which outperformed humans in every condition but pub noise.

Pages: 7
