Speech recognition in adverse conditions by humans and machines

Cited by: 0
Authors
Patman, Chloe [1]
Chodroff, Eleanor [2]
Affiliations
[1] Univ Cambridge, Fac Modern & Medieval Languages & Linguist, Theoret & Appl Linguist Sect, Sidgwick Ave, Cambridge CB3 9DA, England
[2] Univ Zurich, Dept Computat Linguist, Andreasstr 15, CH-8050 Zurich, Switzerland
Source
JASA Express Letters | 2024 / Volume 4 / Issue 11
Keywords
NOISE; INTELLIGIBILITY; ENGLISH
DOI
10.1121/10.0032473
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In the development of automatic speech recognition systems, achieving human-like performance has been a long-held goal. Recent releases of large spoken language models have claimed to achieve such performance, although direct comparison to humans has been severely limited. The present study tested L1 British English listeners against two automatic speech recognition systems (wav2vec 2.0 and Whisper, base and large sizes) in adverse listening conditions: speech-shaped noise and pub noise, at different signal-to-noise ratios, and recordings produced with or without face masks. Humans maintained the advantage against all systems, except for Whisper large, which outperformed humans in every condition but pub noise.
Pages: 7