Speech recognition in adverse conditions by humans and machines

Cited by: 0
Authors
Patman, Chloe [1]
Chodroff, Eleanor [2]
Affiliations
[1] Univ Cambridge, Fac Modern & Medieval Languages & Linguist, Theoret & Appl Linguist Sect, Sidgwick Ave, Cambridge CB3 9DA, England
[2] Univ Zurich, Dept Computat Linguist, Andreasstr 15, CH-8050 Zurich, Switzerland
Source
JASA EXPRESS LETTERS | 2024 / Vol. 4 / Issue 11
Keywords
NOISE; INTELLIGIBILITY; ENGLISH
DOI
10.1121/10.0032473
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
摘要
In the development of automatic speech recognition systems, achieving human-like performance has been a long-held goal. Recent releases of large spoken language models have claimed to achieve such performance, although direct comparison to humans has been severely limited. The present study tested L1 British English listeners against two automatic speech recognition systems (wav2vec 2.0 and Whisper, base and large sizes) in adverse listening conditions: speech-shaped noise and pub noise, at different signal-to-noise ratios, and recordings produced with or without face masks. Humans maintained the advantage against all systems, except for Whisper large, which outperformed humans in every condition but pub noise.
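
The study's own pipeline is not reproduced here, but the setup described in the abstract (mixing recordings with noise at fixed signal-to-noise ratios and scoring system transcripts against references) can be sketched in a few lines. In the sketch below, the file names, SNR levels, reference transcript, and the choice of the openai-whisper and jiwer packages are illustrative assumptions rather than details taken from the paper; the SNR used is defined as SNR(dB) = 10*log10(P_speech / P_noise).

    # Minimal sketch (assumptions noted above): mix speech with noise at a
    # target SNR, transcribe with Whisper, and score against a reference.
    import numpy as np
    import soundfile as sf
    import whisper            # pip install openai-whisper
    from jiwer import wer     # pip install jiwer

    def mix_at_snr(speech, noise, snr_db):
        """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db."""
        noise = np.resize(noise, speech.shape)   # loop or trim noise to length
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
        return speech + scale * noise

    speech, sr = sf.read("sentence.wav")     # hypothetical clean recording
    noise, _ = sf.read("pub_noise.wav")      # hypothetical pub/babble noise
    reference = "the boy ran down the path"  # hypothetical reference transcript

    model = whisper.load_model("large")      # or "base" for the smaller system
    for snr_db in (0, 5, 10):                # illustrative SNR levels
        sf.write("mixed.wav", mix_at_snr(speech, noise, snr_db), sr)
        hypothesis = model.transcribe("mixed.wav", language="en")["text"]
        print(snr_db, "dB ->", wer(reference, hypothesis.lower().strip()))

A wav2vec 2.0 comparison would slot into the same loop by swapping the transcription call for the transformers Wav2Vec2Processor / Wav2Vec2ForCTC pair; only the SNR mixing step is specific to the adverse-listening manipulation described in the abstract.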
Pages: 7
Related papers
50 items in total
  • [1] Speech recognition by machines and humans
    Lippmann, RP
    SPEECH COMMUNICATION, 1997, 22 (01) : 1 - 15
  • [2] Speech recognition by humans and machines under conditions with severe channel variability and noise
    Lippmann, RP
    Carlson, BA
    APPLICATIONS AND SCIENCE OF ARTIFICIAL NEURAL NETWORKS III, 1997, 3077 : 46 - 57
  • [3] Speech recognition in adverse conditions: A review
    Mattys, Sven L.
    Davis, Matthew H.
    Bradlow, Ann R.
    Scott, Sophie K.
    LANGUAGE AND COGNITIVE PROCESSES, 2012, 27 (7-8) : 953 - 978
  • [4] English Conversational Telephone Speech Recognition by Humans and Machines
    Saon, George
    Kurata, Gakuto
    Sercu, Tom
    Audhkhasi, Kartik
    Thomas, Samuel
    Dimitriadis, Dimitrios
    Cui, Xiaodong
    Ramabhadran, Bhuvana
    Picheny, Michael
    Lim, Lynn-Li
    Roomi, Bergul
    Hall, Phil
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 132 - 136
  • [5] English Broadcast News Speech Recognition by Humans and Machines
    Thomas, Samuel
    Suzuki, Masayuki
    Huang, Yinghui
    Kurata, Gakuto
    Tuske, Zoltan
    Saon, George
    Kingsbury, Brian
    Picheny, Michael
    Dibert, Tom
    Kaiser-Schatzlein, Alice
    Samko, Bern
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6455 - 6459
  • [6] Assessing Costa Rican children speech recognition by humans and machines
    Morales-Rodriguez, Maribel
    Coto-Jimenez, Marvin
    TECNOLOGIA EN MARCHA, 2022, 35
  • [7] Synthesis and recognition of speech - voice communication between humans and machines
    Flanagan, JL
    IEEE TRANSACTIONS ON SONICS AND ULTRASONICS, 1982, 29 (03) : 158 - 158
  • [8] Towards improving speech detection robustness for speech recognition in adverse conditions
    Karray, L
    Martin, A
    SPEECH COMMUNICATION, 2003, 40 (03) : 261 - 276
  • [9] Listening in the dips: Comparing relevant features for speech recognition in humans and machines
    Spille, Constantin
    Meyer, Bernd T.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2968 - 2972