Speech recognition in adverse conditions by humans and machines

Cited by: 0
Authors
Patman, Chloe [1]
Chodroff, Eleanor [2]
Affiliations
[1] Univ Cambridge, Fac Modern & Medieval Languages & Linguist, Theoret & Appl Linguist Sect, Sidgwick Ave, Cambridge CB3 9DA, England
[2] Univ Zurich, Dept Computat Linguist, Andreasstr 15, CH-8050 Zurich, Switzerland
Source
JASA EXPRESS LETTERS | 2024 / Vol. 4 / Issue 11
Keywords
NOISE; INTELLIGIBILITY; ENGLISH
DOI
10.1121/10.0032473
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
摘要
In the development of automatic speech recognition systems, achieving human-like performance has been a long-held goal. Recent releases of large spoken language models have claimed to achieve such performance, although direct comparison to humans has been severely limited. The present study tested L1 British English listeners against two automatic speech recognition systems (wav2vec 2.0 and Whisper, base and large sizes) in adverse listening conditions: speech-shaped noise and pub noise, at different signal-to-noise ratios, and recordings produced with or without face masks. Humans maintained the advantage against all systems, except for Whisper large, which outperformed humans in every condition but pub noise.
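
The study's own pipeline is not reproduced here, but the setup described in the abstract (mixing recordings with noise at fixed signal-to-noise ratios and scoring system transcripts against references) can be sketched in a few lines. In the sketch below, the file names, SNR levels, reference transcript, and the choice of the openai-whisper and jiwer packages are illustrative assumptions rather than details taken from the paper; the SNR used is defined as SNR(dB) = 10*log10(P_speech / P_noise).

    # Minimal sketch (assumptions noted above): mix speech with noise at a
    # target SNR, transcribe with Whisper, and score against a reference.
    import numpy as np
    import soundfile as sf
    import whisper            # pip install openai-whisper
    from jiwer import wer     # pip install jiwer

    def mix_at_snr(speech, noise, snr_db):
        """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db."""
        noise = np.resize(noise, speech.shape)   # loop or trim noise to length
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
        return speech + scale * noise

    speech, sr = sf.read("sentence.wav")     # hypothetical clean recording
    noise, _ = sf.read("pub_noise.wav")      # hypothetical pub/babble noise
    reference = "the boy ran down the path"  # hypothetical reference transcript

    model = whisper.load_model("large")      # or "base" for the smaller system
    for snr_db in (0, 5, 10):                # illustrative SNR levels
        sf.write("mixed.wav", mix_at_snr(speech, noise, snr_db), sr)
        hypothesis = model.transcribe("mixed.wav", language="en")["text"]
        print(snr_db, "dB ->", wer(reference, hypothesis.lower().strip()))

A wav2vec 2.0 comparison would slot into the same loop by swapping the transcription call for the transformers Wav2Vec2Processor / Wav2Vec2ForCTC pair; only the SNR mixing step is specific to the adverse-listening manipulation described in the abstract.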
Pages: 7
Related papers
50 items in total
  • [1] Speech recognition by machines and humans
    Lippmann, RP
    SPEECH COMMUNICATION, 1997, 22 (01) : 1 - 15
  • [2] Speech recognition by humans and machines under conditions with severe channel variability and noise
    Lippmann, RP
    Carlson, BA
    APPLICATIONS AND SCIENCE OF ARTIFICIAL NEURAL NETWORKS III, 1997, 3077 : 46 - 57
  • [3] Speech recognition in adverse conditions: A review
    Mattys, Sven L.
    Davis, Matthew H.
    Bradlow, Ann R.
    Scott, Sophie K.
    LANGUAGE AND COGNITIVE PROCESSES, 2012, 27 (7-8) : 953 - 978
  • [4] English Conversational Telephone Speech Recognition by Humans and Machines
    Saon, George
    Kurata, Gakuto
    Sercu, Tom
    Audhkhasi, Kartik
    Thomas, Samuel
    Dimitriadis, Dimitrios
    Cui, Xiaodong
    Ramabhadran, Bhuvana
    Picheny, Michael
    Lim, Lynn-Li
    Roomi, Bergul
    Hall, Phil
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 132 - 136
  • [5] English Broadcast News Speech Recognition by Humans and Machines
    Thomas, Samuel
    Suzuki, Masayuki
    Huang, Yinghui
    Kurata, Gakuto
    Tuske, Zoltan
    Saon, George
    Kingsbury, Brian
    Picheny, Michael
    Dibert, Tom
    Kaiser-Schatzlein, Alice
    Samko, Bern
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6455 - 6459
  • [6] Assessing Costa Rican children speech recognition by humans and machines
    Morales-Rodriguez, Maribel
    Coto-Jimenez, Marvin
    TECNOLOGIA EN MARCHA, 2022, 35
  • [7] Synthesis and recognition of speech - voice communication between humans and machines
    Flanagan, JL
    IEEE TRANSACTIONS ON SONICS AND ULTRASONICS, 1982, 29 (03) : 158 - 158
  • [8] Towards improving speech detection robustness for speech recognition in adverse conditions
    Karray, L
    Martin, A
    SPEECH COMMUNICATION, 2003, 40 (03) : 261 - 276
  • [9] Listening in the dips: Comparing relevant features for speech recognition in humans and machines
    Spille, Constantin
    Meyer, Bernd T.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2968 - 2972