Speech recognition by machines and humans

Cited by: 285
Authors
Lippmann, RP
Affiliation
[1] Lincoln Laboratory MIT, Lexington, MA 02173-9108
Keywords
speech recognition; speech perception; speech; perception; automatic speech recognition; machine recognition; performance; noise; nonsense syllables; nonsense sentences;
DOI
10.1016/S0167-6393(97)00021-6
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read isolated words to spontaneous conversations. Error rates of machines are often more than an order of magnitude greater than those of humans for quiet, wideband, read speech. Machine performance degrades further below that of humans in noise, with channel variability, and for spontaneous speech. Humans can also recognize quiet, clearly spoken nonsense syllables and nonsense sentences with little high-level grammatical information. These comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech. (C) 1997 Elsevier Science B.V.
Pages: 1 - 15 (15 pages)
Related papers
50 records in total
  • [1] English Conversational Telephone Speech Recognition by Humans and Machines
    Saon, George
    Kurata, Gakuto
    Sercu, Tom
    Audhkhasi, Kartik
    Thomas, Samuel
    Dimitriadis, Dimitrios
    Cui, Xiaodong
    Ramabhadran, Bhuvana
    Picheny, Michael
    Lim, Lynn-Li
    Roomi, Bergul
    Hall, Phil
    18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 132 - 136
  • [2] English Broadcast News Speech Recognition by Humans and Machines
    Thomas, Samuel
    Suzuki, Masayuki
    Huang, Yinghui
    Kurata, Gakuto
    Tuske, Zoltan
    Saon, George
    Kingsbury, Brian
    Picheny, Michael
    Dibert, Tom
    Kaiser-Schatzlein, Alice
    Samko, Bern
    2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 6455 - 6459
  • [3] Assessing Costa Rican children speech recognition by humans and machines
    Morales-Rodriguez, Maribel
    Coto-Jimenez, Marvin
    Tecnologia en Marcha, 2022, 35
  • [4] Synthesis and Recognition of Speech - Voice Communication Between Humans and Machines
    Flanagan, J. L.
    IEEE Transactions on Sonics and Ultrasonics, 1982, 29 (03): 158 - 158
  • [5] Listening in the dips: Comparing relevant features for speech recognition in humans and machines
    Spille, Constantin
    Meyer, Bernd T.
    18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 2968 - 2972
  • [6] Noise Robust Speech Recognition on Aurora4 by Humans and Machines
    Qian, Yanmin
    Tan, Tian
    Hu, Hu
    Liu, Qi
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 5604 - 5608
  • [7] Speech separation in humans and machines
    Ellis, D
    2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005: 1 - 1
  • [8] Speech recognition by humans and machines under conditions with severe channel variability and noise
    Lippmann, RP
    Carlson, BA
    Applications and Science of Artificial Neural Networks III, 1997, 3077: 46 - 57
  • [9] Humans, machines, and conversations: An ethnographic study of the making of automatic speech recognition technologies
    Voskuhl, A
    Social Studies of Science, 2004, 34 (03): 393 - 421
  • [10] Speaker Recognition by Machines and Humans
    Hansen, John H. L.
    Hasan, Taufiq
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (06) : 74 - 99