Speech recognition by machines and humans

Cited by: 285
Author
Lippmann, RP
Affiliation
[1] Lincoln Laboratory MIT, Lexington, MA 02173-9108
Keywords
speech recognition; speech perception; speech; perception; automatic speech recognition; machine recognition; performance; noise; nonsense syllables; nonsense sentences
DOI
10.1016/S0167-6393(97)00021-6
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read isolated words to spontaneous conversations. Error rates of machines are often more than an order of magnitude greater than those of humans for quiet, wideband, read speech. Machine performance degrades further below that of humans in noise, with channel variability, and for spontaneous speech. Humans can also recognize quiet, clearly spoken nonsense syllables and nonsense sentences with little high-level grammatical information. These comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech. © 1997 Elsevier Science B.V.
Pages: 1-15
Page count: 15