DNN-based automatic speech recognition as a model for human phoneme perception

Cited by: 8
Authors
Exter, Mats [1, 2]
Meyer, Bernd T. [3]
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, Med Phys, Oldenburg, Germany
[2] Carl von Ossietzky Univ Oldenburg, Cluster Excellence Hearing4all, Oldenburg, Germany
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Keywords
speech recognition; phoneme perception; models of speech intelligibility; HEARING; NOISE;
DOI
10.21437/Interspeech.2016-1285
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we test the applicability of state-of-the-art automatic speech recognition (ASR) to predict phoneme confusions in human listeners. Phoneme-specific response rates are obtained from ASR based on deep neural networks (DNNs) and from listening tests with six normal-hearing subjects. The measure of model quality is the correlation between the phoneme recognition accuracies obtained in ASR and in human speech recognition (HSR). Various feature representations are used as input to the DNNs to explore their relation to overall ASR performance and model prediction power. Standard filterbank output and perceptual linear prediction (PLP) features yield the best predictions, with correlation coefficients reaching r = 0.9.
Pages: 615-619
Page count: 5
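
The model-quality measure named in the abstract, the correlation between per-phoneme recognition accuracies in ASR and in HSR, can be illustrated with a short sketch. It assumes the measure is a plain Pearson correlation coefficient, which the abstract does not state explicitly. The function name `pearson_r`, the phoneme labels, and all accuracy values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-phoneme recognition accuracies (illustrative only,
# NOT taken from the paper).
hsr_accuracy = {"a": 0.95, "i": 0.90, "s": 0.80, "f": 0.60, "m": 0.70}
asr_accuracy = {"a": 0.90, "i": 0.85, "s": 0.70, "f": 0.55, "m": 0.75}

# Pair the two accuracy lists phoneme by phoneme before correlating.
phonemes = sorted(hsr_accuracy)
r = pearson_r(
    [hsr_accuracy[p] for p in phonemes],
    [asr_accuracy[p] for p in phonemes],
)
print(f"r = {r:.2f}")
```

A value of r close to 1 would mean that phonemes that are hard for listeners are also hard for the recognizer. Python 3.10 and later offer `statistics.correlation` as a standard-library alternative to the hand-written function.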