Prediction of speech intelligibility with DNN-based performance measures

Cited by: 11
Authors
Martinez, Angel Mario Castro [1]
Spille, Constantin [1]
Rossbach, Jana [2]
Kollmeier, Birger [1]
Meyer, Bernd T. [2]
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, Med Phys & Cluster Excellence Hearing4all, Ammerlander Heerstr 114-118, D-26129 Oldenburg, Lower Saxony, Germany
[2] Carl von Ossietzky Univ Oldenburg, Commun Acoust & Cluster Excellence Hearing4all, Ammerlander Heerstr 114-118, D-26129 Oldenburg, Lower Saxony, Germany
Keywords
Speech intelligibility; Perception modeling; Automatic speech recognition; Speech audiometry; Deep neural networks; Amplitude modulation; Normal hearing; Noise; Threshold; Model
DOI
10.1016/j.csl.2021.101329
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a speech intelligibility model based on automatic speech recognition (ASR). It combines phoneme probabilities from deep neural networks (DNN) with a performance measure that estimates the word error rate from these probabilities. The model requires neither a clean speech reference nor word labels during testing, because the ASR decoding step, which finds the most likely sequence of words given the phoneme posterior probabilities, is omitted. The model is evaluated via the root-mean-squared error between the predicted and observed speech reception thresholds of eight normal-hearing listeners. The recognition task consists of identifying noisy words from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from speech-shaped stationary noise to a single-talker masker. The prediction performance is compared to that of five established models and of an ASR model that uses word labels. Two combinations of features and networks were tested. Both include temporal information, either at the feature level (amplitude modulation filterbanks and a feed-forward network) or captured by the architecture (mel-spectrograms and a time-delay deep neural network, TDNN). The TDNN model is on par with the DNN while reducing the number of parameters by a factor of 37; this reduction allows parallel streams on dedicated hearing aid hardware, as a forward pass can be computed within the 10 ms of each frame. The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.
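The evaluation metric named in the abstract, the root-mean-squared error (RMSE) between predicted and observed speech reception thresholds (SRTs), can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `rmse` and all SRT values are hypothetical placeholders, and the use of one value per noise masker in dB SNR is an assumption made for the example.

```python
import math


def rmse(predicted, observed):
    """Root-mean-squared error between two equally long sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical SRTs in dB SNR, one per noise masker.
# These numbers are illustrative only and are NOT taken from the paper.
observed_srt = [-7.0, -9.5, -12.0, -15.5, -19.0, -8.0, -21.0, -10.0]
predicted_srt = [-6.0, -10.5, -11.0, -16.5, -18.0, -9.0, -20.0, -11.0]

error_db = rmse(predicted_srt, observed_srt)
print(f"RMSE between predicted and observed SRTs: {error_db:.2f} dB")
```

A lower RMSE means the predicted thresholds track the listeners' measured thresholds more closely across maskers.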
Pages: 14