Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-based ASR System

Cited by: 10
Authors
Arai, Kenichi [1 ]
Araki, Shoko [1 ]
Ogawa, Atsunori [1 ]
Kinoshita, Keisuke [1 ]
Nakatani, Tomohiro [1 ]
Yamamoto, Katsuhiko [2 ]
Irino, Toshio [2 ]
Affiliations
[1] NTT Commun Sci Labs, Kyoto, Japan
[2] Wakayama Univ, Grad Sch Syst Engn, Wakayama, Japan
Source
Keywords
speech intelligibility prediction; speech enhancement; automatic speech recognition; deep neural networks; phone accuracy; phone bi-gram; SINGLE-ENDED PREDICTION; LISTENING EFFORT; MODEL; RECOGNITION; INDEX;
DOI
10.21437/Interspeech.2019-1381
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
The ability of state-of-the-art automatic speech recognition (ASR) systems, which use deep neural networks (DNN), has recently been approaching that of human auditory systems. On the other hand, although measuring the intelligibility of enhanced speech signals is important for developing auditory algorithms and devices, the current measurement methods mainly rely on subjective experiments. Therefore, it would be preferable to employ an ASR system to predict the subjective speech intelligibility (SI) of enhanced speech. In this study, we evaluate the SI prediction performance of DNN-based ASR systems using phone accuracies. We found that an ASR system with multi-condition training achieves the best SI prediction accuracy for enhanced speech when compared with conventional methods (STOI, HASPI) and a recently proposed technique (GEDI). In addition, since our ASR system uses only a phone language model, it offers the advantage of being able to predict intelligibility independently of prior knowledge of words.
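The abstract does not spell out how phone accuracy is computed. As a minimal sketch, assuming the conventional ASR definition Acc = (N - S - D - I) / N (N reference phones; S, D, I the substitutions, deletions, and insertions in a minimum-edit-distance alignment), it could look like the following. The function names and the phone labels are hypothetical, and this is not the authors' implementation; the mapping from phone accuracy to a predicted intelligibility score is not shown.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the reference phone sequence into the hypothesis."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Phone accuracy (N - S - D - I) / N under the assumed definition.
    The value can be negative when the hypothesis has many insertions."""
    if not ref:
        raise ValueError("reference phone sequence must not be empty")
    return (len(ref) - edit_distance(ref, hyp)) / len(ref)


if __name__ == "__main__":
    reference = ["k", "a", "t"]   # hypothetical phone labels
    recognized = ["k", "o", "t"]  # one substitution
    print(phone_accuracy(reference, recognized))
```

Because the score is normalized by the reference length, values from utterances of different lengths can be compared or averaged before relating them to subjective intelligibility.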
Pages: 4275-4279
Number of pages: 5