Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain

Cited by: 42
Authors
Relano-Iborra, Helia [1]
May, Tobias [1]
Zaar, Johannes [1]
Scheidiger, Christoph [1]
Dau, Torsten [1]
Affiliations
[1] Tech Univ Denmark, Dept Elect Engn, Hearing Syst Grp, DK-2800 Lyngby, Denmark
Keywords
RECEPTION THRESHOLD; AMPLITUDE-MODULATION; TRANSMISSION INDEX; FREQUENCY; NOISE; MASKING; MODEL
DOI
10.1121/1.4964505
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jorgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436-446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125-2136]. This "hybrid" model, named sEPSM(corr), is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time-frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Similar to other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a large range of interferers and distortions on speech intelligibility, including consequences of hearing impairment and hearing-instrument signal processing. © 2016 Author(s).
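The general idea of a short-term correlation back end can be illustrated with a minimal sketch. This is not the sEPSM(corr) implementation: the abstract does not give the model's equations, and the sketch omits the auditory and modulation filterbanks of the mr-sEPSM front end as well as any mapping from correlation to intelligibility. The function names `pearson` and `short_time_correlation`, the non-overlapping windows, and the toy envelopes are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 if either sequence is constant (an assumption of this
    sketch, chosen to avoid division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def short_time_correlation(clean_env, degraded_env, win_len):
    """Average the Pearson correlation over consecutive,
    non-overlapping windows of two temporal envelopes."""
    if len(clean_env) != len(degraded_env):
        raise ValueError("envelopes must have equal length")
    scores = []
    for start in range(0, len(clean_env) - win_len + 1, win_len):
        stop = start + win_len
        scores.append(pearson(clean_env[start:stop], degraded_env[start:stop]))
    if not scores:
        raise ValueError("envelopes are shorter than one window")
    return sum(scores) / len(scores)


# Toy envelopes (hypothetical values, not measured data).
clean = [0.0, 1.0, 0.0, 1.0, 2.0, 0.0, 2.0, 0.0]
flat = [0.5] * len(clean)  # a degraded envelope with no modulation left

print(short_time_correlation(clean, clean, win_len=4))
print(short_time_correlation(clean, flat, win_len=4))
```

In this toy setting, a degraded envelope that preserves the clean envelope's fluctuations scores high, while one whose fluctuations have been removed scores low. A metric of this kind compares the degraded signal against a clean reference within short time windows.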
Pages: 2670-2679
Page count: 10
Related papers
  • [1] Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain
    Chabot-Leclerc, Alexandre
    MacDonald, Ewen N.
    Dau, Torsten
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2016, 140 (1): 192-205
  • [2] Speech intelligibility prediction based on the envelope power spectrum model with the dynamic compressive gammachirp auditory filterbank
    Yamamoto, Katsuhiko
    Irino, Toshio
    Matsui, Toshie
    Araki, Shoko
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2885-2889
  • [3] A multi-resolution envelope-power based model for speech intelligibility
    Jorgensen, Soren
    Ewert, Stephan D.
    Dau, Torsten
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (1): 436-446
  • [4] A metric for predicting binaural speech intelligibility in stationary noise and competing speech maskers
    Tang, Yan
    Cooke, Martin
    Fazenda, Bruno M.
    Cox, Trevor J.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2016, 140 (3): 1858-1870
  • [5] GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech
    Yamamoto, Katsuhiko
    Irino, Toshio
    Araki, Shoko
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    SPEECH COMMUNICATION, 2020, 123: 43-58
  • [6] Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing
    Jorgensen, Soren
    Dau, Torsten
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2011, 130 (3): 1475-1487
  • [7] Speech intelligibility in environmental sound maskers and prediction based on envelope-power based models
    Manabe, Yuna
    Tamagawa, Katsuya
    Sato, Keiko
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018: 3466-3471
  • [8] Predicting the intelligibility of noise-corrupted speech non-intrusively by across-band envelope correlation
    Chen, Fei
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2016, 24: 109-113
  • [9] AN INTELLIGIBILITY METRIC BASED ON A SIMPLE MODEL OF SPEECH COMMUNICATION
    Van Kuyk, Steven
    Kleijn, W. Bastiaan
    Hendriks, Richard C.
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016
  • [10] Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio
    Yamamoto, Katsuhiko
    Irino, Toshio
    Matsui, Toshie
    Araki, Shoko
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 2949-2953