Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features

Cited by: 28
Authors
Schubotz, Wiebke
Brand, Thomas
Kollmeier, Birger
Ewert, Stephan D. [1]
Affiliation
[1] Carl von Ossietzky Univ Oldenburg, Med Phys, D-26111 Oldenburg, Germany
Keywords
COMODULATION MASKING RELEASE; HEARING-IMPAIRED LISTENERS; INFORMATIONAL MASKING; FLUCTUATING NOISE; FREQUENCY-SELECTIVITY; SIMULTANEOUS TALKERS; RECEPTION THRESHOLD; PERCEPTION; INDEX; SEPARATION;
DOI
10.1121/1.4955079
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur, which are typically referred to as energetic, amplitude modulation, and informational masking. In this study, speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech of the same or a different gender. Observed data were compared to predictions of the speech intelligibility index, the extended speech intelligibility index, the multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. The comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models. © 2016 Acoustical Society of America.
Pages: 524-540
Page count: 17