Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features

Cited by: 28
Authors
Schubotz, Wiebke
Brand, Thomas
Kollmeier, Birger
Ewert, Stephan D. [1]
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, Med Phys, D-26111 Oldenburg, Germany
Keywords
COMODULATION MASKING RELEASE; HEARING-IMPAIRED LISTENERS; INFORMATIONAL MASKING; FLUCTUATING NOISE; FREQUENCY-SELECTIVITY; SIMULTANEOUS TALKERS; RECEPTION THRESHOLD; PERCEPTION; INDEX; SEPARATION;
DOI
10.1121/1.4955079
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur, which are typically referred to as energetic, amplitude modulation, and informational masking. In this study, speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech of the same or a different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. The comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models. (C) 2016 Acoustical Society of America.
Pages: 524-540
Number of pages: 17