Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features

Cited by: 28

Authors:

Schubotz, Wiebke

Brand, Thomas

Kollmeier, Birger

Ewert, Stephan D. [1]

Affiliations:

[1] Carl von Ossietzky Univ Oldenburg, Med Phys, D-26111 Oldenburg, Germany

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2016 / Vol. 140 / Issue 01

Keywords:

COMODULATION MASKING RELEASE; HEARING-IMPAIRED LISTENERS; INFORMATIONAL MASKING; FLUCTUATING NOISE; FREQUENCY-SELECTIVITY; SIMULTANEOUS TALKERS; RECEPTION THRESHOLD; PERCEPTION; INDEX; SEPARATION;

DOI:

10.1121/1.4955079

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur, which are typically referred to as energetic, amplitude modulation, and informational masking. In this study, speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech of the same or a different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. The comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models. © 2016 Acoustical Society of America.


Pages: 524-540

Number of pages: 17
